Hadoop is an open-source project that implements MapReduce
builds on a distributed file system (HDFS)
- modeled on the Google File System (GFS)
- began as a sub-project of Apache Lucene
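
As a concrete point of reference, here is the classic WordCount job written against the org.apache.hadoop.mapreduce API; the input and output paths come from the command line and are illustrative, not part of the notes above.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emits (word, 1) for each token in the input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  // Reducer: sums all counts emitted for a given word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Map emits (word, 1) pairs; the framework groups them by key and the reducer sums each group.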
single namespace for the entire cluster
- managed by a single namenode
- hierarchical directories
- optimized for streaming reads of large files (see the read sketch below)
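
A minimal read sketch using the org.apache.hadoop.fs.FileSystem API: the directory listing is answered by the namenode from its namespace metadata, while the streamed bytes come from datanodes. The paths /data and /data/logs.txt are hypothetical, and fs.defaultFS is assumed to already point at the namenode.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
  public static void main(String[] args) throws Exception {
    // fs.defaultFS is assumed to point at the namenode, e.g. hdfs://namenode:9000
    FileSystem fs = FileSystem.get(new Configuration());

    // Directory listing: pure metadata, served by the namenode
    for (FileStatus status : fs.listStatus(new Path("/data"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }

    // Streaming read: the bytes come straight from datanodes
    byte[] buffer = new byte[8192];
    try (FSDataInputStream in = fs.open(new Path("/data/logs.txt"))) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        // process buffer[0..n) here
      }
    }
  }
}
```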
Files are broken into large blocks:
- typically 64 MB or larger
- replicated to several datanodes for reliability
- clients can ask the namenode for the locations of a file's blocks (sketch below)
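
A sketch of how a client discovers block locations, using the standard getFileBlockLocations call on FileSystem; the file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/data/large.bin"); // hypothetical path
    FileStatus status = fs.getFileStatus(file);

    // The namenode returns, for each block, the datanodes holding a replica
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset %d, length %d, hosts %s%n",
          block.getOffset(), block.getLength(),
          String.join(", ", block.getHosts()));
    }
  }
}
```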
Clients talk to both the namenode and the datanodes
- file data is never sent through the namenode; it flows directly between clients and datanodes (write sketch below)
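
The same split applies on the write path. A sketch assuming the dfs.blocksize and dfs.replication configuration keys of Hadoop 2.x and later, with a hypothetical output path; the 128 MB block size and 3-replica values are illustrative, not taken from the notes.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks
    conf.setInt("dfs.replication", 3);                 // 3 replicas per block

    FileSystem fs = FileSystem.get(conf);
    // create() asks the namenode to allocate blocks; the bytes themselves
    // are streamed through a pipeline of datanodes, never the namenode
    try (FSDataOutputStream out = fs.create(new Path("/data/out.bin"))) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }
  }
}
```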