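The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system. HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Because Java is highly portable, HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine, but in a real deployment that is rarely the case.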
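The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.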
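To make the split between metadata and data concrete, here is a minimal sketch under stated assumptions: `NameNode`, `DataNode`, `get_block_locations`, and `read_file` are hypothetical toy stand-ins, not the HDFS client API. The toy NameNode only maps a path to block IDs and the DataNodes holding them; the bytes themselves are fetched from DataNodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical toy DataNode: stores raw block bytes keyed by block ID."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Hypothetical toy NameNode: holds metadata only, never file contents.

    `files` maps a path to an ordered list of (block ID, DataNodes holding it).
    """
    files: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)

    def get_block_locations(self, path: str) -> list[tuple[int, list[str]]]:
        return self.files[path]


def read_file(namenode: NameNode, datanodes: dict[str, DataNode], path: str) -> bytes:
    """Ask the NameNode where the blocks are, then read each block from a DataNode."""
    data = bytearray()
    for block_id, locations in namenode.get_block_locations(path):
        # The bytes come straight from a DataNode; the NameNode only supplied locations.
        data += datanodes[locations[0]].read_block(block_id)
    return bytes(data)


dn1 = DataNode("dn1", {1: b"hello "})
dn2 = DataNode("dn2", {2: b"world"})
nn = NameNode({"/a.txt": [(1, ["dn1"]), (2, ["dn2"])]})
print(read_file(nn, {"dn1": dn1, "dn2": dn2}, "/a.txt"))  # b'hello world'
```

Reading only `locations[0]` is a simplification; a real client would choose among the replicas and fall back to another one on failure.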
The File System Namespace
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS supports user quotas and access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.
The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
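A minimal sketch of how a NameNode-style namespace might keep directories, files, and each file's replication factor. `Namespace`, `INodeDir`, `INodeFile`, the `changes` list, and the default replication of 3 are illustrative assumptions, not HDFS internals; `changes` only stands in for the requirement that every change to the namespace or its properties is recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class INodeFile:
    replication: int  # replication factor, kept as namespace metadata


@dataclass
class INodeDir:
    children: dict[str, INodeDir | INodeFile] = field(default_factory=dict)


class Namespace:
    """Hypothetical in-memory namespace; every mutation is appended to `changes`."""

    def __init__(self) -> None:
        self.root = INodeDir()
        self.changes: list[tuple] = []

    @staticmethod
    def _split(path: str) -> tuple[list[str], str]:
        parts = [p for p in path.split("/") if p]
        return parts[:-1], parts[-1]

    def _dir(self, parts: list[str]) -> INodeDir:
        node = self.root
        for name in parts:
            child = node.children.get(name)
            if not isinstance(child, INodeDir):
                raise FileNotFoundError("/" + "/".join(parts))
            node = child
        return node

    def mkdir(self, path: str) -> None:
        parent, name = self._split(path)
        self._dir(parent).children.setdefault(name, INodeDir())
        self.changes.append(("mkdir", path))

    def create(self, path: str, replication: int = 3) -> None:
        parent, name = self._split(path)
        self._dir(parent).children[name] = INodeFile(replication)
        self.changes.append(("create", path, replication))

    def set_replication(self, path: str, replication: int) -> None:
        parent, name = self._split(path)
        node = self._dir(parent).children[name]
        if not isinstance(node, INodeFile):
            raise IsADirectoryError(path)
        node.replication = replication
        self.changes.append(("setrep", path, replication))

    def rename(self, src: str, dst: str) -> None:
        src_parent, src_name = self._split(src)
        dst_parent, dst_name = self._split(dst)
        target_dir = self._dir(dst_parent)  # resolve first so a bad dst leaves src intact
        target_dir.children[dst_name] = self._dir(src_parent).children.pop(src_name)
        self.changes.append(("rename", src, dst))

    def delete(self, path: str) -> None:
        parent, name = self._split(path)
        del self._dir(parent).children[name]
        self.changes.append(("delete", path))
```

For example, `ns.create("/user/report.csv", replication=2)` followed by `ns.set_replication("/user/report.csv", 3)` leaves the file with a replication factor of 3 and two entries in `ns.changes`.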
Data Replication
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
All blocks in a file except the last block are the same size. Since support for variable-length blocks was added to append and hsync, users can also start a new block without filling the last block up to the configured block size.
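The fixed-size rule for all but the last block can be shown with a small helper. `block_lengths` is a hypothetical function and the 128 MiB block size is only an illustrative value; the sketch ignores the variable-length blocks that append and hsync can produce.

```python
from __future__ import annotations


def block_lengths(file_size: int, block_size: int) -> list[int]:
    """Return the length of each block when a file of `file_size` bytes is
    written with a fixed `block_size`: every block is full except possibly the last."""
    if block_size <= 0 or file_size < 0:
        raise ValueError("block_size must be positive and file_size non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


MiB = 1024 * 1024
print(block_lengths(300 * MiB, 128 * MiB))  # [134217728, 134217728, 46137344]
```

A 300 MiB file is thus stored as two full 128 MiB blocks followed by one 44 MiB block.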
An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
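The write-once, single-writer rule can be modelled as a lease that at most one client holds at a time. This is a hypothetical sketch: `WriteOnceFile`, `LeaseConflict`, and the method names are invented for illustration and simplify how HDFS actually manages leases. As in the text, the replication factor stays changeable after creation.

```python
from __future__ import annotations

from typing import Optional


class LeaseConflict(Exception):
    """Raised when a client writes without holding the single write lease."""


class WriteOnceFile:
    """Hypothetical model: bytes are only appended or truncated, never overwritten
    in place, and at most one client may hold the write lease at any time."""

    def __init__(self, replication: int) -> None:
        self.replication = replication
        self.data = bytearray()
        self.writer: Optional[str] = None

    def open_for_write(self, client: str) -> None:
        if self.writer is not None and self.writer != client:
            raise LeaseConflict(f"{self.writer} is already writing")
        self.writer = client

    def _check(self, client: str) -> None:
        if self.writer != client:
            raise LeaseConflict(f"{client} does not hold the write lease")

    def append(self, client: str, chunk: bytes) -> None:
        self._check(client)
        self.data.extend(chunk)

    def truncate(self, client: str, new_length: int) -> None:
        self._check(client)
        if not 0 <= new_length <= len(self.data):
            raise ValueError("truncate can only shorten the file")
        del self.data[new_length:]

    def close(self, client: str) -> None:
        self._check(client)
        self.writer = None

    def set_replication(self, replication: int) -> None:
        # The replication factor is metadata and may change after creation.
        self.replication = replication
```

For example, after `f.open_for_write("a")`, a call to `f.open_for_write("b")` raises `LeaseConflict` until `f.close("a")` releases the lease.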
The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
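A rough sketch of how the NameNode could use these two messages: Heartbeats decide which DataNodes are considered live, and each Blockreport replaces that DataNode's entries in a block-to-location map. `BlockManagerSketch`, the explicit `now` timestamps, `timeout_s`, and the single cluster-wide target replication are assumptions made to keep the example small; real HDFS tracks a replication factor per file and has its own timeout configuration.

```python
from __future__ import annotations


class BlockManagerSketch:
    """Hypothetical NameNode-side bookkeeping for Heartbeats and Blockreports."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat: dict[str, float] = {}  # DataNode -> time of last Heartbeat
        self.block_map: dict[int, set[str]] = {}    # block ID -> DataNodes reporting it

    def on_heartbeat(self, datanode: str, now: float) -> None:
        self.last_heartbeat[datanode] = now

    def on_block_report(self, datanode: str, block_ids: list[int]) -> None:
        # A Blockreport lists every block on the DataNode, so it replaces older entries.
        for holders in self.block_map.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def live_datanodes(self, now: float) -> set[str]:
        return {dn for dn, t in self.last_heartbeat.items() if now - t <= self.timeout_s}

    def under_replicated(self, target: int, now: float) -> list[int]:
        """Block IDs with fewer than `target` replicas on live DataNodes."""
        live = self.live_datanodes(now)
        return sorted(b for b, holders in self.block_map.items() if len(holders & live) < target)
```

If a DataNode stops sending Heartbeats, its replicas stop counting once `timeout_s` has passed, and the affected blocks show up in `under_replicated`; this is the kind of information the NameNode bases its replication decisions on.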
Replica Placement: The First Baby Steps