Setting Up Hadoop 2.5.1 Standalone and Pseudo-Distributed Environments on Ubuntu 14.04 (32-bit) (3)

Date: 2020-06-20 Category: Programmer's Life

5. Setting up the pseudo-distributed environment

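5.1. Configuring the *-site.xml files
Four files need to be configured here: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. All of them are in the /opt/hadoop-2.5.1/etc/hadoop/ directory.
core-site.xml: settings for Hadoop Core, such as the I/O settings shared by HDFS and MapReduce.
hdfs-site.xml: settings for the HDFS daemons, including the namenode, the secondary namenode, and the datanodes.
mapred-site.xml: settings for MapReduce. In Hadoop 1.x these covered the jobtracker and tasktracker daemons; in 2.x the file mainly selects the framework that runs MapReduce jobs.
yarn-site.xml: settings for the YARN framework, which executes the MapReduce jobs.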

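First, create a few directories under the Hadoop directory (/opt/hadoop-2.5.1 here):

~$ cd /opt/hadoop-2.5.1
/opt/hadoop-2.5.1$ mkdir tmp
/opt/hadoop-2.5.1$ mkdir -p hdfs/name
/opt/hadoop-2.5.1$ mkdir -p hdfs/data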

Next, edit the four files. (For the IP address I used my LAN IP, 192.168.1.135; substitute your own host's IP as needed, or simply use localhost.)

core-site.xml:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.135:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.5.1/tmp</value>
</property>
</configuration>

hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop-2.5.1/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop-2.5.1/hdfs/data</value>
</property>
</configuration>

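All of the paths above must be created by hand with mkdir (this was already done at the start), and you can choose different locations if you prefer. The value of dfs.replication should not exceed the actual number of DataNode hosts in the cluster (the default is 3); because a pseudo-distributed environment has only one DataNode, it is set to 1 here. Note that in Hadoop 2.x the names fs.default.name, dfs.name.dir, and dfs.data.dir are deprecated in favor of fs.defaultFS, dfs.namenode.name.dir, and dfs.datanode.data.dir; the old names still work.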
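Since a missing directory is easy to overlook, the check can be scripted. The following is only a minimal sketch: check_hadoop_dirs is a hypothetical helper name (not a Hadoop command), and it assumes the three subdirectories used in this article (tmp, hdfs/name, hdfs/data).

```shell
# check_hadoop_dirs is a hypothetical helper, not part of Hadoop.
# Usage: check_hadoop_dirs <hadoop-base-dir>
# Prints each missing directory and returns non-zero if any is missing.
check_hadoop_dirs() {
    base="$1"
    missing=0
    # Subdirectory names are the ones assumed by the config files in this article
    for d in tmp hdfs/name hdfs/data; do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $base/$d"
            missing=1
        fi
    done
    return "$missing"
}
```

With the layout used here it would be called as check_hadoop_dirs /opt/hadoop-2.5.1.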

mapred-site.xml (this file does not exist by default, but its template, mapred-site.xml.template, does; just make a copy of it):

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
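The copy of the template mentioned above could be done along these lines. This is a sketch under assumptions: ensure_mapred_site is a hypothetical helper name, and the configuration directory is passed in as an argument rather than hard-coded.

```shell
# ensure_mapred_site is a hypothetical helper name, not a Hadoop command.
# Usage: ensure_mapred_site <conf-dir>
ensure_mapred_site() {
    conf_dir="$1"
    # Leave an existing mapred-site.xml untouched so earlier edits are not overwritten
    if [ -f "$conf_dir/mapred-site.xml" ]; then
        return 0
    fi
    # Otherwise start from the template shipped with Hadoop
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
}
```

For the installation in this article the call would be ensure_mapred_site /opt/hadoop-2.5.1/etc/hadoop, after which the file can be edited as shown above.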

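Note that the version installed here is 2.5.1. The 2.x releases differ substantially from 1.x, chiefly in that the Hadoop MapReduce v2 (YARN) framework replaces the first-generation architecture. JobTracker and TaskTracker are gone, replaced by three components: ResourceManager, ApplicationMaster, and NodeManager. The locations and contents of the configuration files have changed accordingly. That is why mapred-site.xml sets YARN as the framework that handles MapReduce, and why the next step is to configure ResourceManager, ApplicationMaster, and NodeManager in yarn-site.xml.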

yarn-site.xml:

<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>192.168.1.135:18040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.1.135:18030</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.1.135:18088</value>
</property>
<property>
<description>The address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.1.135:8025</value>
</property>
</configuration>
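Property names and values in these files are case-sensitive, so a quick listing of what each file actually contains can help catch typos before starting the daemons. The following is a rough sketch, not a real XML parser: list_props is a hypothetical helper, and it assumes each <name> and <value> tag sits on its own line, as in the listings in this article.

```shell
# list_props is a hypothetical helper, not part of Hadoop.
# Usage: list_props <path-to-site-xml>
# Prints one name=value line per <property>, assuming <name> and <value>
# each occupy their own line, with <name> appearing before <value>.
list_props() {
    awk '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); name = $0 }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); print name "=" $0 }
    ' "$1"
}
```

For example, list_props /opt/hadoop-2.5.1/etc/hadoop/hdfs-site.xml should print lines such as dfs.replication=1 for the file shown earlier.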