All three hosts use identical configuration; at startup, Hadoop determines the current node's role from the configuration files and launches the corresponding services. The following changes must therefore be made on every node.
1. Edit /etc/hadoop/core-site.xml with the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.flyence.tk:8020</value>
    <final>true</final>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
Note: because this changes Hadoop's temporary directory, the /hadoop directory must be created on all 3 nodes and the hadoop user granted rwx permission on it, which can be done with setfacl.
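A minimal sketch of that preparation, run as root on each of the three nodes (assuming the service account is named hadoop):
[root@master ~]# mkdir /hadoop
[root@master ~]# setfacl -m u:hadoop:rwx /hadoop
A plain chown hadoop /hadoop would work just as well if you prefer not to use ACLs.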
2. Edit /etc/hadoop/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.flyence.tk:8021</value>
    <final>true</final>
    <description>The host and port that the MapReduce JobTracker runs at.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
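The -Xmx512m value caps each spawned map/reduce task JVM at a 512 MB heap. For jobs submitted through ToolRunner, this can be overridden per job with a generic -D option; a hypothetical invocation (myjob.jar and MyJob are placeholders):
[hadoop@master ~]$ hadoop jar myjob.jar MyJob -Dmapred.child.java.opts=-Xmx1024m input output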
3. Edit /etc/hadoop/hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the file is created.</description>
  </property>
</configuration>
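A replication factor of 1 matches this cluster, which has only a single DataNode. As the description says, replication is a per-file attribute and can also be changed after a file is created, for example (the path is a placeholder; a factor above the DataNode count just leaves blocks under-replicated):
[hadoop@master ~]$ hadoop fs -setrep -w 2 /user/hadoop/somefile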
4. Edit /etc/hadoop/masters (despite its name, this file lists the host that runs the SecondaryNameNode, not the master itself):
snn.flyence.tk
5. Edit /etc/hadoop/slaves (the hosts that run the DataNode and TaskTracker daemons):
datanode.flyence.tk
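The slaves.sh helper (made executable in the next section) reads this file and runs an arbitrary command on every listed host over SSH, which is handy for checking that all slaves are reachable, e.g.:
[hadoop@master ~]$ slaves.sh uptime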
6. On master, format the NameNode to initialize the HDFS filesystem:
[root@master ~]# hadoop namenode -format
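If the format succeeds, the NameNode's metadata directory should appear under the hadoop.tmp.dir set above (in Hadoop 1.x, dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name), which offers a quick sanity check:
[root@master ~]# ls /hadoop/temp/dfs/name/current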
V. Starting Hadoop
1. First, grant execute permission to the control scripts:
[root@master ~]# chmod +x /usr/sbin/start-all.sh
[root@master ~]# chmod +x /usr/sbin/start-dfs.sh
[root@master ~]# chmod +x /usr/sbin/start-mapred.sh
[root@master ~]# chmod +x /usr/sbin/slaves.sh
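Equivalently, a single glob covers the start scripts, the stop scripts mentioned at the end of this section, and slaves.sh in one command:
[root@master ~]# chmod +x /usr/sbin/start-*.sh /usr/sbin/stop-*.sh /usr/sbin/slaves.sh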
2. Start Hadoop:
[hadoop@master ~]$ start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-namenode-master.flyence.tk.out
datanode.flyence.tk: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-snn.flyence.tk.out
snn.flyence.tk: starting secondarynamenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-secondarynamenode-datanode.flyence.tk.out
starting jobtracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-jobtracker-master.flyence.tk.out
datanode.flyence.tk: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-snn.flyence.tk.out
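To confirm that the right daemons came up on each node, jps (shipped with the JDK) is the quickest check: on this layout, master should list NameNode and JobTracker, snn the SecondaryNameNode, and datanode the DataNode and TaskTracker.
[hadoop@master ~]$ jps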
Note: strangely, with the RPM-packaged Hadoop the control scripts are installed under /usr/sbin, and without execute permission, even though Hadoop is normally started as a non-root user. I don't understand why; an installation from the source tarball does not have this problem.
To stop all of Hadoop's processes, simply use the stop-all.sh script; it too must be granted execute permission.
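Once everything is up, a small MapReduce job makes a good end-to-end smoke test; a sketch using the bundled examples jar (its path varies by package and version, so adjust to your installation):
[hadoop@master ~]$ hadoop jar /usr/share/hadoop/hadoop-examples-*.jar pi 2 10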