hdfs-site.xml
<configuration>
  <!-- Nameservice ID (the logical name for the HA pair) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNode IDs nn1 and nn2 under the nameservice mycluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC addresses of nn1 and nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>ch01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>ch02:8020</value>
  </property>
  <!-- Web UI addresses of nn1 and nn2 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>ch01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>ch02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://ch01:8485;ch02:8485;ch03:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/journalnode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>32767</value>
  </property>
</configuration>
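Note that the ConfiguredFailoverProxyProvider and automatic failover only come into play if clients address HDFS through the nameservice and the ZKFCs know where ZooKeeper runs. The original does not show core-site.xml; a minimal sketch, assuming the same three hosts and the default ZooKeeper client port 2181:

<configuration>
  <!-- Clients resolve hdfs://mycluster through the HA failover proxy provider -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- ZooKeeper ensemble used by the ZKFCs for automatic failover (assumed port 2181) -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>ch01:2181,ch02:2181,ch03:2181</value>
  </property>
</configuration>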
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ch01</value>
  </property>
</configuration>
slaves
ch01
ch02
ch03

Using ZooKeeper
Configure the ZooKeeper environment variables
[root@ch01 ~]# vi /etc/profile
#ZOOKEEPER
ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.5-cdh5.6.0   # installation directory
PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/sbin
export ZOOKEEPER_HOME PATH
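After editing /etc/profile, reload it so the current shell picks up the new PATH. A quick sanity check (the echo and which calls are just illustrative):

[root@ch01 ~]# source /etc/profile
[root@ch01 ~]# echo $ZOOKEEPER_HOME
/opt/hadoop/zookeeper-3.4.5-cdh5.6.0
[root@ch01 ~]# which zkServer.sh
/opt/hadoop/zookeeper-3.4.5-cdh5.6.0/bin/zkServer.sh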
Start ZooKeeper
1) Run this on all machines (ch01, ch02, ch03); the output below is from ch01:
root@ch01:~# zkServer.sh start
JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@ch01:/home/hadoop# /opt/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

2) Run zkServer.sh status on each machine to check its role: one of the three reports Mode: leader and the other two report Mode: follower (which node wins the election is not fixed).
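To check all three roles from one machine, a loop such as the following can help. This is a sketch: it assumes passwordless SSH between the hosts (which the sshfence setup above needs anyway) and uses the absolute path so the remote shell does not depend on /etc/profile:

for h in ch01 ch02 ch03; do
  echo "== $h =="
  ssh "$h" /opt/hadoop/zookeeper-3.4.5-cdh5.6.0/bin/zkServer.sh status
done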
3) Test whether ZooKeeper started successfully by connecting with the client; a SyncConnected event in the output (the line highlighted in the original listing) indicates success.
zkCli.sh
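Inside the client you can also list the root znode; on a freshly started ensemble, before any Hadoop formatting, only the built-in /zookeeper node should be present (illustrative session):

[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] quit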
4) On ch01, format the failover controller state in ZooKeeper; a log line reporting that the HA znode was successfully created (highlighted in the original output) indicates success.
hdfs zkfc -formatZK
5) Verify that the zkfc format succeeded: if a new hadoop-ha znode has appeared, it worked.
zkCli.sh
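An illustrative client session; the mycluster child matches dfs.nameservices in hdfs-site.xml:

[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster]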
Start the JournalNodes and format HDFS
1) Run the following on ch01, ch02, and ch03 in turn:
hadoop-daemon.sh start journalnode
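On each node, jps should now show a JournalNode process alongside the ZooKeeper QuorumPeerMain started earlier (PIDs below are arbitrary):

root@ch01:~# jps
2345 JournalNode
2401 QuorumPeerMain
2456 Jps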
2) Format one NameNode of the cluster (ch01). There are two ways to do this; I used the first:
hdfs namenode -format
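If the format succeeded, the name directory configured in dfs.namenode.name.dir should now hold the initial metadata. A quick check (exact file names can vary by version):

root@ch01:~# ls /opt/hadoop/hadoop-2.6.0-cdh5.6.0/tmp/dfs/name/current
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION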
3) On ch01, start the NameNode you just formatted:
hadoop-daemon.sh start namenode
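jps should now show the NameNode, and the nn1 web UI configured above should answer on ch01:50070 (the curl probe is just one way to test; a browser works too):

root@ch01:~# jps
2345 JournalNode
2401 QuorumPeerMain
2598 NameNode
2650 Jps
root@ch01:~# curl -s -o /dev/null -w "%{http_code}\n" http://ch01:50070
200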