Deploying a Hadoop Cluster in Fully Distributed Mode (2)

Date: 2020-07-19

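Once Hadoop is started, the NameNode uses SSH (Secure Shell) to start and stop the daemons on each DataNode, so commands must be able to run between nodes without a password prompt. We therefore configure SSH to use passwordless public-key authentication. Taking the four machines in this article as an example, Master is the master node and needs to connect to Slave1, Slave2, and Slave3. Make sure ssh is installed on every machine and that the sshd service is running on the DataNode machines.
Switch to the hadoop user (make sure the hadoop user can log in without a password, because the Hadoop installation we set up later is owned by the hadoop user).
1) Generate a key pair on each host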

#su - hadoop
#ssh-keygen -t rsa
#cat ~/.ssh/id_rsa.pub
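The ssh-keygen command generates a key pair: id_rsa (the private key) and id_rsa.pub (the public key). By default both are saved in ~/.ssh/.
2) Add Master's public key to the authorized_keys file on the remote host Slave1
Create authorized_keys under /home/hadoop/.ssh/ on Slave1: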

#vim authorized_keys
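Paste in the public key you copied a moment ago.
Set the file's permissions to 600. (This is important: without 600 permissions, login fails.)
Test the login: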

$ ssh Slave1
The authenticity of host 'slave2 (192.168.137.101)' can't be established.
RSA key fingerprint is d5:18:cb:5f:92:66:74:c7:30:30:bb:36:bf:4c:ed:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave2,192.168.137.101' (RSA) to the list of known hosts.
Last login: Fri Aug 30 21:31:36 2013 from slave1
[hadoop@Slave1 ~]$
Use the same method to copy Master's public key to the other nodes.
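The manual paste-and-chmod steps above can also be scripted. The following is a minimal sketch, not part of the original procedure: install_pubkey is a hypothetical helper name, and the 700 mode on the .ssh directory is an added assumption based on common sshd practice (the article itself only requires 600 on authorized_keys). It appends a public key file to authorized_keys only if that key is not already present, then sets the permissions.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE HOME_DIR   (hypothetical helper)
# Appends the key in PUBKEY_FILE to HOME_DIR/.ssh/authorized_keys once
# and applies the permissions described in the text.
install_pubkey() {
    pubkey_file=$1
    home_dir=$2
    ssh_dir="$home_dir/.ssh"
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    touch "$auth_file"
    # Append only if this exact key line is not already in the file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 700 "$ssh_dir"     # assumption: conventional mode for ~/.ssh
    chmod 600 "$auth_file"   # required by the article
}
```

For example, after copying Master's id_rsa.pub to Slave1 as /tmp/master_id_rsa.pub (a made-up path), the hadoop user on Slave1 could run install_pubkey /tmp/master_id_rsa.pub /home/hadoop. Running it twice does not duplicate the key.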

3. Install Hadoop

1) Switch to the hadoop user, download the package, and extract it. No further installation step is needed:

#su - hadoop
#wget
#tar -zxvf hadoop-1.2.1.tar.gz
My installation directory is:
/home/hadoop/hadoop-1.2.1
To make the hadoop command and scripts such as start-all.sh convenient to use, add the following to /etc/profile on Master:

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
After editing, run source /etc/profile to apply the change.
2) Configure conf/hadoop-env.sh
Add the following line to conf/hadoop-env.sh:

export JAVA_HOME=/usr/local/jdk1.6.0_45/
Change this to the location of your JDK installation.
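Because the variable name is case-sensitive and the path differs from machine to machine, it can help to verify the setting before starting any daemons. The sketch below is one possible check, not something the original article provides: java_home_from_env and check_java_home are hypothetical helper names. It reads the last uncommented export JAVA_HOME=... line from a file and confirms that the directory contains an executable bin/java.

```shell
#!/bin/sh
# java_home_from_env FILE   (hypothetical helper)
# Prints the value of the last uncommented "export JAVA_HOME=..." line.
java_home_from_env() {
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=//p' "$1" | tail -n 1
}

# check_java_home FILE   (hypothetical helper)
# Succeeds only if the configured directory contains an executable bin/java.
check_java_home() {
    java_home=$(java_home_from_env "$1")
    java_home=${java_home%/}   # drop one trailing slash, if any
    if [ -n "$java_home" ] && [ -x "$java_home/bin/java" ]; then
        echo "OK: $java_home"
    else
        echo "JAVA_HOME is missing or has no bin/java: '$java_home'" >&2
        return 1
    fi
}
```

A call such as check_java_home /home/hadoop/hadoop-1.2.1/conf/hadoop-env.sh fails if the line was written with a different capitalization, for example Java_HOME, since the pattern only matches the exact name JAVA_HOME.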
Test the Hadoop installation (run from /home/hadoop/hadoop-1.2.1, since the jar and conf/ paths are relative):

/home/hadoop/hadoop-1.2.1/bin/hadoop jar hadoop-examples-1.2.1.jar wordcount conf/ /tmp/out
4. Cluster configuration (identical on all nodes)

1) Configuration file: conf/core-site.xml

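<configuration>
  <property>
    <name>fs.default.name</name>
    <value>Master:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>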
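fs.default.name is the URI of the NameNode, in the form hdfs://hostname:port/.
hadoop.tmp.dir is Hadoop's default temporary path. It is best to set it explicitly: if a DataNode inexplicably fails to start after you add a node or in similar situations, delete the tmp directory named in this file. If you delete this directory on the NameNode machine, however, you must run the NameNode format command again.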

2) Configuration file: conf/mapred-site.xml

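<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Master:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>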
mapred.job.tracker is the host (or IP address) and port of the JobTracker, written as host:port.

3) Configuration file: conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name1, /home/hadoop/name2</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data1, /home/hadoop/data2</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>
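dfs.name.dir is the local filesystem path where the NameNode persistently stores the namespace and transaction logs. When the value is a comma-separated list of directories, the name table is replicated to every directory for redundancy.
dfs.data.dir is the local filesystem path where a DataNode stores block data. When the value is a comma-separated list of directories, data is stored across all of them, which usually sit on different devices.
dfs.replication is the number of copies kept of each block. The default is 3. It should not exceed the number of DataNodes in the cluster, since each replica must live on a different node.
Note: do not create the name1, name2, data1, and data2 directories in advance. Hadoop creates them automatically when formatting, and creating them beforehand causes problems.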
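The constraint on dfs.replication can be checked against the list of slave hosts before the cluster is started. The following is a rough sketch under stated assumptions: count_slaves and check_replication are hypothetical helper names, the replication value is passed in by hand rather than parsed from hdfs-site.xml, and the host list is assumed to be a file with one host name per line, where blank lines and lines starting with # are skipped.

```shell
#!/bin/sh
# count_slaves FILE   (hypothetical helper)
# Counts lines that are neither blank nor comments.
count_slaves() {
    grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# check_replication REPLICATION SLAVES_FILE   (hypothetical helper)
# Fails if the replication factor is larger than the number of DataNodes.
check_replication() {
    replication=$1
    slaves_file=$2
    # grep -c exits non-zero when the count is 0, so fall back to 0.
    datanodes=$(count_slaves "$slaves_file") || datanodes=0
    if [ "$replication" -gt "$datanodes" ]; then
        echo "dfs.replication=$replication exceeds $datanodes DataNode(s)" >&2
        return 1
    fi
    echo "dfs.replication=$replication fits $datanodes DataNode(s)"
}
```

With the three slave hosts used in this article, check_replication 3 conf/slaves succeeds, while a value of 4 would be rejected.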

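4) Configure the masters and slaves files
Configure conf/masters and conf/slaves to set the master and slave nodes. Preferably use host names, make sure the machines can reach one another by host name, and put one host name per line.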

$vim masters
Enter:
Master
$vim slaves
Enter:
Slave1
Slave2
Slave3
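With configuration complete, copy the configured hadoop folder to the other machines in the cluster, and make sure the configuration above is correct for each of them. For example, if Java is installed at a different path on another machine, edit conf/hadoop-env.sh on that machine.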

$scp -r /home/hadoop/hadoop-1.2.1 Slave1:/home/hadoop/
$scp -r /home/hadoop/hadoop-1.2.1 Slave2:/home/hadoop/
$scp -r /home/hadoop/hadoop-1.2.1 Slave3:/home/hadoop/
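As the number of slaves grows, the copy commands above can be derived from the slaves file instead of being typed one by one. The sketch below is one way to do that and is not part of the original procedure: print_copy_commands is a hypothetical helper that only prints the scp commands, so they can be reviewed before anything is copied.

```shell
#!/bin/sh
# print_copy_commands SLAVES_FILE SRC_DIR DEST_DIR   (hypothetical helper)
# Prints one "scp -r" command per host listed in SLAVES_FILE.
print_copy_commands() {
    slaves_file=$1
    src_dir=$2
    dest_dir=$3
    # Strip comments and blank lines, then emit one command per host.
    sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$slaves_file" |
    while read -r host || [ -n "$host" ]; do
        echo "scp -r $src_dir $host:$dest_dir"
    done
}
```

Running print_copy_commands conf/slaves /home/hadoop/hadoop-1.2.1 /home/hadoop/ from the installation directory prints the same three scp lines shown above. Piping that output to sh would execute them, assuming passwordless SSH to each slave is already working.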

Please credit the source when reposting: https://www.heiqu.com/7d442a3edc22a2aa39babefd0183aa37.html
