Installing and Configuring Single-Node Pseudo-Distributed Hadoop
There are two prerequisites: Java 1.6 or later, and passwordless SSH login to the local machine. Installation steps differ between Hadoop versions, so be sure to follow the matching installation guide on the Apache website.
1. Install Java
rpm -ivh jdk-7u7-linux-x64.rpm
The RPM installs under /usr/java/jdk1.7.0_07; repoint the system java symlink at it and verify the version:
[root@linux2 ~]# rm /usr/bin/java
rm: remove symbolic link `/usr/bin/java'? y
[root@linux2 ~]# ln -s /usr/java/jdk1.7.0_07/bin/java /usr/bin/java
[root@linux2 ~]# java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
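Optionally, you can also export JAVA_HOME for all shells (a sketch; the path matches the RPM installed above):
cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_07
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile.d/java.sh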
2. Configure passwordless SSH
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If you are not the root user, the steps above may not be enough; you also need to fix the permissions on the .ssh directory and key files:
chmod 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
chmod 600 /home/hadoop/.ssh/id_rsa
The permissions should end up looking like this:
-rw------- 1 hadoop hadoop 396 05-16 05:10 authorized_keys
-rw------- 1 hadoop hadoop 1675 05-16 05:10 id_rsa
-rwxrwxrwx 1 hadoop hadoop 396 05-16 05:10 id_rsa.pub
-rwxrwxrwx 1 hadoop hadoop 402 05-16 05:10 known_hosts
.ssh directory permissions:
drwx------ 2 hadoop hadoop 4096 05-16 05:10 .ssh
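Verify that passwordless login now works; the following should print without asking for a password (on the very first connection you may still be asked to confirm the host key):
$ ssh localhost 'echo ssh OK'
ssh OK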
3. Install standalone Hadoop, i.e. the mode that runs as a single Java process
Download a Hadoop release from the Apache site and unpack it:
tar -zxvf hadoop-1.0.4.tar.gz
Set JAVA_HOME:
vi conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_07
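A quick check that hadoop-env.sh picks up the JDK:
./bin/hadoop version
This should print the Hadoop version (1.0.4) and build information.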
Test the installation by running the bundled examples against a local input directory:
mkdir input
cp conf/*.xml input
./bin/hadoop jar hadoop-examples-1.0.4.jar grep input output '[a-z.]+'
cat output/*
rm -r output
./bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output
4. Install Hadoop in pseudo-distributed mode
1) Unpack the tarball as above and configure passwordless SSH login.
2) Edit the configuration files
vi conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
vi conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
vi conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
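Note on the tmp directory: HDFS data and metadata live under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name} in Hadoop 1.x; this is what the cleanup note in step 4) below refers to. If you want the data to survive /tmp cleanup, you can optionally pin it in conf/core-site.xml (a sketch; /home/hadoop/hadoop-tmp is an arbitrary path):
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-tmp</value>
  </property>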
3) Format the distributed filesystem
bin/hadoop namenode -format
4) Start Hadoop
bin/start-all.sh
Note: after startup, verify in a browser that the two web UIs below load and that the number of live nodes is 1 (if you have installed more than once, remember to clean out the related files under the tmp directory first):
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
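You can also sanity-check from the shell with jps (shipped with the JDK):
$ jps
All five daemons should appear in the listing: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself).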
5) Create an HDFS directory and upload the conf files
hadoop fs -mkdir test
hadoop fs -ls test
hadoop fs -put conf test
6) Run the example job
hadoop jar hadoop-examples-1.0.4.jar grep test/conf output 'dfs[a-z.]+'
hadoop fs -ls output
[hadoop@linux1 hadoop-1.0.4]$ hadoop fs -cat /user/hadoop/output/part-00000 | head -13
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
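To inspect the complete result locally, copy it out of HDFS (a sketch; output_local is an arbitrary local directory name):
hadoop fs -get output output_local
cat output_local/*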
After the experiment completes, or if it did not succeed, you can delete test:
# hadoop fs -rmr test
Deleted hdfs://localhost/user/root/test
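When you are finished, stop all the daemons:
bin/stop-all.sh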