Installing Hadoop 1.2.1 on Ubuntu 14.04 LTS (fully distributed cluster mode)
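Installation steps:
1) JDK: Hadoop is written in Java, so Hadoop programs cannot run without a Java virtual machine installed.
2) Create a dedicated Linux user for running Hadoop and executing its tasks (map and reduce tasks, for example), much like a service account on Windows, and grant it access to the JDK directory so that it can run the Java virtual machine. This is the account that eventually runs bin/start-all.sh to start all Hadoop services, so it needs sufficient permissions. The account's own environment variables also have to be configured so that the Java home directory is among them; otherwise Hadoop jobs will fail later because the java command cannot be found in the current environment.
3) Edit /etc/hosts and /etc/hostname: machines in a cluster talk to each other by IP address, but we normally refer to them by hostname and rely on a hostname-to-IP mapping to reach the target host. Both files therefore need to be edited: /etc/hosts holds the mapping between hostnames and IP addresses, and /etc/hostname holds the machine's own hostname.
4) SSH: the machines in the cluster connect to each other over SSH, which provides a secure channel; in particular, the start scripts on master log in to the other nodes over SSH to launch the daemons there. SSH key-based authentication is therefore set up so that those logins work without a password.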
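5) Then install Hadoop itself, which involves editing several configuration files. Hadoop has several kinds of configuration files:
Read-only default configuration: src/core/core-default.xml, src/hdfs/hdfs-default.xml, src/mapred/mapred-default.xml, conf/mapred-queues.xml.
Site-specific configuration: conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml, conf/mapred-queues.xml. These files configure Hadoop's core functions, such as the directories used by HDFS and MapReduce.
Environment configuration: conf/hadoop-env.sh
Back to the main point: since HDFS and MapReduce are the core of Hadoop, you need to configure at least the NameNode directory and the DataNode directory for HDFS, the port on which the MapReduce JobTracker and TaskTrackers communicate, the system directory, the local directory, and so on.
These settings are the same on the master and slave machines.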
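As a rough illustration of the settings just listed, the script below writes minimal versions of the three site-specific files into a scratch directory. The property names are the Hadoop 1.x ones; the ports (9000, 9001) and every directory under /opt/hadoop are assumptions chosen for this sketch, not values taken from this article, so adjust them before copying anything into conf/.

```shell
#!/bin/sh
# Sketch only: writes example site files to a scratch directory,
# not to /opt/hadoop/conf. Ports and directories are assumptions.
set -e

conf_dir="$(mktemp -d)"

# core-site.xml: where clients find the NameNode, plus the base temp directory.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: NameNode and DataNode storage directories, replication factor.
cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# mapred-site.xml: JobTracker address, system directory (in HDFS), local directory.
cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/opt/hadoop/mapred/local</value>
  </property>
</configuration>
EOF

echo "Example site files written to $conf_dir:"
ls "$conf_dir"
```

Because the settings are identical on every node, one way to use such files is to edit them once on master and copy the conf directory to slave1 and slave2.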
6) Once configuration is done, format HDFS. This is done on the master machine.
7) After HDFS is formatted, start all the Hadoop daemons.
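The master-side commands behind the SSH, format and start steps can be sketched as a dry-run script. By default it only prints what it would run. The account name hadoop is an assumption made for this sketch, and /opt/hadoop is the install directory used in this article; the run helper and the DRY_RUN switch are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: prints the master-side commands for the SSH, format and start steps.
# Set DRY_RUN=0 to really execute them; formatting erases existing HDFS metadata.
set -e

DRY_RUN="${DRY_RUN:-1}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"   # install directory used in this article
HADOOP_USER="${HADOOP_USER:-hadoop}"        # assumed name of the dedicated account
planned=""

# Record and print a command; execute it only when DRY_RUN is not 1.
run() {
    planned="${planned}$*
"
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# SSH: create a key pair without a passphrase and install it on every node.
run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
for host in master slave1 slave2; do
    run ssh-copy-id "$HADOOP_USER@$host"
done

# Format HDFS (once, on master only).
run "$HADOOP_HOME/bin/hadoop" namenode -format

# Start the HDFS and MapReduce daemons, then list the running Java processes.
run "$HADOOP_HOME/bin/start-all.sh"
run jps
```

Keeping the default dry-run mode makes it safe to read through the order of operations first; the format command in particular should not be repeated on a cluster that already holds data.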
The environment was built with VMware Workstation 12, using three Linux virtual machines: master, slave1 and slave2.
Configuration details:
Item                master                slave1                slave2
OS                  Ubuntu 14.04 LTS x64  Ubuntu 14.04 LTS x64  Ubuntu 14.04 LTS x64
Memory              1 GB                  1 GB                  1 GB
Hard drive space    20 GB                 20 GB                 20 GB
Processors          2                     2                     2
IP address          192.168.2.110         192.168.2.111         192.168.2.112
Roles               NameNode              DataNode              DataNode
                    DataNode              TaskTracker           TaskTracker
                    JobTracker
                    TaskTracker
                    SecondaryNameNode
Hadoop directory    /opt/hadoop           /opt/hadoop           /opt/hadoop
JDK version         JDK 1.8               JDK 1.8               JDK 1.8
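With the addresses from the table, the hostname-to-IP mapping described in the installation steps might look like the entries below. This is a minimal sketch that writes them to a scratch file rather than to /etc/hosts; the lookup helper is a hypothetical name added only to show how a hostname resolves through such a file.

```shell
#!/bin/sh
# Sketch only: builds the host entries in a scratch file instead of editing /etc/hosts.
set -e

hosts_file="$(mktemp)"
cat > "$hosts_file" <<'EOF'
127.0.0.1       localhost
192.168.2.110   master
192.168.2.111   slave1
192.168.2.112   slave2
EOF

# Print the address recorded for a hostname (second column matches the name).
lookup() {
    awk -v name="$1" '$2 == name { print $1 }' "$hosts_file"
}

for node in master slave1 slave2; do
    echo "$node -> $(lookup "$node")"
done
```

The same three node lines would go into /etc/hosts on every machine, while /etc/hostname on each machine holds only that machine's own name (master, slave1 or slave2).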
First, install the JDK and SSH on the master machine.
1. Installing the JDK
linuxidc@ubuntu:/run/network$ scp linuxidc@192.168.2.100:/home/linuxidc/Download/jdk-8u65-linux-x64.tar.gz ~
The authenticity of host '192.168.2.100 (192.168.2.100)' can't be established.
ECDSA key fingerprint is da:b7:c3:2a:ea:a2:76:4c:c3:c1:68:ca:0e:c2:ea:92.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.2.100' (ECDSA) to the list of known hosts.
linuxidc@192.168.2.100's password:
scp: /home/linuxidc/Download/jdk-8u65-linux-x64.tar.gz: No such file or directory
linuxidc@ubuntu:/run/network$ scp linuxidc@192.168.2.100:/home/linuxidc/Downloads/jdk-8u65-linux-x64.tar.gz ~
linuxidc@192.168.2.100's password:
jdk-8u65-linux-x64.tar.gz                     100%  173MB  21.6MB/s   00:08
linuxidc@ubuntu:/run/network$ cd ~
linuxidc@ubuntu:~$ ls
Desktop    Downloads         jdk-8u65-linux-x64.tar.gz  Pictures  Templates
Documents  examples.desktop  Music                      Public    Videos
linuxidc@ubuntu:~$ cd .
linuxidc@ubuntu:~$ cd /
linuxidc@ubuntu:/$ ls
bin    dev   initrd.img  lost+found  opt   run   sys  var
boot   etc   lib         media       proc  sbin  tmp  vmlinuz
cdrom  home  lib64       mnt         root  srv   usr
linuxidc@ubuntu:/$ sudo mkdir jvm
[sudo] password for linuxidc:
linuxidc@ubuntu:/$ rm jvm/
rm: cannot remove ‘jvm/’: Is a directory
linuxidc@ubuntu:/$ rm -d jvm/
rm: remove write-protected directory ‘jvm/’? y
rm: cannot remove ‘jvm/’: Permission denied
linuxidc@ubuntu:/$ ls
bin    dev   initrd.img  lib64       mnt   root  srv  usr
boot   etc   jvm         lost+found  opt   run   sys  var
cdrom  home  lib         media       proc  sbin  tmp  vmlinuz
linuxidc@ubuntu:/$ sudo mkdir /usr/lib/jvm
linuxidc@ubuntu:/$ cd ~
linuxidc@ubuntu:~$ sudo tar zxf ./jdk-8u65-linux-x64.tar.gz -C /usr/lib/jvm/
linuxidc@ubuntu:~$ cd /usr/lib/jvm/
linuxidc@ubuntu:/usr/lib/jvm$ sudo mv jdk1.8.0_65 java
linuxidc@ubuntu:/usr/lib/jvm$ cd java/
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo vim ~/.bashrc
linuxidc@ubuntu:/usr/lib/jvm/java$ tail -n 4 ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
linuxidc@ubuntu:/usr/lib/jvm/java$ java -version
The program 'java' can be found in the following packages:
 * default-jre
 * gcj-4.8-jre-headless
 * openjdk-7-jre-headless
 * gcj-4.6-jre-headless
 * openjdk-6-jre-headless
Try: sudo apt-get install <selected package>
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java/bin/java 300
update-alternatives: using /usr/lib/jvm/java/bin/java to provide /usr/bin/java (java) in auto mode
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java/bin/javac 300
update-alternatives: using /usr/lib/jvm/java/bin/javac to provide /usr/bin/javac (javac) in auto mode
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/java/bin/jar 300
update-alternatives: using /usr/lib/jvm/java/bin/jar to provide /usr/bin/jar (jar) in auto mode
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --install /usr/bin/javah javah /usr/lib/jvm/java/bin/javah 300
update-alternatives: using /usr/lib/jvm/java/bin/javah to provide /usr/bin/javah (javah) in auto mode
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --install /usr/bin/javap javap /usr/lib/jvm/java/bin/javap 300
update-alternatives: using /usr/lib/jvm/java/bin/javap to provide /usr/bin/javap (javap) in auto mode
linuxidc@ubuntu:/usr/lib/jvm/java$ sudo update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java/bin/java
Nothing to configure.
linuxidc@ubuntu:/usr/lib/jvm/java$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
linuxidc@ubuntu:/usr/lib/jvm/java$
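One detail in the session above is worth isolating: the first java -version fails right after ~/.bashrc was edited, most likely because the running shell had not re-read the file yet. The sketch below puts the same four export lines into a scratch file (a stand-in for ~/.bashrc, used here so nothing real is modified) and shows that they take effect only once the file is sourced.

```shell
#!/bin/sh
# Sketch only: a scratch file stands in for ~/.bashrc.
set -e

profile="$(mktemp)"
cat > "$profile" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF

echo "before: JAVA_HOME=${JAVA_HOME:-<unset>}"

# Writing the file changes nothing in the current shell; sourcing it does.
. "$profile"

echo "after:  JAVA_HOME=$JAVA_HOME"

# Check whether the JDK's bin directory is now a PATH component.
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) on_path=yes ;;
    *)                    on_path=no ;;
esac
echo "JAVA_HOME/bin on PATH: $on_path"
```

With the real ~/.bashrc, the equivalent is running source ~/.bashrc or opening a new terminal. The session instead registers the binaries with update-alternatives, which makes /usr/bin/java resolve regardless of what PATH contains.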
2. Adding the hadoop user and group