最近大数据比较火,所以也想学习一下,所以在虚拟机安装Ubuntu Server,然后安装Hadoop。
以下是安装步骤:
1. 安装Java如果是新机器,默认没有安装java,运行java –version命名,看是否可以查看Java版本,如果未安装Java,这运行以下命名:
# Update the source list
$ sudo apt-get update
# The OpenJDK project is the default version of Java
# that is provided from a supported Ubuntu repository.
$ sudo apt-get install default-jdk
$ java -version
2.设置Hadoop用户和组$sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
3.安装并配置SSH$ sudo apt-get install ssh
$ su hduser
$ ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
接下来运行ssh命令,测试一下是否成功.
$ ssh localhost
4.安装Hadoop首先需要下载并解压Hadoop文件,运行命令:
$wget
这里的URL是最新的Hadoop2.6.0版,安装的时候可以先到官方网站看看需要下载哪个版本,然后更换这个Url.
下载完毕后,就是解压缩:
$ tar xvzf hadoop-2.6.0.tar.gz
然后将Hadoop文件夹搬到新文件夹,并且给hduser这个用户权限:
$ sudo mv hadoop-2.6.0 /usr/local/hadoop
$ cd /usr/local
$ sudo chown -R hduser:hadoop hadoop
5.配置Hadoop接下来我们可以使用putty通过ssh连接到Ubuntu了,将当前用户切换到hduser做如下的操作:
5.1修改~/.bashrc文件首先运行命令查看Java的路径:
$ update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
Nothing to configure.
这里我们需要的JavaHome就是:/usr/lib/jvm/java-7-openjdk-amd64,【注意,这里没有后面的/jre/bin/java部分】 ,然后使用vi编辑~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#HADOOP VARIABLES END
文件的路径为:/usr/local/hadoop/etc/hadoop/hadoop-env.sh,找到对应的行,将内容改为:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
5.3修改core-site.xml文件在修改这个文件之前,我们需要使用超级用户创建一个目录,并给予hduser该目录的权限:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
接下来切换回hduser用户,修改配置文件,文件路径:/usr/local/hadoop/etc/hadoop/core-site.xml,使用VI,将配置改为:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
默认情况下,我们只有/usr/local/hadoop/etc/hadoop/mapred-site.xml.template,我们需要先基于这个文件,copy一个新的文件出来,然后再进行修改。
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
使用VI打开,修改配置如下:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
在修改之前,也是需要切换回超级管理员账户,创建需要用到的目录:
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store
然后切换回来hduser用户,修改配置文件:/usr/local/hadoop/etc/hadoop/hdfs-site.xml,改为: