Environment: OS: CentOS 6.6; Hadoop version: 1.0.3; Java runtime: JDK 1.6
Single-node setup procedure:
1. Configure SSH: while running, Hadoop connects to its daemons over SSH. Set the SSH service up for passwordless access so that Hadoop does not need a manually typed password every time it connects:
Details:
step 1: Generate a key pair and add the public key to the authorized keys
[hjchaw@localhost ~]$ ssh-keygen -t rsa -P ""
[hjchaw@localhost ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
step 2: Test SSH; if the connection succeeds without prompting for a password, the SSH configuration is correct
[hjchaw@localhost ~]$ ssh localhost
If SSH still prompts for a password, the cause is usually wrong permissions on the ~/.ssh directory; set them to 700. Keep this in mind while configuring.
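For example, the following commands (a minimal sketch, assuming the default ~/.ssh layout) tighten the permissions so that sshd will accept the key; authorized_keys itself should not be group- or world-writable either:
[hjchaw@localhost ~]$ chmod 700 ~/.ssh
[hjchaw@localhost ~]$ chmod 600 ~/.ssh/authorized_keys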
2. Hadoop configuration procedure:
step 1: Configure conf/hadoop-env.sh and set JAVA_HOME in it, e.g. JAVA_HOME=/usr/local/jdk
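As an illustration (assuming the JDK is installed at /usr/local/jdk, as above), the relevant line in conf/hadoop-env.sh would look like:
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk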
step 2: Configure core-site.xml:
<configuration>
  <!-- No. 1: the base path under which Hadoop stores its data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hjchaw/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    <final>true</final>
  </property>
  <!-- No. 2: the default file system name -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. Either the
    literal string "local" or a host:port for NDFS.
    </description>
    <final>true</final>
  </property>
</configuration>
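If you prefer to create the data directory ahead of time (the NameNode format step below will also create the missing subdirectories), a quick sketch using the path from the config above:
[hjchaw@localhost ~]$ mkdir -p /home/hjchaw/hadoop-datastore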
step 3: Configure hdfs-site.xml
<configuration>
  <!-- file system properties -->
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table. If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.
    </description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
    should store its blocks. If this is a comma-delimited
    list of directories, then data will be stored in all named
    directories, typically on different devices.
    Directories that do not exist are ignored.
    </description>
    <final>true</final>
  </property>
  <!-- A single-node cluster has only one DataNode, so replication is set to 1. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <final>true</final>
  </property>
</configuration>
step 4: Configure mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
The steps above complete a single-node, pseudo-distributed Hadoop configuration.
3. Starting Hadoop:
You can add hadoop/bin to the PATH so the Hadoop commands below can be run from any directory.
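For example (assuming the install location shown in the startup logs below):
[hjchaw@localhost ~]$ export PATH=$PATH:/opt/hadoop/hadoop-1.0.3/bin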
step 1: Format the file system:
[hjchaw@localhost bin]$ hadoop namenode -format
12/05/27 04:25:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
12/05/27 04:25:19 INFO util.GSet: VM type = 32-bit
12/05/27 04:25:19 INFO util.GSet: 2% max memory = 19.33375 MB
12/05/27 04:25:19 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/05/27 04:25:19 INFO util.GSet: recommended=4194304, actual=4194304
12/05/27 04:25:20 INFO namenode.FSNamesystem: fsOwner=hjchaw
12/05/27 04:25:20 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/27 04:25:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/27 04:25:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/27 04:25:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/27 04:25:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/27 04:25:21 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/27 04:25:21 INFO common.Storage: Storage directory /home/hjchaw/hadoop-datastore/hadoop-hjchaw/dfs/name has been successfully formatted.
12/05/27 04:25:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
step 2: Start Hadoop:
[hjchaw@localhost bin]$ start-all.sh
starting namenode, logging to /opt/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hjchaw-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /opt/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hjchaw-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /opt/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hjchaw-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /opt/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hjchaw-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /opt/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hjchaw-tasktracker-localhost.localdomain.out
If you see output like the above, the configuration is working.
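To double-check, jps (shipped with the JDK) should show all five Hadoop daemons running; the process IDs below are only illustrative:
[hjchaw@localhost ~]$ jps
2786 NameNode
2890 DataNode
3002 SecondaryNameNode
3089 JobTracker
3196 TaskTracker
3270 Jps
You can also browse the NameNode web UI at http://localhost:50070 and the JobTracker web UI at http://localhost:50030, and stop all daemons later with stop-all.sh.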