================= Completing the configuration =================

Environment variables:

[@Hadoop48 ~]$ vi .bashrc
export HADOOP_HOME=/home/zhouhh/hadoop-1.0.3
export HADOOP_HOME_WARN_SUPPRESS=1
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_HOME/bin

[@Hadoop48 ~]$ source .bashrc
[@Hadoop48 ~]$ cd hadoop-1.0.3
[@Hadoop48 hadoop-1.0.3]$ cd conf
[@Hadoop48 conf]$ ls
capacity-scheduler.xml  fair-scheduler.xml          hdfs-default.xml    mapred-queue-acls.xml  ssl-client.xml.example
configuration.xsl       hadoop-env.sh               hdfs-site.xml       mapred-site.xml        ssl-server.xml.example
core-default.xml        hadoop-metrics2.properties  log4j.properties    masters                taskcontroller.cfg
core-site.xml           hadoop-policy.xml           mapred-default.xml  slaves

The *-default.xml files here were copied over from the corresponding src directories, to be used as configuration references. Configuration has two parts: the environment and the configuration parameters. The environment is what the scripts under bin need, and is set in hadoop-env.sh; the configuration parameters are set in the *-site.xml files.
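Once .bashrc has been sourced, the new PATH and aliases can be sanity-checked with commands like these (a hypothetical verification, not part of the original transcript; output omitted):

[@Hadoop48 ~]$ hadoop version   # works only if $HADOOP_HOME/bin is on PATH
[@Hadoop48 ~]$ fs -ls /         # alias for: hadoop fs -ls /
[@Hadoop48 ~]$ hls /            # alias for: fs -ls /, i.e. hadoop fs -ls /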
The masters and slaves files are only a convenience for starting and stopping daemons on many machines at once; daemons can also be started by hand:
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
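For reference, conf/slaves lists the hosts that run the DataNode and TaskTracker daemons, while conf/masters lists the host that runs the SecondaryNameNode (not, despite the name, the NameNode itself). On a single-node setup where Hadoop48 plays every role, both files would plausibly contain just that one hostname (assumed contents, shown for illustration):

[@Hadoop48 conf]$ cat masters
Hadoop48
[@Hadoop48 conf]$ cat slaves
Hadoop48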
bin/start-dfs.sh is run on the machine that serves as the NameNode, and bin/start-mapred.sh on the machine that serves as the JobTracker. The NameNode and JobTracker can be the same machine or two separate ones.
The bin/start-all.sh and bin/stop-all.sh scripts are deprecated in 1.0.3; they are replaced by bin/start-dfs.sh and bin/start-mapred.sh, and by bin/stop-dfs.sh and bin/stop-mapred.sh.
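For example, a single machine hosting all the roles could be brought up by hand like this (an illustrative sequence, not from the original session; jps from the JDK lists the running daemons):

[@Hadoop48 hadoop-1.0.3]$ bin/hadoop-daemon.sh start namenode
[@Hadoop48 hadoop-1.0.3]$ bin/hadoop-daemon.sh start datanode
[@Hadoop48 hadoop-1.0.3]$ bin/hadoop-daemon.sh start jobtracker
[@Hadoop48 hadoop-1.0.3]$ bin/hadoop-daemon.sh start tasktracker
[@Hadoop48 hadoop-1.0.3]$ jps
[@Hadoop48 hadoop-1.0.3]$ bin/hadoop-daemon.sh stop tasktracker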
Read-only default configuration: src/core/core-default.xml, src/hdfs/hdfs-default.xml, src/mapred/mapred-default.xml
These can be consulted as configuration references.
The actual site configuration goes in these three files: conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml
In addition, conf/hadoop-env.sh controls the environment variables used by the executable scripts under the bin directory.
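hadoop-env.sh is itself a shell script; JAVA_HOME is the one variable that normally must be set in it, and the daemon heap size is commonly tuned alongside (the JDK path below is an assumption for illustration):

# conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_33   # adjust to the local JDK install (assumed path)
export HADOOP_HEAPSIZE=1000              # daemon heap in MB; 1000 is the 1.0.3 default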
Configuring core-site.xml
Refer to the manual and to src/core/core-default.xml.
[@Hadoop48 conf]$ vi core-site.xml
In the file below, hadoop.mydata.dir is a variable I defined myself, serving as the root directory for data; the HDFS dfs.name.dir and dfs.data.dir will later both be configured under that partition.
A few variables are available for use inside the config files:
${hadoop.home.dir} matches $HADOOP_HOME, and ${user.name} matches the current user name; for example, /tmp/hadoop-${user.name} expands to /tmp/hadoop-zhouhh for user zhouhh.
<configuration>
  <property>
    <name>hadoop.mydata.dir</name>
    <value>/home/zhouhh/myhadoop</value>
    <description>A base for other directories. ${user.name}</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Hadoop48:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

[@Hadoop48 conf]$ vi hdfs-site.xml
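The original post does not show the hdfs-site.xml contents at this point. Based on the note above that dfs.name.dir and dfs.data.dir go under hadoop.mydata.dir, a plausible sketch would look like this (an assumption, not the original file):

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.mydata.dir}/dfs/name</value>  <!-- assumed layout under the custom data root -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.mydata.dir}/dfs/data</value>  <!-- assumed layout under the custom data root -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>  <!-- assumed replication factor for a small cluster -->
  </property>
</configuration>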
[@Hadoop48 conf]$ vi mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Hadoop48:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>${hadoop.mydata.dir}/mapred/system</value>
    <description>The directory where MapReduce stores control files.</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker. Vary it depending on your hardware.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker. Vary it depending on your hardware.</description>
  </property>
</configuration>
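With the three site files in place, the usual next steps would be to format the NameNode once and then start the two layers with the scripts mentioned earlier (a sketch of the typical sequence, not part of the original transcript):

[@Hadoop48 hadoop-1.0.3]$ bin/hadoop namenode -format   # one-time step; initializes dfs.name.dir
[@Hadoop48 hadoop-1.0.3]$ bin/start-dfs.sh              # NameNode, DataNodes, SecondaryNameNode
[@Hadoop48 hadoop-1.0.3]$ bin/start-mapred.sh           # JobTracker and TaskTrackers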