Setting Up Hadoop 2.2.0 on OpenStack (4)

$ hdfs namenode -format

f. Start Hadoop

@hdp-server-01  # which services to start differs per VM, depending on the role assigned to it

$ cd $YARN_HOME
$ sbin/hadoop-daemon.sh --script hdfs start namenode     # start the NameNode
$ sbin/hadoop-daemon.sh --script hdfs start datanode     # start the DataNode
$ sbin/yarn-daemon.sh start nodemanager                  # start the NodeManager
$ sbin/yarn-daemon.sh start resourcemanager              # start the ResourceManager
$ sbin/yarn-daemon.sh start proxyserver                  # start the Web App Proxy
$ sbin/mr-jobhistory-daemon.sh start historyserver       # start the JobHistory Server

Check with jps:
$ jps
8770 ResourceManager
11609 Jps
8644 NodeManager
9071 JobHistoryServer
8479 NameNode
9000 WebAppProxyServer
8552 DataNode
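
Besides jps, the daemons' web UIs are a quick sanity check. A minimal sketch, assuming the default Hadoop 2.2.0 HTTP ports (NameNode 50070, ResourceManager 8088, JobHistory Server 19888) have not been changed in your configuration:

$ curl -s http://hdp-server-01:50070/ | head -n 5           # NameNode web UI
$ curl -s http://hdp-server-01:8088/cluster | head -n 5     # ResourceManager web UI
$ curl -s http://hdp-server-01:19888/jobhistory | head -n 5 # JobHistory Server web UI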

@hdp-server-02
@hdp-server-03

$ cd $YARN_HOME
$ sbin/yarn-daemon.sh start nodemanager                  # start the NodeManager
$ sbin/hadoop-daemon.sh --script hdfs start datanode     # start the DataNode

Check with jps:
$ jps
6691 NodeManager
9089 Jps
6787 DataNode
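
Once all three nodes are up, you can confirm from hdp-server-01 that every DataNode and NodeManager has registered with the cluster. A minimal check, run as the same user that started the daemons:

$ cd $YARN_HOME
$ bin/hdfs dfsadmin -report    # lists the live DataNodes known to the NameNode
$ bin/yarn node -list          # lists the NodeManagers known to the ResourceManager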

At this point the cluster setup is complete. Let's run a test job to try it out:

$ cd $YARN_HOME
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 1000

This is the example job that estimates pi with the Monte Carlo method; the two numbers after pi are the number of map tasks and the number of samples each map draws, which together determine the precision of the estimate. The output is as follows:

Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/12/22 17:50:42 INFO client.RMProxy: Connecting to ResourceManager at hdp-server-01/10.0.0.225:8032
13/12/22 17:50:43 INFO input.FileInputFormat: Total input paths to process : 10
13/12/22 17:50:43 INFO mapreduce.JobSubmitter: number of splits:10
13/12/22 17:50:43 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/22 17:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1387700249346_0004
13/12/22 17:50:44 INFO impl.YarnClientImpl: Submitted application application_1387700249346_0004 to ResourceManager at hdp-server-01/10.0.0.225:8032
13/12/22 17:50:44 INFO mapreduce.Job: The url to track the job: :8888/proxy/application_1387700249346_0004/
13/12/22 17:50:44 INFO mapreduce.Job: Running job: job_1387700249346_0004
13/12/22 17:50:53 INFO mapreduce.Job: Job job_1387700249346_0004 running in uber mode : false
13/12/22 17:50:53 INFO mapreduce.Job:  map 0% reduce 0%
13/12/22 17:51:03 INFO mapreduce.Job:  map 40% reduce 0%
13/12/22 17:51:13 INFO mapreduce.Job:  map 90% reduce 0%
13/12/22 17:51:14 INFO mapreduce.Job:  map 100% reduce 0%
13/12/22 17:51:15 INFO mapreduce.Job:  map 100% reduce 100%
13/12/22 17:51:16 INFO mapreduce.Job: Job job_1387700249346_0004 completed successfully
13/12/22 17:51:16 INFO mapreduce.Job: Counters: 43
 File System Counters
  FILE: Number of bytes read=226
  FILE: Number of bytes written=878638
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=2680
  HDFS: Number of bytes written=215
  HDFS: Number of read operations=43
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=3
 Job Counters
  Launched map tasks=10
  Launched reduce tasks=1
  Data-local map tasks=10
  Total time spent by all maps in occupied slots (ms)=142127
  Total time spent by all reduces in occupied slots (ms)=8333
 Map-Reduce Framework
  Map input records=10
  Map output records=20
  Map output bytes=180
  Map output materialized bytes=280
  Input split bytes=1500
  Combine input records=0
  Combine output records=0
  Reduce input groups=2
  Reduce shuffle bytes=280
  Reduce input records=20
  Reduce output records=0
  Spilled Records=40
  Shuffled Maps =10
  Failed Shuffles=0
  Merged Map outputs=10
  GC time elapsed (ms)=2606
  CPU time spent (ms)=11090
  Physical memory (bytes) snapshot=2605563904
  Virtual memory (bytes) snapshot=11336945664
  Total committed heap usage (bytes)=2184183808
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=1180
 File Output Format Counters
  Bytes Written=97
Job Finished in 34.098 seconds
Estimated value of Pi is 3.14080000000000000000
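
The second argument controls how many random samples each map task draws, so raising it gives a closer estimate at the cost of more computation. A hypothetical rerun with 100000 samples per map would look like:

$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 100000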

Summary:

1. Setting up Hadoop inside VMs launched on OpenStack is not much different from setting it up on physical machines, but pay attention to which IP addresses the VMs actually see: the floating IPs assigned by OpenStack usually cannot be used. A floating IP is set up by nova-network purely for NAT forwarding, so the VM itself knows nothing about that address; use the fixed (private) IPs instead, as in the sketch below.
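
A minimal sketch of what this looks like in practice, assuming the fixed IPs sit in a 10.0.0.0/24 network (10.0.0.225 for hdp-server-01 appears in the job log above; the other two addresses and the interface name eth0 are hypothetical):

$ ip addr show eth0 | grep 'inet '    # inside the VM only the fixed IP is visible
    inet 10.0.0.225/24 brd 10.0.0.255 scope global eth0

# map hostnames to the fixed IPs in /etc/hosts on every node
10.0.0.225  hdp-server-01
10.0.0.226  hdp-server-02    # hypothetical address
10.0.0.227  hdp-server-03    # hypothetical address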

2. The VMs in the cluster should ideally run the same operating system, so the same compiled binaries can be reused everywhere. Because the Hadoop 2.2.0 configuration files do not need to differ from node to node, every node can carry an identical configuration: launch a single VM, install and configure it, turn it into an image, and boot the remaining nodes from that image, as in the sketch below. The only difference between nodes is which services are started on each one.
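
A rough sketch of that cloning workflow with the nova CLI; the flavor m1.medium and keypair hdp-key are placeholder names, not taken from the original setup:

$ nova image-create hdp-server-01 hadoop-2.2.0-node     # snapshot the configured VM into a reusable image
$ nova boot --image hadoop-2.2.0-node --flavor m1.medium --key-name hdp-key hdp-server-02
$ nova boot --image hadoop-2.2.0-node --flavor m1.medium --key-name hdp-key hdp-server-03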
