Pseudo-Distributed Hadoop 3.1 on CentOS 7.0 in VMware (3)

jps
1267 NameNode
1380 DataNode
1559 Jps
1528 SecondaryNameNode

7. Test creating a directory, uploading, and downloading files on HDFS

Create a directory on HDFS
hdfs dfs -mkdir /demo

Upload a local file to HDFS
hdfs dfs -put ${HADOOP_HOME}/etc/hadoop/core-site.xml /demo

 
Read the contents of a file on HDFS
hdfs dfs -cat /demo/core-site.xml

Download a file from HDFS to the local filesystem (into the current directory)
hdfs dfs -get /demo/core-site.xml

List the directory
$ hdfs dfs -ls /demo

Commands of the form hadoop fs -mkdir /mydata work as well
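The create/upload/download steps above can be wrapped into one small smoke test. A minimal sketch, assuming hdfs is on the PATH and HADOOP_HOME is set; it prints a skip message and returns cleanly when the client is absent:

```shell
# Smoke-test the HDFS commands above: mkdir, put, get, then compare.
hdfs_smoke_test() {
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs not found; skipping"
        return 0
    fi
    src="${HADOOP_HOME}/etc/hadoop/core-site.xml"
    hdfs dfs -mkdir -p /demo             # -p: no error if it already exists
    hdfs dfs -put -f "$src" /demo        # -f: overwrite on re-runs
    hdfs dfs -get -f /demo/core-site.xml /tmp/core-site.xml
    # the downloaded copy must match the original byte for byte
    cmp -s "$src" /tmp/core-site.xml && echo "HDFS round trip OK"
}

hdfs_smoke_test
```

The -f flags make the script idempotent, so it can be re-run after a failed attempt without manual cleanup.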


IX. Configure and Start YARN


1. Configure mapred-site.xml

vi ${HADOOP_HOME}/etc/hadoop/mapred-site.xml

Add the following configuration:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

This tells Hadoop to run MapReduce on the YARN framework.
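This guide edits several *-site.xml files by hand; a tiny helper that prints a well-formed &lt;property&gt; block can help avoid typos. A convenience sketch, not part of Hadoop:

```shell
# Print a Hadoop <property> block for a configuration name/value pair.
hadoop_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Example: the block added to mapred-site.xml above
hadoop_property mapreduce.framework.name yarn
```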


2. Configure yarn-site.xml
vi ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
Add the following configuration:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>bigdata-senior01.home.com</value>
</property>

   
yarn.nodemanager.aux-services configures YARN's auxiliary shuffle service; mapreduce_shuffle selects MapReduce's default shuffle implementation.

yarn.resourcemanager.hostname specifies which node the ResourceManager runs on.


3. Start the ResourceManager

${HADOOP_HOME}/sbin/yarn-daemon.sh start resourcemanager
The script prints a deprecation warning; the recommended form is: yarn --daemon start resourcemanager

4. Start the NodeManager

[hadoop@bigdata-senior01 hadoop-3.1.0]$ ${HADOOP_HOME}/sbin/yarn-daemon.sh start nodemanager
The script prints a deprecation warning; the recommended form is: yarn --daemon start nodemanager
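For reference, the non-deprecated Hadoop 3 syntax for every daemon started in this guide looks like the following. The function only prints the commands (a dry run), so it is safe to run anywhere:

```shell
# Print the Hadoop 3 style start command for each daemon in this guide.
# Pipe the output to sh to actually start them.
print_start_commands() {
    for d in namenode datanode secondarynamenode; do
        echo "hdfs --daemon start $d"
    done
    for d in resourcemanager nodemanager; do
        echo "yarn --daemon start $d"
    done
}

print_start_commands
```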


5. Check that the daemons started
[hadoop@localhost sbin]$ jps
1395 DataNode
1507 SecondaryNameNode
2150 Jps
2075 NodeManager
1292 NameNode
1628 ResourceManager
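The jps check can be automated. Below is a sketch of a helper that takes the jps output as an argument (so it can be tried without a running cluster) and reports any of the five expected daemons that are missing:

```shell
# Report any expected Hadoop daemon missing from the given jps output.
check_daemons() {
    missing=""
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # grep -w: whole-word match, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$1" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all daemons running"
    else
        echo "missing:$missing"
    fi
}
```

On a live node, call it as: check_daemons "$(jps)"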


6. The YARN web UI

The YARN web UI listens on port 8088; browse to that port on the ResourceManager host to view it.
If the firewall has not been disabled, the port also has to be opened:
firewall-cmd --zone=public --add-port=8088/tcp --permanent

The HDFS web UI listens on port 9870:
firewall-cmd --zone=public --add-port=9870/tcp --permanent

# Note: before Hadoop 3.0 the HDFS web UI port was 50070, not 9870

Rules added with --permanent only take effect after firewall-cmd --reload. Any similar web management page you need later must have its port opened the same way; alternatively, disable the firewall outright.

Remove a port with: firewall-cmd --zone=public --remove-port=8088/tcp --permanent
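Since each new web UI needs its own rule, a helper that prints the firewall-cmd invocations for a list of ports saves repetition. A dry-run sketch (it only echoes the commands, so it is safe to run anywhere):

```shell
# Print the firewall-cmd commands needed to open each given TCP port.
# Pipe the output to sh as root to actually apply them.
open_ports() {
    for p in "$@"; do
        echo "firewall-cmd --zone=public --add-port=${p}/tcp --permanent"
    done
    # --permanent rules only take effect after a reload
    echo "firewall-cmd --reload"
}

open_ports 8088 9870
```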



X. Run a MapReduce Job

The Hadoop share directory ships with several jars containing small example MapReduce programs, which are a good way to exercise the freshly built platform. The examples jar is at $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar (the file name differs slightly between versions). Here we run the classic WordCount example.
1. Add the class library path

Because we are running one of Hadoop's bundled examples, the example's class libraries must be added to the classpath.
Edit etc/hadoop/mapred-site.xml under the Hadoop installation directory and add the following between the <configuration> and </configuration> tags:
<property>
  <description>CLASSPATH for MR applications. A comma-separated list
  of CLASSPATH entries. If mapreduce.application.framework is set then this
  must specify the appropriate classpath for that archive, and the name of
  the archive must be present in the classpath.
  If mapreduce.app-submission.cross-platform is false, platform-specific
  environment variable expansion syntax would be used to construct the default
  CLASSPATH entries.
  For Linux:
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
  For Windows:
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*.
  If mapreduce.app-submission.cross-platform is true, platform-agnostic default
  CLASSPATH for MR applications would be used:
  {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*,
  {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*
  Parameter expansion marker will be replaced by NodeManager on container
  launch based on the underlying OS accordingly.
  </description>
  <name>mapreduce.application.classpath</name>
  <value>/opt/modules/hadoop-3.1.0/share/hadoop/mapreduce/*, /opt/modules/hadoop-3.1.0/share/hadoop/mapreduce/lib-examples/*</value>
</property>
Note: this is very important. The entries must be full absolute paths; the value must not contain variables.
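Because a wrong entry here only surfaces when a job is submitted, it can be worth sanity-checking the value beforehand. A sketch of a checker for the comma-separated format: it flags entries that are not absolute paths or whose directory is missing (a trailing /* glob is tolerated):

```shell
# Check a comma-separated mapreduce.application.classpath value:
# every entry must be an absolute path whose directory actually exists.
check_classpath() {
    bad=""
    old_ifs=$IFS; IFS=','
    set -f                                 # no glob expansion while splitting
    for entry in $1; do
        entry=$(printf '%s' "$entry" | sed 's/^ *//; s/ *$//')  # trim spaces
        dir=${entry%/\*}                   # drop a trailing /* glob
        case "$dir" in
            /*) [ -d "$dir" ] || bad="$bad $entry" ;;  # absolute but missing
            *)  bad="$bad $entry" ;;                   # not an absolute path
        esac
    done
    set +f; IFS=$old_ifs
    if [ -z "$bad" ]; then echo "classpath OK"; else echo "bad entries:$bad"; fi
}

check_classpath "/opt/modules/hadoop-3.1.0/share/hadoop/mapreduce/*, /opt/modules/hadoop-3.1.0/share/hadoop/mapreduce/lib-examples/*"
```

An entry like $HADOOP_HOME/share/... would be reported as bad, which is exactly the mistake the note above warns against.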

2. Memory configuration:
-----
The virtual machine was originally installed with 1 GB of RAM, and the job kept failing no matter how the memory settings in the configuration files were changed. Raising the VM's memory to 2 GB finally let it run successfully.
Tuning memory requires solid familiarity with Hadoop's many configuration options; beginners are better off leaving it alone at first.
-----

3. Create the test input file

Create the input directory:
hdfs dfs -mkdir -p /wordcountdemo/input

Create the source file:

Create a file named mydata.input under the local /opt/data directory with the following contents:
cat /opt/data/mydata.input
abc def kkk
abc kkk sss
ddd abc sss
abc abc sss
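As a sanity check, the counts WordCount should produce for this input can be computed locally with plain shell tools (MapReduce's default comparator sorts keys in byte order, which LC_ALL=C sort matches):

```shell
# Expected WordCount output for mydata.input, computed locally:
# one word per line, sort, count duplicates, print word<TAB>count.
printf 'abc def kkk\nabc kkk sss\nddd abc sss\nabc abc sss\n' |
    tr -s ' ' '\n' | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
# abc   5
# ddd   1
# def   1
# kkk   2
# sss   3
```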

Upload the mydata.input file to the /wordcountdemo/input directory on HDFS:

hdfs dfs -put /opt/data/mydata.input /wordcountdemo/input

4. Run the WordCount MapReduce job

yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount /wordcountdemo/input /wordcountdemo/output

2018-05-29 22:18:34,201 INFO client.RMProxy: Connecting to ResourceManager at bigdata-senior01.home.com/192.168.31.10:8032
2018-05-29 22:18:35,314 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1527603486527_0001
2018-05-29 22:18:36,437 INFO input.FileInputFormat: Total input files to process : 1
2018-05-29 22:18:37,402 INFO mapreduce.JobSubmitter: number of splits:1
2018-05-29 22:18:37,472 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-29 22:18:37,834 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1527603486527_0001
2018-05-29 22:18:37,845 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-05-29 22:18:38,124 INFO conf.Configuration: resource-types.xml not found
2018-05-29 22:18:38,124 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-05-29 22:18:38,671 INFO impl.YarnClientImpl: Submitted application application_1527603486527_0001
2018-05-29 22:18:38,737 INFO mapreduce.Job: The url to track the job: :8088/proxy/application_1527603486527_0001/
2018-05-29 22:18:38,738 INFO mapreduce.Job: Running job: job_1527603486527_0001
2018-05-29 22:18:51,002 INFO mapreduce.Job: Job job_1527603486527_0001 running in uber mode : false
2018-05-29 22:18:51,003 INFO mapreduce.Job:  map 0% reduce 0%
2018-05-29 22:18:57,124 INFO mapreduce.Job:  map 100% reduce 0%
2018-05-29 22:19:04,187 INFO mapreduce.Job:  map 100% reduce 100%
2018-05-29 22:19:06,209 INFO mapreduce.Job: Job job_1527603486527_0001 completed successfully
2018-05-29 22:19:06,363 INFO mapreduce.Job: Counters: 53
        File System Counters
                FILE: Number of bytes read=94
                FILE: Number of bytes written=425699
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=202
                HDFS: Number of bytes written=60
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4455
                Total time spent by all reduces in occupied slots (ms)=4530
                Total time spent by all map tasks (ms)=4455
                Total time spent by all reduce tasks (ms)=4530
                Total vcore-milliseconds taken by all map tasks=4455
                Total vcore-milliseconds taken by all reduce tasks=4530
                Total megabyte-milliseconds taken by all map tasks=4561920
                Total megabyte-milliseconds taken by all reduce tasks=4638720
        Map-Reduce Framework
                Map input records=4
                Map output records=11
                Map output bytes=115
                Map output materialized bytes=94
                Input split bytes=131
                Combine input records=11
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=94
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=172
                CPU time spent (ms)=1230
                Physical memory (bytes) snapshot=388255744
                Virtual memory (bytes) snapshot=5476073472
                Total committed heap usage (bytes)=165810176
                Peak Map Physical memory (bytes)=242692096
                Peak Map Virtual memory (bytes)=2733621248
                Peak Reduce Physical memory (bytes)=145563648
                Peak Reduce Virtual memory (bytes)=2742452224
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=71
        File Output Format Counters
                Bytes Written=60

View the output directory

hdfs dfs -ls /wordcountdemo/output
(hadoop fs works the same as hdfs dfs here, e.g. hadoop fs -cat /wordcountdemo/output/part-r-00000)
Found 2 items
-rw-r--r--  1 hadoop supergroup          0 2018-05-29 22:19 /wordcountdemo/output/_SUCCESS
-rw-r--r--  1 hadoop supergroup        60 2018-05-29 22:19 /wordcountdemo/output/part-r-00000

The output directory contains two files. _SUCCESS is an empty file; its presence means the job completed successfully.

part-r-00000 is the result file. The -r- indicates it was produced by the reduce phase: a MapReduce program may have no reduce phase, but it always has a map phase, and map-only output is named with -m- instead.

Each reducer produces one part-r-* file.

View the output file contents:
hdfs dfs -cat /wordcountdemo/output/part-r-00000

On a virtual machine, the most common failure at this stage is running out of memory, after which Hadoop kills the job's containers.
