Hadoop Cluster Daily Operations (2)

[jediael@master ~]$ hadoop fsck -files
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
        <path>  start checking from this path
        -move  move corrupted files to /lost+found
        -delete delete corrupted files
        -files  print out files being checked
        -openforwrite  print out files opened for write
        -blocks print out block report
        -locations      print out locations for every block
        -racks  print out network topology for data-node locations
                By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
Generic options supported are
-conf <configuration file>    specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

For a detailed explanation, see Hadoop: The Definitive Guide (Chinese edition), p. 376.
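The usage text above lists `<path>` ahead of the options, and the invocation shown did not supply one. As a rough sketch, a check of the root path could be wrapped as below. The helper name `fsck_status` is made up for illustration, and it assumes the fsck summary contains the phrase `is HEALTHY` or `is CORRUPT`, which may vary between Hadoop versions.

```shell
# fsck_status: read fsck output on stdin and print a one-word verdict.
# Assumption: the summary line contains "is HEALTHY" or "is CORRUPT".
fsck_status() {
    fsck_out=$(cat)
    case "$fsck_out" in
        *"is CORRUPT"*) echo CORRUPT ;;
        *"is HEALTHY"*) echo HEALTHY ;;
        *)              echo UNKNOWN ;;
    esac
}

# Usage on a running cluster (path first, then fsck options):
#   hadoop fsck / -files -blocks | fsck_status
```

CORRUPT is tested first so that a report mentioning both words is never reported as healthy.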

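(IV) Balancer
Over time, blocks become more and more unevenly distributed across the datanodes. This reduces MapReduce data locality and leaves some datanodes busier than others.

The balancer is a Hadoop daemon that moves blocks from over-utilized datanodes to under-utilized ones. While doing so it still honors the block replica placement policy, which spreads replicas across different machines and racks.

Run the balancer regularly, for example daily or weekly.

(1) Start the balancer with the following command: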

[jediael@master log]$ start-balancer.sh
starting balancer, logging to /var/log/hadoop/hadoop-jediael-balancer-master.out

The log looks like this:

[jediael@master hadoop]$ pwd
/var/log/hadoop
[jediael@master hadoop]$ ls
hadoop-jediael-balancer-master.log  hadoop-jediael-balancer-master.out
[jediael@master hadoop]$ cat hadoop-jediael-balancer-master.log
2015-03-01 21:08:08,027 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.251.0.197:50010
2015-03-01 21:08:08,028 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.171.94.155:50010
2015-03-01 21:08:08,028 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 over utilized nodes:
2015-03-01 21:08:08,028 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 under utilized nodes:

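(2) The balancer moves blocks until the utilization of every datanode is close to the utilization of the cluster as a whole. "Close" is defined by the -threshold parameter, which defaults to 10%.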
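To make the meaning of the threshold concrete, here is a simplified sketch of the comparison. The function `classify_node` is hypothetical and is not part of Hadoop; the real balancer uses finer categories internally. A node counts as balanced when its utilization is within the threshold (in percentage points) of the cluster utilization.

```shell
# classify_node NODE_PCT CLUSTER_PCT [THRESHOLD_PCT]
# Hypothetical helper: compares one datanode's disk utilization with the
# cluster-wide utilization. THRESHOLD_PCT defaults to 10, like -threshold.
classify_node() {
    awk -v n="$1" -v c="$2" -v t="${3:-10}" 'BEGIN {
        if (n > c + t)      print "over-utilized"
        else if (n < c - t) print "under-utilized"
        else                print "balanced"
    }'
}

# A tighter threshold can be passed when starting the balancer, e.g.:
#   start-balancer.sh -threshold 5
```

With a cluster at 30% utilization and the default threshold, a node at 45% is over-utilized while a node at 35% is already considered balanced.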

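(3) The bandwidth used to copy data between nodes is limited. The default is 1 MB/s, and it can be changed with the dfs.balance.bandwidthPerSec property in hdfs-site.xml (the value is in bytes per second).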
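Because the property takes bytes per second, a value given in MB/s has to be converted first. The sketch below does the arithmetic and prints a property fragment that could be pasted into hdfs-site.xml. The helper names are made up, and the property name `dfs.balance.bandwidthPerSec` is assumed to match the Hadoop 1.x line used here; newer releases may use a different name.

```shell
# bandwidth_bytes MB_PER_SEC: convert MB/s to bytes per second.
bandwidth_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# bandwidth_property MB_PER_SEC: print an hdfs-site.xml property fragment.
# Assumption: the property name matches the Hadoop version in use.
bandwidth_property() {
    printf '<property>\n  <name>dfs.balance.bandwidthPerSec</name>\n  <value>%s</value>\n</property>\n' \
        "$(bandwidth_bytes "$1")"
}

# Example: bandwidth_property 10   (10 MB/s, i.e. 10485760 bytes per second)
```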

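(V) Datanode block scanner

Every datanode runs a block scanner that periodically verifies all blocks stored on that node. If it finds an error (such as a checksum error), it notifies the namenode, which then arranges for the block to be re-replicated or repaired.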

The scan period is set by dfs.datanode.scan.period.hours and defaults to three weeks (504 hours).

Scan information is available at the following datanode web addresses:

(1):50075/blockScannerReport

Shows a summary of the verification status:

Total Blocks                 :   1919
Verified in last hour        :      4
Verified in last day         :    170
Verified in last week        :    535
Verified in last four weeks  :    535
Verified in SCAN_PERIOD      :    535
Not yet verified             :   1384
Verified since restart       :    559
Scans since restart          :     91
Scan errors since restart    :      0
Transient scan errors        :      0
Current scan rate limit KBps :   1024
Progress this period         :    113%
Time left in cur period      :  97.14%
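
A report like the one above can be reduced to a single figure for monitoring. The following sketch computes the share of blocks that are still unverified. The function name `unverified_pct` and the `DATANODE_HOST` placeholder are hypothetical, and the parsing assumes the `label : value` layout shown above.

```shell
# unverified_pct: read a blockScannerReport summary on stdin and print the
# percentage of blocks not yet verified (one decimal place).
# Assumption: lines look like "Total Blocks : 1919".
unverified_pct() {
    awk -F':' '
        $1 ~ /^Total Blocks/     { total = $2 + 0 }
        $1 ~ /^Not yet verified/ { pending = $2 + 0 }
        END {
            if (total > 0) printf "%.1f\n", 100 * pending / total
            else           print "n/a"
        }'
}

# Usage (DATANODE_HOST is a placeholder for a real datanode address):
#   curl -s "http://DATANODE_HOST:50075/blockScannerReport" | unverified_pct
```

For the sample report, 1384 of 1919 blocks are unverified, which this helper reports as 72.1.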

(2):50075/blockScannerReport?listblocks

Lists every block and its latest verification status:

blk_8482244195562050998_3796 : status : ok    type : none  scan time : 0              not yet verified

blk_3985450615149803606_7952 : status : ok    type : none  scan time : 0              not yet verified

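The entries above are blocks that have not yet been verified. For the meaning of each field, see Hadoop: The Definitive Guide (Chinese edition), p. 379.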
