Common Hadoop Command-Line Operations and What They Do

About the hadoop command

[root@master ~]# hadoop --help
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl>   copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest>   create a hadoop archive
  classpath            prints the class path needed to get the Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Check the version

[root@master ~]# hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion git@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar

Run a jar file

[root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
...
Job Finished in 19.715 seconds
Estimated value of Pi is 3.14800000000000000000

Check the availability of Hadoop's native and compression libraries

[root@master liguodong]# hadoop checknative -a
15/06/03 10:28:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/06/03 10:28:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:   true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4:    true revision:43
bzip2:  true /lib64/libbz2.so.1
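The help output above also lists distcp, which the examples in this article do not otherwise cover: it copies files or directories in parallel, within one cluster or between clusters. A minimal sketch; the namenode addresses (nn1, nn2) and paths here are placeholders, not taken from the cluster above:

hadoop distcp /src /dest                                   # copy within the same cluster
hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dest     # copy between two clusters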

File archiving (Archive)

Hadoop is poorly suited to storing small files: every small file carries its own metadata, so large numbers of them make the namenode grow ever larger.
Hadoop Archives (HAR files) were introduced in release 0.18.0 precisely to mitigate the problem of masses of small files consuming namenode memory.
A HAR file works by building a layered filesystem on top of HDFS. It is created with Hadoop's archive command, which in fact runs a MapReduce job to pack the small files into the HAR. From the client's point of view nothing changes: all of the original files remain accessible, through har:// URLs. On the HDFS side, however, the internal file count is reduced.
Reading a file through a HAR is no more efficient than reading it directly from HDFS, and may in fact be slightly slower, because every access to a file inside a HAR involves two reads: one of the index file and one of the file data itself. And although HAR files can be used as input to a MapReduce job, there is no special mechanism that lets the maps treat the files packed inside a HAR as a single HDFS file.
Create an archive:   hadoop archive -archiveName xxx.har -p /src /dest
List its contents:   hadoop fs -lsr har:///dest/xxx.har
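There is no dedicated "unarchive" command; because the packed files stay visible through the har:// filesystem, an ordinary copy unpacks them. A minimal sketch, assuming the xxx.har archive above and a hypothetical target directory /restored (the file name is a placeholder):

hadoop fs -cp har:///dest/xxx.har/somefile /restored       # sequential copy of one file
hadoop distcp har:///dest/xxx.har /restored                # parallel unpack via MapReduce

A full session follows.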

Running hadoop archive with no arguments prints its usage:

[root@master liguodong]# hadoop archive
archive -archiveName NAME -p <parent path> <src>* <dest>

Before archiving:

[root@master liguodong]# hadoop fs -lsr /liguodong
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000

Create the archive and look at the result:

[root@master liguodong]# hadoop archive -archiveName liguodong.har -p /liguodong output /liguodong/har
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har
-rw-r--r--   3 root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har/_SUCCESS
-rw-r--r--   5 root hdfs        254 2015-06-03 11:15 /liguodong/har/liguodong.har/_index
-rw-r--r--   5 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/_masterindex
-rw-r--r--   3 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/part-0
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000

List the archive's contents:

[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/part-r-00000

---------------------------------------------------------------

With the parent (-p) set to /liguodong/output instead of /liguodong, the files land at the archive root rather than under an output subdirectory:

[root@master liguodong]# hadoop archive -archiveName liguodong2.har -p /liguodong/output /liguodong/har
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong2.har
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/part-r-00000

About the hdfs command

[root@master /]# hdfs --help
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway

Check whether a directory is healthy

[root@master liguodong]# hdfs fsck /liguodong
Connecting to namenode via :50070
FSCK started by root (auth:SIMPLE) from /172.23.253.20 for path /liguodong at Wed Jun 03 10:43:41 CST 2015
...........Status: HEALTHY
 Total size:    1559 B
 Total dirs:    7
 Total files:   11
 Total symlinks: 0
 Total blocks (validated): 7 (avg. block size 222 B)
...
The filesystem under path '/liguodong' is HEALTHY

For a more detailed report, add the -files and -blocks flags:

[root@master liguodong]# hdfs fsck /liguodong -files -blocks
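fsck accepts further reporting flags beyond -files and -blocks. A short sketch using the same path as above, with flags as documented for Hadoop 2.x:

hdfs fsck /liguodong -files -blocks -locations -racks     # also print block locations and rack placement
hdfs fsck /liguodong -openforwrite                        # list files currently open for write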
