
Common and important Hadoop command-line operations and what they do
About the hadoop command
[root@master ~]# hadoop --help
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
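The distcp command listed above copies files or directories in parallel, within one cluster or between clusters. As a rough, hedged illustration only (the NameNode addresses and paths here are hypothetical, not taken from this cluster):
[root@master liguodong]# hadoop distcp hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs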
Check the version
[root@master ~]# hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion git@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar
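The CLASSNAME form from the help output runs an arbitrary class on the Hadoop classpath. As a hedged sketch, running the utility class behind this banner should print essentially the same information (this assumes org.apache.hadoop.util.VersionInfo, which ships in hadoop-common, exposes a main method as in stock Hadoop 2.x):
[root@master liguodong]# hadoop org.apache.hadoop.util.VersionInfo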
Run a JAR file
[root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
...
Job Finished in 19.715 seconds
Estimated value of Pi is 3.14800000000000000000
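The examples JAR bundles more programs than just pi. Invoking it without a program name should print the list of valid example names (a hedged note: the exact list varies by release):
[root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar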
Check the availability of native Hadoop and compression libraries
[root@master liguodong]# hadoop checknative -a
15/06/03 10:28:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/06/03 10:28:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:   true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4:    true revision:43
bzip2:  true /lib64/libbz2.so.1
File archiving (hadoop archive)
Hadoop is not well suited to storing large numbers of small files: every small file carries its own metadata, so the NameNode's memory footprint keeps growing.
Hadoop Archives (HAR files) were introduced in release 0.18.0 precisely to ease the pressure that masses of small files put on NameNode memory.
A HAR file works by layering a file system on top of HDFS. An archive is created with the hadoop archive command, which actually runs a MapReduce job to pack the small files into the HAR. From the client's point of view nothing changes: all of the original files remain accessible, just through har:// URLs, while on the HDFS side the number of files the NameNode has to track shrinks.
Reading a file through a HAR is no faster than reading it directly from HDFS, and in practice may be slightly slower, because every access goes through two reads: one of the index file and one of the data itself. And although HAR files can be used as input to a MapReduce job, there is no special mechanism that lets map tasks treat the files packed inside a HAR any differently from ordinary HDFS files.
Create an archive: hadoop archive -archiveName xxx.har -p /src /dest
List its contents: hadoop fs -lsr har:///dest/xxx.har
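Because the archive is exposed as an ordinary filesystem, the regular fs commands also work on the files inside it. A hedged sketch (somefile.txt is a hypothetical file name used only for illustration):
hadoop fs -cat har:///dest/xxx.har/somefile.txt
hadoop fs -cp har:///dest/xxx.har/somefile.txt /dest/somefile.txt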
[root@master liguodong]# hadoop archive
archive -archiveName NAME -p <parent path> <src>* <dest>
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
[root@master liguodong]# hadoop archive -archiveName liguodong.har -p /liguodong output /liguodong/har
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har
-rw-r--r--   3 root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har/_SUCCESS
-rw-r--r--   5 root hdfs        254 2015-06-03 11:15 /liguodong/har/liguodong.har/_index
-rw-r--r--   5 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/_masterindex
-rw-r--r--   3 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/part-0
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
List the archive contents
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/part-r-00000
--------------------------------------------------------------
[root@master liguodong]# hadoop archive -archiveName liguodong2.har -p /liguodong/output /liguodong/har
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong2.har
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/part-r-00000
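A HAR file is immutable once written: files cannot be added to or removed from it afterwards. To discard an archive, delete its .har directory recursively (a hedged note; the path below matches the example above):
[root@master liguodong]# hadoop fs -rm -r /liguodong/har/liguodong2.har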
About the hdfs command
[root@master /]# hdfs --help
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
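Two of the subcommands listed above are particularly useful for day-to-day administration: dfsadmin -report prints cluster capacity and per-DataNode status, and getconf resolves configured values. A hedged sketch:
[root@master liguodong]# hdfs dfsadmin -report
[root@master liguodong]# hdfs getconf -namenodes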
Check whether a directory is healthy
[root@master liguodong]# hdfs fsck /liguodong
Connecting to namenode via :50070
FSCK started by root (auth:SIMPLE) from /172.23.253.20 for path /liguodong at Wed Jun 03 10:43:41 CST 2015
...........Status: HEALTHY
 Total size:    1559 B
 Total dirs:    7
 Total files:   11
 Total symlinks:                0
 Total blocks (validated):      7 (avg. block size 222 B)
...
The filesystem under path '/liguodong' is HEALTHY
A more detailed check
[root@master liguodong]# hdfs fsck /liguodong -files -blocks
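fsck accepts further flags to drill down. As a hedged sketch, adding -locations and -racks to the command above should also report, for every block, which DataNodes hold its replicas and on which racks they sit:
[root@master liguodong]# hdfs fsck /liguodong -files -blocks -locations -racks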