Common and Important Hadoop Command-Line Operations and What They Do
About Hadoop
[root@master ~]# hadoop --help
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
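Of these subcommands, daemonlog is useful for changing a daemon's log level on the fly, without a restart. A minimal sketch (the host:port pair master:50070 and the class name are illustrative; point them at your own daemon's HTTP address):
hadoop daemonlog -getlevel master:50070 org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop daemonlog -setlevel master:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG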
Check the version
[root@master ~]# hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion git@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar
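The first line of this output is easy to consume from scripts; for example, a small sketch (the head/awk pipeline is an illustration, not part of Hadoop) that captures just the version string:
HADOOP_VER=$(hadoop version | head -n 1 | awk '{print $2}')
echo "$HADOOP_VER"   # 2.2.0.2.0.6.0-101 on this cluster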
Run a jar file
[root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
...
Job Finished in 19.715 seconds
Estimated value of Pi is 3.14800000000000000000
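The examples jar bundles other ready-made programs besides pi; running the jar with no program name prints the list of valid names. As one more illustration, a hypothetical wordcount run (both the input and output paths here are placeholders, not from this session):
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar wordcount /liguodong/input /liguodong/wcout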
Check the availability of Hadoop's native and compression libraries
[root@master liguodong]# hadoop checknative -a
15/06/03 10:28:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/06/03 10:28:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4: true revision:43
bzip2: true /lib64/libbz2.so.1
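checknative can also gate an install script; a minimal sketch, assuming checknative -a exits with a nonzero status when any of the checked libraries is unavailable:
if hadoop checknative -a > /dev/null 2>&1; then
    echo "all native libraries loaded"
else
    echo "at least one native library is missing"
fi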
File archiving: Archive
Hadoop is poorly suited to storing large numbers of small files: every file consumes metadata in the NameNode, so an abundance of small files steadily inflates NameNode memory usage.
Hadoop Archives (HAR files) were introduced in release 0.18.0 precisely to relieve the pressure that masses of small files put on NameNode memory.
A HAR file works by building a layered filesystem on top of HDFS. It is created with Hadoop's archive command, which under the hood runs a MapReduce job to pack the small files into the HAR. For clients nothing changes: all of the original files remain accessible, via har:// URLs, while on the HDFS side the internal file count drops.
Reading a file through a HAR is no more efficient than reading it directly from HDFS, and in practice may be slightly slower, because every access to a file inside a HAR involves two layers of reads: first the index files, then the file data itself. And although HAR files can be used as MapReduce job input, there is no special mechanism that lets map tasks treat the files packed inside a HAR as individual HDFS files.
Create an archive: hadoop archive -archiveName xxx.har -p /src /dest
View its contents: hadoop fs -lsr har:///dest/xxx.har
[root@master liguodong]# hadoop archive
archive -archiveName NAME -p <parent path> <src>* <dest>
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
[root@master liguodong]# hadoop archive -archiveName liguodong.har -p /liguodong output /liguodong/har
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har
drwxr-xr-x   - root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har
-rw-r--r--   3 root hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har/_SUCCESS
-rw-r--r--   5 root hdfs        254 2015-06-03 11:15 /liguodong/har/liguodong.har/_index
-rw-r--r--   5 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/_masterindex
-rw-r--r--   3 root hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/part-0
drwxrwxrwx   - hdfs hdfs          0 2015-05-04 19:40 /liguodong/output
-rwxrwxrwx   3 hdfs hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
-rwxrwxrwx   3 hdfs hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
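Note that the archive is stored as ordinary HDFS files: _index and _masterindex record where each packed file lives, and part-0 holds the concatenated data. The index files are plain text, so they can be inspected directly, e.g.:
hadoop fs -cat /liguodong/har/liguodong.har/_index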
View the contents
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/part-r-00000
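Because access through a HAR is transparent to clients, a packed file can be read like any other HDFS file, for example:
hadoop fs -cat har:///liguodong/har/liguodong.har/output/part-r-00000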
--------------------------------------------------------------
[root@master liguodong]# hadoop archive -archiveName liguodong2.har -p /liguodong/output /liguodong/har
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong2.har
-rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/_SUCCESS
-rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/part-r-00000
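Because liguodong2.har was built with -p /liguodong/output, its files sit at the top level of the archive instead of under an output/ subdirectory. There is no dedicated un-archive command; to unpack, copy the files back out (the destination directory below is hypothetical):
hadoop fs -cp har:///liguodong/har/liguodong2.har/* /liguodong/unhar
For a large archive, hadoop distcp accepts the same har:// source and performs the copy in parallel.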
About HDFS
[root@master /]# hdfs --help
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
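Several of these subcommands make handy one-liners for inspecting a cluster; two illustrative examples (dfs.replication is a standard HDFS configuration key):
hdfs getconf -namenodes               # print the namenode host(s)
hdfs getconf -confKey dfs.replication # print the effective value of one configuration key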
Check whether a directory is healthy
[root@master liguodong]# hdfs fsck /liguodong
Connecting to namenode via :50070
FSCK started by root (auth:SIMPLE) from /172.23.253.20 for path /liguodong at Wed Jun 03 10:43:41 CST 2015
...........Status: HEALTHY
 Total size:    1559 B
 Total dirs:    7
 Total files:   11
 Total symlinks:        0
 Total blocks (validated):      7 (avg. block size 222 B)
...
The filesystem under path '/liguodong' is HEALTHY
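The closing HEALTHY/CORRUPT verdict makes fsck easy to wire into monitoring; a minimal sketch that keys off the summary line:
if hdfs fsck /liguodong | grep -q "is HEALTHY"; then
    echo "/liguodong OK"
else
    echo "/liguodong needs attention"
fi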
A more detailed inspection command
[root@master liguodong]# hdfs fsck /liguodong -files -blocks
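Here -files prints a line for every file checked and -blocks additionally lists each file's blocks. Adding -locations (an extra flag beyond what the session above used) also reports which DataNodes hold each block replica:
hdfs fsck /liguodong -files -blocks -locations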