2. List the contents of the in subdirectory under the user's home directory in the Hadoop file system
[grid@h1 hadoop-0.20.2]$ bin/hadoop dfs -ls    list the contents of the HDFS home directory (/user/grid)
Found 1 items
drwxr-xr-x - grid supergroup 0 2012-09-17 19:44 /user/grid/in    only the in directory
[grid@h1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./in/*    list the contents of the in subdirectory
-rw-r--r-- 2 grid supergroup 17 2012-09-17 19:44 /user/grid/in/test1.txt    two files
-rw-r--r-- 2 grid supergroup 12 2012-09-17 19:44 /user/grid/in/test2.txt
Summary: HDFS has no notion of a current working directory, so there is no cd command and you cannot "enter" the in directory; you must give the directory path explicitly when listing. A relative path is resolved against the user's HDFS home directory, /user/grid here.
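The path resolution described above can be sketched in Python. This is a hypothetical helper written for illustration only, not part of the Hadoop API; it mirrors how the HDFS shell interprets the relative path `./in/*` used above against the `/user/grid` home directory.

```python
def resolve_hdfs_path(path: str, user: str) -> str:
    """Resolve a path the way the HDFS shell does: absolute paths are
    used as-is, relative paths are anchored at /user/<user>."""
    if path.startswith("/"):
        return path
    # Strip a leading "./" so "./in" and "in" resolve identically.
    if path.startswith("./"):
        path = path[2:]
    return f"/user/{user}/{path}"

print(resolve_hdfs_path("in", "grid"))             # /user/grid/in
print(resolve_hdfs_path("./in", "grid"))           # /user/grid/in
print(resolve_hdfs_path("/user/grid/in", "grid"))  # /user/grid/in
```

This is why `bin/hadoop dfs -ls` with no argument and `bin/hadoop dfs -ls /user/grid` show the same listing.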
3. Test that the MapReduce system works; MapReduce schedules tasks by data locality (it assigns work to the nodes that already hold the data)
jar: /home/grid/hadoop-0.20.2/hadoop-0.20.2-examples.jar    the examples jar bundled with the Hadoop installation, which we can use directly
[grid@h1 hadoop-0.20.2]$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out    submit the wordcount program in this jar to MapReduce as a job, to verify that the MapReduce system works; in = input directory (the data source), out = output directory (where results are written)
12/09/17 20:39:06 INFO input.FileInputFormat: Total input paths to process : 2
12/09/17 20:39:07 INFO mapred.JobClient: Running job: job_201209172027_0002    the running job's ID; the digits 201209172027 are the JobTracker's start time, not the time this job ran
12/09/17 20:39:08 INFO mapred.JobClient: map 0% reduce 0%
12/09/17 20:40:34 INFO mapred.JobClient: map 50% reduce 0%
12/09/17 20:40:49 INFO mapred.JobClient: map 100% reduce 0%    map/reduce progress
12/09/17 20:41:02 INFO mapred.JobClient: map 100% reduce 100%
12/09/17 20:41:04 INFO mapred.JobClient: Job complete: job_201209172027_0002    job finished
12/09/17 20:41:04 INFO mapred.JobClient: Counters: 17
12/09/17 20:41:04 INFO mapred.JobClient: Job Counters    job counters
12/09/17 20:41:04 INFO mapred.JobClient: Launched reduce tasks=1    1 reduce task launched
12/09/17 20:41:04 INFO mapred.JobClient: Launched map tasks=3    3 map tasks launched
12/09/17 20:41:04 INFO mapred.JobClient: Data-local map tasks=3    all 3 map tasks ran on nodes holding their input data
12/09/17 20:41:04 INFO mapred.JobClient: FileSystemCounters    file system counters
12/09/17 20:41:04 INFO mapred.JobClient: FILE_BYTES_READ=59
12/09/17 20:41:04 INFO mapred.JobClient: HDFS_BYTES_READ=29
12/09/17 20:41:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=188
12/09/17 20:41:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29
12/09/17 20:41:04 INFO mapred.JobClient: Map-Reduce Framework    MapReduce framework counters
12/09/17 20:41:04 INFO mapred.JobClient: Reduce input groups=3    3 distinct keys reached the reducer
12/09/17 20:41:04 INFO mapred.JobClient: Combine output records=4    4 combiner output records
12/09/17 20:41:04 INFO mapred.JobClient: Map input records=2    2 map input records (one per line of input)
12/09/17 20:41:04 INFO mapred.JobClient: Reduce shuffle bytes=65    65 bytes of sorted map output copied to the reducer during the shuffle phase
12/09/17 20:41:04 INFO mapred.JobClient: Reduce output records=3    3 reduce output records
12/09/17 20:41:04 INFO mapred.JobClient: Spilled Records=8    8 records spilled to disk
12/09/17 20:41:04 INFO mapred.JobClient: Map output bytes=45    45 map output bytes
12/09/17 20:41:04 INFO mapred.JobClient: Combine input records=4    4 combiner input records
12/09/17 20:41:04 INFO mapred.JobClient: Map output records=4    4 map output records (one per word)
12/09/17 20:41:04 INFO mapred.JobClient: Reduce input records=4    4 reduce input records
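The framework counters above are consistent with a plain word count over two one-line files: 2 map input records (lines), 4 map output records (words), and 3 reduce output records (distinct words). The same map → shuffle → reduce flow can be sketched in Python; the file contents below are hypothetical (the real test1.txt/test2.txt contents are not shown in this section), chosen only to match the counters.

```python
from collections import Counter

# Hypothetical inputs: two files, one line each -> Map input records = 2.
lines = ["hello world", "hello hadoop"]

# Map phase: emit a (word, 1) pair for every word -> Map output records = 4.
map_output = [(word, 1) for line in lines for word in line.split()]

# Shuffle groups pairs by key; the reduce phase sums each key's values.
counts = Counter()
for word, one in map_output:
    counts[word] += one

print(len(lines))       # 2 -> Map input records
print(len(map_output))  # 4 -> Map output records
print(len(counts))      # 3 -> Reduce output records (distinct words)
print(dict(counts))     # {'hello': 2, 'world': 1, 'hadoop': 1}
```

In the real job the combiner pre-sums pairs on each map node before the shuffle, which is why Combine input records (4) equals Map output records here.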
Browser: http://<jobtracker-host>:50030/jobtracker.jsp    the JobTracker web UI, with more detailed information about the job
Summary: if you hit the error [org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201209171856_0001/attempt_201209171856_0001_m_000000_0/output/file.out.index in any of the configured local directories], run stop-all.sh followed by start-all.sh to restart all Hadoop daemons