After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
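If trash is enabled in the configuration, files removed through the FS shell are not immediately removed from HDFS. Instead, HDFS moves them to a trash directory (each user has their own trash directory under /user/<username>/.Trash). A file can be restored quickly as long as it remains in trash.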
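The most recently deleted files are moved to the current trash directory (/user/<username>/.Trash/Current). At a configurable interval, HDFS creates checkpoints (under /user/<username>/.Trash/<date>) for the files in the current trash directory and deletes old checkpoints once they expire. See the FS shell documentation for details on trash checkpointing.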
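The retention and checkpoint behaviour described above is driven by configuration. As a minimal sketch, assuming the `hdfs` and `hadoop` commands are on the PATH and point at your cluster, the two helper functions below (their names are hypothetical) print the relevant settings and trigger a checkpoint by hand. `fs.trash.interval` and `fs.trash.checkpoint.interval` are the standard property names, both expressed in minutes.

```shell
# Hypothetical helper: print the trash settings the client sees.
show_trash_settings() {
  # Minutes a checkpoint is kept before deletion; 0 disables trash.
  hdfs getconf -confKey fs.trash.interval
  # Minutes between checkpoints; 0 means "use fs.trash.interval".
  hdfs getconf -confKey fs.trash.checkpoint.interval
}

# Hypothetical helper: create a new checkpoint now and delete
# checkpoints that are older than the retention interval.
expunge_trash() {
  hadoop fs -expunge
}
```

Running `show_trash_settings` prints one value per line. `expunge_trash` only affects the trash of the user who runs it.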
The following example shows how files are deleted from HDFS through the FS shell. We create test1 and test2 under the directory delete. Both are created with -mkdir, so they are directories, which is why the removal commands use -rm -r.
$ hadoop fs -mkdir -p delete/test1
$ hadoop fs -mkdir -p delete/test2
$ hadoop fs -ls delete/
Found 2 items
drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:39 delete/test1
drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:40 delete/test2
We are going to remove test1. The output below shows that it has been moved to the Trash directory.
$ hadoop fs -rm -r delete/test1
Moved: hdfs://localhost:8020/user/hadoop/delete/test1 to trash at: hdfs://localhost:8020/user/hadoop/.Trash/Current
Now we remove test2 with the -skipTrash option, which bypasses Trash and removes it from HDFS immediately.
$ hadoop fs -rm -r -skipTrash delete/test2
Deleted delete/test2
We can see now that the Trash directory contains only file test1.
$ hadoop fs -ls .Trash/Current/user/hadoop/delete/
Found 1 items
drwxr-xr-x - hadoop hadoop 0 2015-05-08 12:39 .Trash/Current/user/hadoop/delete/test1
So file test1 goes to Trash and file test2 is deleted permanently.
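Because test1 is still in Trash, it can be recovered by moving it back. The helper below is a sketch under stated assumptions: the function name is hypothetical, paths are relative to the user's HDFS home directory as in the listing above, and the destination's parent directory (delete here) still exists.

```shell
# Hypothetical helper: move a path from the user's current trash
# directory back to its original location.
#   $1 = HDFS user name (e.g. hadoop)
#   $2 = original path relative to the home directory (e.g. delete/test1)
restore_from_trash() {
  # Trash mirrors the original absolute path under .Trash/Current.
  hadoop fs -mv ".Trash/Current/user/$1/$2" "$2"
}
```

For the example above, the call would be `restore_from_trash hadoop delete/test1`. Nothing comparable works for test2, since -skipTrash never placed it in Trash.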
Decrease Replication Factor

When the replication factor of a file is reduced, the NameNode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the DataNode. The DataNode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the setReplication API call and the appearance of free space in the cluster.
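From the FS shell, the replication factor can be changed with -setrep. The following is a rough sketch rather than a recommended procedure: the function name is hypothetical, and the target factor and path are whatever you pass in.

```shell
# Hypothetical helper: set a new replication factor for a path and
# print the factor the NameNode now records for it.
#   $1 = target replication factor (e.g. 2)
#   $2 = HDFS path (e.g. delete/test1)
reduce_replication() {
  hadoop fs -setrep "$1" "$2"
  # %r prints the replication factor stored in the file's metadata.
  hadoop fs -stat %r "$2"
}
```

The value printed by -stat reflects the metadata change only. As noted above, the freed space may appear later, once the DataNodes have removed the excess replicas.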
References

Hadoop JavaDoc API.