Clarifying Hadoop's File System Support: S3

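I. Introduction
  Hadoop supports a number of file systems, but I had never looked closely at how that support is implemented or what principle it rests on. Someone happened to ask me today how Hadoop's support for S3 actually works, so here is a summary. The file systems Hadoop supports include: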

  File system        URI scheme   Hadoop implementation class (under org.apache.hadoop)
  Local              file         fs.LocalFileSystem
  HDFS               hdfs         hdfs.DistributedFileSystem
  HFTP               hftp         hdfs.HftpFileSystem
  HSFTP              hsftp        hdfs.HsftpFileSystem
  HAR                har          fs.HarFileSystem
  KFS                kfs          fs.kfs.KosmosFileSystem
  FTP                ftp          fs.ftp.FTPFileSystem
  S3 (native)        s3n          fs.s3native.NativeS3FileSystem
  S3 (block-based)   s3           fs.s3.S3FileSystem
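  The table is essentially a lookup from URI scheme to implementation class. The sketch below shows that lookup using only the JDK. SchemeResolver and its map are hypothetical names invented for illustration and are not part of Hadoop. As far as I know, Hadoop of this era reads the mapping from configuration properties of the form fs.<scheme>.impl rather than hard-coding it, and the org.apache.hadoop prefix is assumed from the package statement in the listing in section III.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): resolves a URI scheme to the
// implementation class name listed in the table above.
class SchemeResolver {
    // Assumption: every class in the table lives under this package prefix.
    private static final String BASE = "org.apache.hadoop.";

    private static final Map<String, String> IMPLS = new HashMap<>();

    static {
        IMPLS.put("file", BASE + "fs.LocalFileSystem");
        IMPLS.put("hdfs", BASE + "hdfs.DistributedFileSystem");
        IMPLS.put("hftp", BASE + "hdfs.HftpFileSystem");
        IMPLS.put("hsftp", BASE + "hdfs.HsftpFileSystem");
        IMPLS.put("har", BASE + "fs.HarFileSystem");
        IMPLS.put("kfs", BASE + "fs.kfs.KosmosFileSystem");
        IMPLS.put("ftp", BASE + "fs.ftp.FTPFileSystem");
        IMPLS.put("s3n", BASE + "fs.s3native.NativeS3FileSystem");
        IMPLS.put("s3", BASE + "fs.s3.S3FileSystem");
    }

    // Returns the fully qualified class name registered for the URI's scheme.
    static String resolve(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String impl = IMPLS.get(scheme.toLowerCase(Locale.ROOT));
        if (impl == null) {
            throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        System.out.println(resolve(URI.create("s3://my-bucket/data")));
        System.out.println(resolve(URI.create("s3n://my-bucket/data")));
    }
}
```

  Note that s3 and s3n resolve to different classes even though both point at the same service, which is why the scheme in the URI matters.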

II. Points of Contention
  1. Does Hadoop support S3 by implementing an S3 file system of its own?
  2. Or does Hadoop support S3 by integrating with it through the interface that S3 exposes?
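  To make the second view concrete, here is a minimal sketch of what "integrating through the service's interface" could look like: a store on the file-system side that holds no storage logic of its own and forwards every call to a client for the remote service. All names here (ObjectStoreClient, InMemoryClient, BlockStore, DelegatingBlockStore) are hypothetical and are not Hadoop or jets3t APIs. The in-memory client only stands in for a real network client so that the sketch runs on its own, and the "block_" key prefix mirrors the BLOCK_PREFIX constant in the listing in section III.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vendor client library that talks to the remote service.
interface ObjectStoreClient {
    void putObject(String bucket, String key, byte[] data);
    byte[] getObject(String bucket, String key); // null when the key is absent
}

// In-memory fake so the sketch runs without a network connection.
class InMemoryClient implements ObjectStoreClient {
    private final Map<String, byte[]> objects = new HashMap<>();

    public void putObject(String bucket, String key, byte[] data) {
        objects.put(bucket + "/" + key, data.clone());
    }

    public byte[] getObject(String bucket, String key) {
        byte[] data = objects.get(bucket + "/" + key);
        return data == null ? null : data.clone();
    }
}

// Hypothetical store interface on the file-system side.
interface BlockStore {
    void storeBlock(long blockId, byte[] data);
    byte[] retrieveBlock(long blockId);
}

// Adapter: translates block operations into object operations on the client.
class DelegatingBlockStore implements BlockStore {
    private static final String BLOCK_PREFIX = "block_";

    private final ObjectStoreClient client;
    private final String bucket;

    DelegatingBlockStore(ObjectStoreClient client, String bucket) {
        this.client = client;
        this.bucket = bucket;
    }

    public void storeBlock(long blockId, byte[] data) {
        client.putObject(bucket, BLOCK_PREFIX + blockId, data);
    }

    public byte[] retrieveBlock(long blockId) {
        return client.getObject(bucket, BLOCK_PREFIX + blockId);
    }
}
```

  Under the first view, by contrast, the storage logic itself would live inside the file-system class instead of behind a client interface.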

III. Source Code Analysis

package org.apache.hadoop.fs.s3;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3.INode.FileType;
import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;
import org.jets3t.service.security.AWSCredentials;

class Jets3tFileSystemStore implements FileSystemStore {
 
  private static final String FILE_SYSTEM_NAME = "fs";
  private static final String FILE_SYSTEM_VALUE = "Hadoop";

  private static final String FILE_SYSTEM_TYPE_NAME = "fs-type";
  private static final String FILE_SYSTEM_TYPE_VALUE = "block";

  private static final String FILE_SYSTEM_VERSION_NAME = "fs-version";
  private static final String FILE_SYSTEM_VERSION_VALUE = "1";
 
  private static final Map<String, String> METADATA =
    new HashMap<String, String>();
 
  static {
    METADATA.put(FILE_SYSTEM_NAME, FILE_SYSTEM_VALUE);
    METADATA.put(FILE_SYSTEM_TYPE_NAME, FILE_SYSTEM_TYPE_VALUE);
    METADATA.put(FILE_SYSTEM_VERSION_NAME, FILE_SYSTEM_VERSION_VALUE);
  }

  private static final String PATH_DELIMITER = Path.SEPARATOR;
  private static final String BLOCK_PREFIX = "block_";

  private Configuration conf;
 
  private S3Service s3Service;

  private S3Bucket bucket;
 
  private int bufferSize;
 
  public void initialize(URI uri, Configuration conf) throws IOException {
   
    this.conf = conf;
   
    S3Credentials s3Credentials = new S3Credentials();
    s3Credentials.initialize(uri, conf);
    try {
      AWSCredentials awsCredentials =
        new AWSCredentials(s3Credentials.getAccessKey(),
            s3Credentials.getSecretAccessKey());
      this.s3Service = new RestS3Service(awsCredentials);
    } catch (S3ServiceException e) {
      if (e.getCause() instanceof IOException) {
        throw (IOException) e.getCause();
      }
      throw new S3Exception(e);
    }
    bucket = new S3Bucket(uri.getHost());

    this.bufferSize = conf.getInt("io.file.buffer.size", 4096);
  }

  public String getVersion() throws IOException {
    return FILE_SYSTEM_VERSION_VALUE;
  }
