NameNode Hot Backup Mechanism (2)

To prevent rollFSImage() from being called by mistake, the system introduces the state CheckpointStates.UPLOAD_DONE. The reason is as follows:

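rollFSImage rolls the two files fsimage.ckpt and edits.new into fsimage and edits, respectively. Both files exist only once the UPLOAD_DONE state has been reached, so rollFSImage begins by verifying that the current checkpoint state is UPLOAD_DONE.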
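The guard can be illustrated with a minimal Java sketch. This is a simplified model, not the actual Hadoop source: the class name CheckpointGuardSketch, the setState/getState helpers, the state names other than UPLOAD_DONE, and the reset to START after a successful roll are all assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical, simplified model of the state guard; not the Hadoop source.
class CheckpointGuardSketch {
    // Only UPLOAD_DONE is named in the text; the other states are assumed.
    enum CheckpointStates { START, ROLLED_EDITS, UPLOAD_START, UPLOAD_DONE }

    private CheckpointStates ckptState = CheckpointStates.START;

    void setState(CheckpointStates s) { ckptState = s; }

    CheckpointStates getState() { return ckptState; }

    void rollFSImage() throws IOException {
        // Refuse to roll unless the merged image has been fully uploaded.
        if (ckptState != CheckpointStates.UPLOAD_DONE) {
            throw new IOException(
                "Cannot roll fsimage before the upload is done; state=" + ckptState);
        }
        // The renames edits.new --> edits and fsimage.ckpt --> fsimage
        // would happen here.
        ckptState = CheckpointStates.START; // assumed: ready for the next checkpoint
    }

    public static void main(String[] args) throws IOException {
        CheckpointGuardSketch guard = new CheckpointGuardSketch();
        try {
            guard.rollFSImage();
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        guard.setState(CheckpointStates.UPLOAD_DONE);
        guard.rollFSImage();
        System.out.println("rolled, state=" + guard.getState());
    }
}
```

Calling rollFSImage() in any state other than UPLOAD_DONE fails fast, so a stray call cannot rename files that do not exist yet.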

[Figure: checkpoint state transitions on the NN during the hot backup process]

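In the rollFSImage step of the "checkpoint state diagram on the NN", the NameNode may fail, so that the renames fsimage.ckpt --> fsimage and edits.new --> edits do not complete. When the directory structure is later loaded from the fsimage file, the recoverInterruptedCheckpoint method is called to detect this situation and recover from it: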

boolean recoverInterruptedCheckpoint(StorageDirectory nameSD,
                                      StorageDirectory editsSD)
                                      throws IOException {
    boolean needToSave = false;
    File curFile = getImageFile(nameSD, NameNodeFile.IMAGE);
    File ckptFile = getImageFile(nameSD, NameNodeFile.IMAGE_NEW);
    // The checkpoint state is already UPLOAD_DONE at this point.
    //
    // If we were in the midst of a checkpoint
    //
    if (ckptFile.exists()) {
      // fsimage.ckpt exists, so rollFSImage did not complete.
      needToSave = true;
      if (getImageFile(editsSD, NameNodeFile.EDITS_NEW).exists()) {
        // edits.new exists, so rollFSImage was never reached. We cannot tell
        // whether fsimage.ckpt was fully uploaded, so we discard it.
        //
        // checkpointing might have uploaded a new
        // merged image, but we discard it here because we are
        // not sure whether the entire merged image was uploaded
        // before the namenode crashed.
        //
        if (!ckptFile.delete()) {
          throw new IOException("Unable to delete " + ckptFile);
        }
      } else {
        // edits.new does not exist. rollFSImage renames edits.new --> edits
        // first, so an fsimage.ckpt that still exists means the rename
        // fsimage.ckpt --> fsimage never happened; renaming it again is enough.
        //
        // checkpointing was in progress when the namenode
        // shutdown. The fsimage.ckpt was created and the edits.new
        // file was moved to edits. We complete that checkpoint by
        // moving fsimage.ckpt to fsimage. There is no need to
        // update the fstime file here. renameTo fails on Windows
        // if the destination file already exists.
        //
        if (!ckptFile.renameTo(curFile)) {
          if (!curFile.delete())
            LOG.warn("Unable to delete dir " + curFile + " before rename");
          if (!ckptFile.renameTo(curFile)) {
            throw new IOException("Unable to rename " + ckptFile +
                                  " to " + curFile);
          }
        }
      }
    }
    return needToSave;
  }
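The three cases handled by recoverInterruptedCheckpoint can be tried out in isolation with the following sketch. It is a simplified stand-in rather than the Hadoop implementation: the class RecoverySketch and its recover method are hypothetical, all files are assumed to live in a single directory (the original uses separate name and edits storage directories), and java.nio.file (Java 11+) is used in place of File.renameTo.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the recovery decision; not the Hadoop source.
class RecoverySketch {
    /** Returns true if an interrupted checkpoint was found (like needToSave). */
    static boolean recover(Path dir) throws IOException {
        Path image = dir.resolve("fsimage");
        Path ckpt = dir.resolve("fsimage.ckpt");
        Path editsNew = dir.resolve("edits.new");

        if (!Files.exists(ckpt)) {
            return false; // no checkpoint was in progress
        }
        if (Files.exists(editsNew)) {
            // rollFSImage never started: the upload may be incomplete,
            // so the merged image is discarded.
            Files.delete(ckpt);
        } else {
            // edits.new was already renamed: finish the roll by
            // replacing fsimage with fsimage.ckpt.
            Files.move(ckpt, image, StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nn-sketch");
        Files.writeString(dir.resolve("fsimage"), "old");
        Files.writeString(dir.resolve("fsimage.ckpt"), "merged");
        // No edits.new here: simulates a crash between the two renames.
        System.out.println("needToSave=" + recover(dir));
        System.out.println("fsimage=" + Files.readString(dir.resolve("fsimage")));
    }
}
```

REPLACE_EXISTING plays the role of the delete-then-retry fallback in the original, which exists because renameTo fails on Windows when the destination file already exists.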
