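Hot backup is an online backup method. An operation on HDFS is applied immediately to the NN's in-memory file tree. It is not saved right away to the fsimage on the NN's local disk; instead, FSEditLog.logEdit records it in the NN's edits log file.

The SNN periodically performs the hot backup by calling SecondaryNameNode.doCheckpoint():

- It tells the NN to stop recording file-tree operations in edits and to record them in a newly created edits.new instead.
- It fetches the fsimage and edits files from the NN.
- It loads fsimage into its own memory, then reads the records in edits one by one and applies each record to its in-memory file tree (FSEditLog.loadFSEdits).
- After replaying all of edits, it saves its file tree as fsimage.ckpt and transfers that file to the NN over HTTP (via a URL).
- Finally, it calls the NN's rollFsImage over the RPC framework. The NN renames edits.new to edits (by this point edits.new may already hold operations logged during the hot backup) and fsimage.ckpt to fsimage.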
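The flow above can be modelled in a few lines of Java. This is a toy sketch, not HDFS code. The NameNode class, the string edit records such as "MKDIR /a", and the merge/doCheckpoint helpers are hypothetical stand-ins for FSEditLog.logEdit, rollEditLog, FSEditLog.loadFSEdits and rollFsImage. The HTTP downloads and upload are replaced by plain in-memory copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the NN/SNN checkpoint handshake (all names are hypothetical).
public class CheckpointSimulation {

    // An "image" here is simply the set of paths in the namespace.
    static class NameNode {
        TreeSet<String> namespace = new TreeSet<>();   // in-memory file tree
        TreeSet<String> fsimage = new TreeSet<>();     // last saved image
        List<String> edits = new ArrayList<>();        // current edit log
        List<String> editsNew = null;                  // exists only during a checkpoint
        TreeSet<String> fsimageCkpt = null;            // merged image uploaded by the SNN

        // Apply an operation in memory and log it (plays the role of FSEditLog.logEdit).
        void mkdir(String path) {
            namespace.add(path);
            (editsNew != null ? editsNew : edits).add("MKDIR " + path);
        }

        // From now on, new operations are logged to edits.new.
        void rollEditLog() {
            editsNew = new ArrayList<>();
        }

        // fsimage.ckpt -> fsimage, edits.new -> edits.
        void rollFsImage() {
            fsimage = fsimageCkpt;
            edits = editsNew;
            editsNew = null;
            fsimageCkpt = null;
        }
    }

    // Replay edit records on top of an image (plays the role of FSEditLog.loadFSEdits).
    static TreeSet<String> merge(TreeSet<String> image, List<String> edits) {
        TreeSet<String> tree = new TreeSet<>(image);
        for (String record : edits) {
            if (record.startsWith("MKDIR ")) {
                tree.add(record.substring("MKDIR ".length()));
            }
        }
        return tree;
    }

    static void doCheckpoint(NameNode nn) {
        nn.rollEditLog();                                   // NN switches to edits.new
        TreeSet<String> image = new TreeSet<>(nn.fsimage);  // "download" fsimage
        List<String> edits = new ArrayList<>(nn.edits);     // "download" edits
        nn.mkdir("/during-checkpoint");                     // simulated client op arriving mid-checkpoint
        nn.fsimageCkpt = merge(image, edits);               // "upload" the merged fsimage.ckpt
        nn.rollFsImage();
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        nn.mkdir("/a");
        nn.mkdir("/b");
        doCheckpoint(nn);
        System.out.println("fsimage = " + nn.fsimage);
        System.out.println("edits   = " + nn.edits);
    }
}
```

Running main shows fsimage as [/a, /b] and edits as [MKDIR /during-checkpoint]. The operation that arrived while the SNN was merging survives in the new edits, so replaying edits on top of fsimage still reproduces the full namespace.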
The SNN checkpoints the NN once every fs.checkpoint.period seconds (default 3600, i.e. one hour).
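This sketch models only the time-based trigger from the sentence above. The CheckpointTrigger class is hypothetical and is not the real SecondaryNameNode run loop. Its constructor takes the period in seconds, as fs.checkpoint.period does.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that decides whether a time-based checkpoint is due.
public class CheckpointTrigger {
    private final long periodMillis;
    private long lastCheckpointMillis;

    // periodSeconds mirrors fs.checkpoint.period, which is expressed in seconds.
    public CheckpointTrigger(long periodSeconds, long startMillis) {
        this.periodMillis = TimeUnit.SECONDS.toMillis(periodSeconds);
        this.lastCheckpointMillis = startMillis;
    }

    // Returns true if a checkpoint is due at nowMillis, and records nowMillis
    // as the time of the last checkpoint when it does.
    public boolean shouldCheckpoint(long nowMillis) {
        if (nowMillis >= lastCheckpointMillis + periodMillis) {
            lastCheckpointMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CheckpointTrigger trigger = new CheckpointTrigger(3600, 0);
        System.out.println(trigger.shouldCheckpoint(TimeUnit.MINUTES.toMillis(30))); // false
        System.out.println(trigger.shouldCheckpoint(TimeUnit.MINUTES.toMillis(60))); // true
        System.out.println(trigger.shouldCheckpoint(TimeUnit.MINUTES.toMillis(90))); // false
    }
}
```

With the default one-hour period, the check at 30 minutes is not due and the one at 60 minutes is. The check at 90 minutes is not due either, because the clock restarted at the 60-minute checkpoint.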
void doCheckpoint() throws IOException {
    // Do the required initialization of the merge work area.
    startCheckpoint(); // before checkpointing, rename the SNN's current directory to lastcheckpoint.tmp and create a fresh current directory

    // Tell the namenode to start logging transactions in a new edit file.
    // Returns a token that will be used to upload the merged image.
    CheckpointSignature sig = (CheckpointSignature) namenode.rollEditLog(); // NN stops writing to edits and uses edits.new for now

    // error simulation code for junit test
    if (ErrorSimulator.getErrorSimulation(0)) {
        throw new IOException("Simulating error0 " +
                              "after creating edits.new");
    }

    downloadCheckpointFiles(sig); // download fsimage and edits from the NN to the SNN
    doMerge(sig);                 // merge the two downloaded files into fsimage.ckpt

    //
    // Upload the new image into the NameNode. Then tell the Namenode
    // to make this new uploaded image as the most current image.
    //
    putFSImage(sig); // send the merged fsimage.ckpt to the NN

    // error simulation code for junit test
    if (ErrorSimulator.getErrorSimulation(1)) {
        throw new IOException("Simulating error1 " +
                              "after uploading new image to NameNode");
    }

    namenode.rollFsImage();          // NN renames fsimage.ckpt to fsimage and edits.new to edits
    checkpointImage.endCheckpoint(); // checkpoint finished
    LOG.warn("Checkpoint done. New Image Size: "
             + checkpointImage.getFsImageName().length());
}
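The two RPC calls that bracket doCheckpoint(), rollEditLog() and rollFsImage(), change which files exist in the NN's name directory. Below is a minimal sketch with java.nio.file, assuming a plain directory that already holds fsimage and edits. The class NameDirRoll and its methods are hypothetical. They ignore crash safety and multiple storage directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical model of the NN-side file handling around a checkpoint.
public class NameDirRoll {

    // Like rollEditLog: create an empty edits.new; the old edits stays for the SNN to download.
    static void rollEditLog(Path dir) throws IOException {
        Path editsNew = dir.resolve("edits.new");
        if (Files.exists(editsNew)) {
            throw new IOException("edits.new already exists: checkpoint in progress?");
        }
        Files.createFile(editsNew);
    }

    // Like rollFsImage: promote the uploaded fsimage.ckpt and the accumulated edits.new.
    static void rollFsImage(Path dir) throws IOException {
        Path ckpt = dir.resolve("fsimage.ckpt");
        Path editsNew = dir.resolve("edits.new");
        if (!Files.exists(ckpt) || !Files.exists(editsNew)) {
            throw new IOException("fsimage.ckpt or edits.new missing; nothing to roll");
        }
        // Replace fsimage first; the old edits is only dropped afterwards,
        // since its records are already contained in the merged image.
        Files.move(ckpt, dir.resolve("fsimage"), StandardCopyOption.REPLACE_EXISTING);
        Files.move(editsNew, dir.resolve("edits"), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name-dir");
        Files.writeString(dir.resolve("fsimage"), "old image");
        Files.writeString(dir.resolve("edits"), "old edits");

        rollEditLog(dir);
        Files.writeString(dir.resolve("edits.new"), "ops logged during checkpoint");
        Files.writeString(dir.resolve("fsimage.ckpt"), "merged image"); // as if uploaded by the SNN
        rollFsImage(dir);

        System.out.println(Files.readString(dir.resolve("fsimage"))); // merged image
        System.out.println(Files.readString(dir.resolve("edits")));   // ops logged during checkpoint
    }
}
```

After the roll, edits.new and fsimage.ckpt no longer exist. The name directory is back to a single fsimage plus a single edits, ready for the next checkpoint.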