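In terms of code, a semantic segmentation framework is much simpler than an object detection framework, but it still involves many details. In this article I use PSPNet as an example and walk through the code of a semantic segmentation framework. Once you understand one framework, other people's frameworks turn out to be largely the same.
The project comes from https://github.com/speedinghzl/pytorch-segmentation-toolbox
A very important part of the framework is evaluate.py, the testing stage. Since this article is already long, I will cover the testing process in a separate post.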
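Overall structure
pytorch-segmentation-toolbox
|-- dataset            dataset-related files
    |-- list           dataset list files
    |-- datasets.py    dataset loading functions
|-- libs               PyTorch ops such as BN
|-- networks           network code
    |-- deeplabv3.py
    |-- pspnet.py
|-- utils              other helper functions
    |-- criterion.py   loss computation
    |-- encoding.py    balances GPU memory usage
    |-- loss.py        OHEM hard example mining
    |-- utils.py       colormap conversion
|-- evaluate.py        network testing
|-- run_local.sh       training script
|-- train.py           network training

train.py
train.py is the main training script. Its main operations are: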
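Passing in the training parameters; this usually relies on the argparse library, so values can be supplied from a script.
Training the network; this includes defining the network, loading the model, forward and backward propagation, and saving the model.
Visualizing training progress; the loss curve is plotted with tensorboard.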
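Among the training steps, the script decays the learning rate with a "poly" policy, lr = base_lr * (1 - iter / max_iter) ** power. Here is a minimal standalone sketch of that formula, using the script's default values (base learning rate 1e-2, 40000 steps, power 0.9). The parameter name cur_iter and the sampled step values are my own choices for illustration.

```python
def lr_poly(base_lr, cur_iter, max_iter, power):
    """Poly decay: base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * (1.0 - float(cur_iter) / max_iter) ** power


# Defaults taken from train.py: LEARNING_RATE, NUM_STEPS, POWER.
base_lr, max_iter, power = 1e-2, 40000, 0.9

# Print the learning rate at a few sample steps (steps chosen for illustration).
for step in (0, 10000, 20000, 30000, 39999):
    print(step, lr_poly(base_lr, step, max_iter, power))
```

The rate starts at base_lr and falls smoothly towards zero as the step count approaches max_iter. With power below 1, the decay is slower at the beginning and steeper near the end than a linear schedule.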
import argparse

import torch
import torch.nn as nn
from torch.utils import data
import numpy as np
import pickle
import cv2
import torch.optim as optim
import scipy.misc
import torch.backends.cudnn as cudnn
import sys
import os
from tqdm import tqdm
import os.path as osp
from networks.pspnet import Res_Deeplab
from dataset.datasets import CSDataSet

import random
import timeit
import logging
from tensorboardX import SummaryWriter
from utils.utils import decode_labels, inv_preprocess, decode_predictions
from utils.criterion import CriterionDSN, CriterionOhemDSN
from utils.encoding import DataParallelModel, DataParallelCriterion

torch_ver = torch.__version__[:3]
if torch_ver == '0.3':
    from torch.autograd import Variable

start = timeit.default_timer()

# ImageNet-pretrained weights are used, so the ImageNet mean must be
# subtracted during data preprocessing.
IMG_MEAN = np.array((104.00698793,116.66876762,122.67891434), dtype=np.float32)

# These hyperparameters can be set in the .sh script.
BATCH_SIZE = 8
DATA_DIRECTORY = 'cityscapes'
DATA_LIST_PATH = './dataset/list/cityscapes/train.lst'
IGNORE_LABEL = 255
INPUT_SIZE = '769,769'
LEARNING_RATE = 1e-2
MOMENTUM = 0.9
NUM_CLASSES = 19
NUM_STEPS = 40000
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './dataset/MS_DeepLab_resnet_pretrained_init.pth'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 10000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

def get_arguments():
    """Parse all the arguments provided from the CLI.

    Returns:
      A list of parsed arguments.
    """
    parser = argparse.ArgumentParser(description="DeepLab-ResNet Network")
    parser.add_argument("--batch-size", type=int, default=BATCH_SIZE,            # batch size
                        help="Number of images sent to the network in one step.")
    parser.add_argument("--data-dir", type=str, default=DATA_DIRECTORY,          # dataset directory
                        help="Path to the directory containing the PASCAL VOC dataset.")
    parser.add_argument("--data-list", type=str, default=DATA_LIST_PATH,         # dataset list file
                        help="Path to the file listing the images in the dataset.")
    parser.add_argument("--ignore-label", type=int, default=IGNORE_LABEL,        # label to ignore (unused)
                        help="The index of the label to ignore during the training.")
    parser.add_argument("--input-size", type=str, default=INPUT_SIZE,            # input size
                        help="Comma-separated string with height and width of images.")
    parser.add_argument("--is-training", action="store_true",                    # training flag; False if not passed
                        help="Whether to updates the running means and variances during the training.")
    parser.add_argument("--learning-rate", type=float, default=LEARNING_RATE,    # learning rate
                        help="Base learning rate for training with polynomial decay.")
    parser.add_argument("--momentum", type=float, default=MOMENTUM,              # momentum used by the optimizer
                        help="Momentum component of the optimiser.")
    parser.add_argument("--not-restore-last", action="store_true",               # skip restoring the last layer (unused)
                        help="Whether to not restore last (FC) layers.")
    parser.add_argument("--num-classes", type=int, default=NUM_CLASSES,          # number of classes
                        help="Number of classes to predict (including background).")
    parser.add_argument("--start-iters", type=int, default=0,                    # starting iteration
                        help="Number of classes to predict (including background).")
    parser.add_argument("--num-steps", type=int, default=NUM_STEPS,              # number of training steps
                        help="Number of training steps.")
    parser.add_argument("--power", type=float, default=POWER,                    # power used in the learning rate update
                        help="Decay parameter to compute the learning rate.")
    parser.add_argument("--random-mirror", action="store_true",                  # data augmentation: flipping
                        help="Whether to randomly mirror the inputs during the training.")
    parser.add_argument("--random-scale", action="store_true",                   # data augmentation: multi-scale
                        help="Whether to randomly scale the inputs during the training.")
    parser.add_argument("--random-seed", type=int, default=RANDOM_SEED,          # random seed
                        help="Random seed to have reproducible results.")
    parser.add_argument("--restore-from", type=str, default=RESTORE_FROM,        # weights to resume training from
                        help="Where restore model parameters from.")
    parser.add_argument("--save-num-images", type=int, default=SAVE_NUM_IMAGES,  # number of images to save (unused)
                        help="How many images to save.")
    parser.add_argument("--save-pred-every", type=int, default=SAVE_PRED_EVERY,  # checkpoint interval in steps
                        help="Save summaries and checkpoint every often.")
    parser.add_argument("--snapshot-dir", type=str, default=SNAPSHOT_DIR,        # where models are saved
                        help="Where to save snapshots of the model.")
    parser.add_argument("--weight-decay", type=float, default=WEIGHT_DECAY,      # weight decay, for regularization
                        help="Regularisation parameter for L2-loss.")
    parser.add_argument("--gpu", type=str, default='None',                       # which GPUs to use
                        help="choose gpu device.")
    parser.add_argument("--recurrence", type=int, default=1,                     # number of recurrences (unused)
                        help="choose the number of recurrence.")
    parser.add_argument("--ft", type=bool, default=False,                        # fine-tune the model (unused)
                        help="fine-tune the model with large input size.")
    parser.add_argument("--ohem", type=str2bool, default='False',                # hard example mining
                        help="use hard negative mining")
    parser.add_argument("--ohem-thres", type=float, default=0.6,
                        help="choose the samples with correct probability underthe threshold.")
    parser.add_argument("--ohem-keep", type=int, default=200000,
                        help="choose the samples with correct probability underthe threshold.")
    return parser.parse_args()

args = get_arguments()  # load the arguments

# poly learning rate policy
def lr_poly(base_lr, iter, max_iter, power):
    return base_lr*((1-float(iter)/max_iter)**(power))

# Adjust the learning rate. Despite what the docstring says, the code
# applies the poly policy above.
def adjust_learning_rate(optimizer, i_iter):
    """Sets the learning rate to the initial LR divided by 5 at 60th, 120th and 160th epochs"""
    lr = lr_poly(args.learning_rate, i_iter, args.num_steps, args.power)
    optimizer.param_groups[0]['lr'] = lr
    return lr

# Put BN layers into eval mode
def set_bn_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

# Set the BN momentum
def set_bn_momentum(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1 or classname.find('InPlaceABN') != -1:
        m.momentum = 0.0003

# Main training function
def main():
    """Create the model and start the training."""
    writer = SummaryWriter(args.snapshot_dir)  # SummaryWriter object used to visualize training

    if not args.gpu == 'None':
        os.environ["CUDA_VISIBLE_DEVICES"]=args.gpu
    h, w = map(int, args.input_size.split(','))  # 769, 769
    input_size = (h, w)                          # (769, 769)

    cudnn.enabled = True

    # Create network.
    deeplab = Res_Deeplab(num_classes=args.num_classes)  # define the network
    print(deeplab)

    saved_state_dict = torch.load(args.restore_from)  # load the model; saved_state_dict['conv1.weight'] = {Tensor}
    new_params = deeplab.state_dict().copy()  # state dict mapping layer names to parameters; new_params['conv1.weight'] = {Tensor}
    for i in saved_state_dict:  # drop the fully connected layer of the pretrained model
        #Scale.layer5.conv2d_list.3.weight
        i_parts = i.split('.')  # ['conv1', 'weight', '2']
        # print i_parts
        # if not i_parts[1]=='layer5':
        if not i_parts[0]=='fc':
            new_params['.'.join(i_parts[0:])] = saved_state_dict[i]

    deeplab.load_state_dict(new_params)  # load the filtered state dict; the model is now initialized
    #deeplab.load_state_dict(torch.load(args.restore_from))  # use this if nothing needs to be dropped

    model = DataParallelModel(deeplab)  # multi-GPU data parallelism
    model.train()  # training mode; evaluate.py uses model.eval()
    model.float()
    # model.apply(set_bn_momentum)
    model.cuda()  # loads the model onto GPU 0 as the primary GPU; you can also choose one yourself
    #model = model.cuda(device_ids[0])

    if args.ohem:  # whether to use hard example mining
        criterion = CriterionOhemDSN(thresh=args.ohem_thres, min_kept=args.ohem_keep)
    else:
        criterion = CriterionDSN()  #CriterionCrossEntropy()
    criterion = DataParallelCriterion(criterion)  # balances the load across GPUs
    criterion.cuda()  # the loss criterion is moved to the GPU as well

    cudnn.benchmark = True  # lets cuDNN autotune convolution algorithms; usually speeds up training when input sizes are fixed

    if not os.path.exists(args.snapshot_dir):
        os.makedirs(args.snapshot_dir)

    # Data loading, see datasets.py
    trainloader = data.DataLoader(CSDataSet(args.data_dir, args.data_list, max_iters=args.num_steps*args.batch_size, crop_size=input_size,
                    scale=args.random_scale, mirror=args.random_mirror, mean=IMG_MEAN),
                    batch_size=args.batch_size, shuffle=True, num_workers=4, pin_memory=True)

    # Optimizer
    optimizer = optim.SGD([{'params': filter(lambda p: p.requires_grad, deeplab.parameters()), 'lr': args.learning_rate }],
                lr=args.learning_rate, momentum=args.momentum,weight_decay=args.weight_decay)
    optimizer.zero_grad()  # clear any leftover gradients

    interp = nn.Upsample(size=input_size, mode='bilinear', align_corners=True)  # (unused)

    for i_iter, batch in enumerate(trainloader):
        i_iter += args.start_iters
        images, labels, _, _ = batch
        images = images.cuda()
        labels = labels.long().cuda()
        if torch_ver == "0.3":
            images = Variable(images)
            labels = Variable(labels)

        optimizer.zero_grad()  # clear the gradients left over from the previous step
        lr = adjust_learning_rate(optimizer, i_iter)  # adjust the learning rate
        preds = model(images)  # [x, x_dsn]

        loss = criterion(preds, labels)  # compute the loss
        loss.backward()   # backpropagate
        optimizer.step()  # update the parameters

        # Use the SummaryWriter defined earlier to plot the lr and loss curves in Tensorboard
        if i_iter % 100 == 0:
            writer.add_scalar('learning_rate', lr, i_iter)
            writer.add_scalar('loss', loss.data.cpu().numpy(), i_iter)

        # Optionally visualize intermediate results during training
        # if i_iter % 5000 == 0:
        #     images_inv = inv_preprocess(images, args.save_num_images, IMG_MEAN)
        #     labels_colors = decode_labels(labels, args.save_num_images, args.num_classes)
        #     if isinstance(preds, list):
        #         preds = preds[0]
        #     preds_colors = decode_predictions(preds, args.save_num_images, args.num_classes)
        #     for index, (img, lab) in enumerate(zip(images_inv, labels_colors)):
        #         writer.add_image('Images/'+str(index), img, i_iter)
        #         writer.add_image('Labels/'+str(index), lab, i_iter)
        #         writer.add_image('preds/'+str(index), preds_colors[index], i_iter)

        print('iter = {} of {} completed, loss = {}'.format(i_iter, args.num_steps, loss.data.cpu().numpy()))

        if i_iter >= args.num_steps-1:  # save the final model
            print('save model ...')
            torch.save(deeplab.state_dict(),osp.join(args.snapshot_dir, 'CS_scenes_'+str(args.num_steps)+'.pth'))
            break

        if i_iter % args.save_pred_every == 0:  # save the model every fixed number of steps
            print('taking snapshot ...')
            torch.save(deeplab.state_dict(),osp.join(args.snapshot_dir, 'CS_scenes_'+str(i_iter)+'.pth'))  # saves only the learned parameters
            #torch.save(deeplab, PATH)  # saves the whole model and its state

    end = timeit.default_timer()
    print(end-start,'seconds')

if __name__ == '__main__':
    main()

datasets.py
In PyTorch, data is loaded into the model in the following order:
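Create a Dataset object, usually overriding the __len__ and __getitem__ methods. __len__ returns the size of the dataset, and __getitem__ supports indexing so that Dataset[i] returns the i-th sample.
Create a DataLoader object, passing the Dataset in as an argument.
Loop over the DataLoader object and feed img and label to the model for training.
Here is a simple example: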
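The first step can be sketched without any dependencies, since a map-style dataset only needs __len__ and __getitem__. The class name ListDataset, the file paths, and the assumption that each list line holds "image_path label_path" are mine, for illustration only. Real code would subclass torch.utils.data.Dataset and read and preprocess the files inside __getitem__.

```python
class ListDataset:
    """Map-style dataset over (image_path, label_path) pairs.

    Hypothetical stand-in: it only stores paths, it does not read images.
    """

    def __init__(self, list_lines, root=""):
        # Assumption: each non-empty line looks like "image_path label_path".
        self.files = []
        for line in list_lines:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            img, label = parts
            self.files.append({"img": root + img, "label": root + label})

    def __len__(self):
        # Size of the dataset.
        return len(self.files)

    def __getitem__(self, index):
        # Enables dataset[i] to fetch the i-th sample.
        return self.files[index]


# Made-up list entries, including one blank line that is skipped.
lines = [
    "leftImg8bit/a.png gtFine/a_labelTrainIds.png",
    "leftImg8bit/b.png gtFine/b_labelTrainIds.png",
    "",
]
dataset = ListDataset(lines, root="cityscapes/")
print(len(dataset))       # 2
print(dataset[1]["img"])  # cityscapes/leftImg8bit/b.png
```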
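from torch.utils.data import DataLoader

dataset = MyDataset()  # a Dataset subclass defined elsewhere
dataloader = DataLoader(dataset)
num_epoches = 100
for epoch in range(num_epoches):
    for img, label in dataloader:
        ...  # the training step goes here

We also need to define the data preprocessing inside the Dataset object. This project uses: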
Random scaling by a factor of 0.7 to 1.4
Subtracting the ImageNet mean from each channel
Random cropping to 769x769
Random mirror flipping
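The four steps above can be sketched with NumPy alone. This is an illustration under assumptions, not the repository's datasets.py. The helper names resize_nearest and preprocess are hypothetical, nearest-neighbour indexing stands in for a proper image resize such as cv2.resize, and I assume that images smaller than the crop are padded (the label with the ignore value 255) before cropping.

```python
import numpy as np

# Per-channel mean from train.py; assumed to match the image's channel order.
IMG_MEAN = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32)
IGNORE_LABEL = 255


def resize_nearest(arr, scale):
    """Nearest-neighbour resize of an HxW or HxWxC array (stand-in for cv2.resize)."""
    h, w = arr.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return arr[rows][:, cols]


def preprocess(image, label, crop_size=(769, 769), rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # 1. Random scaling by a factor in [0.7, 1.4].
    scale = rng.uniform(0.7, 1.4)
    image = resize_nearest(image, scale).astype(np.float32)
    label = resize_nearest(label, scale)
    # 2. Subtract the per-channel mean.
    image -= IMG_MEAN
    # 3. Pad up to the crop size if needed (assumption), then crop randomly.
    crop_h, crop_w = crop_size
    h, w = label.shape
    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=0.0)
    label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=IGNORE_LABEL)
    h, w = label.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    image = image[top:top + crop_h, left:left + crop_w]
    label = label[top:top + crop_h, left:left + crop_w]
    # 4. Mirror horizontally with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        label = label[:, ::-1]
    # HWC -> CHW, the layout PyTorch convolutions expect.
    return np.ascontiguousarray(image.transpose(2, 0, 1)), np.ascontiguousarray(label)


# Random stand-in data instead of a real Cityscapes image.
rng = np.random.default_rng(1234)
image = rng.integers(0, 256, size=(512, 1024, 3)).astype(np.uint8)
label = rng.integers(0, 19, size=(512, 1024)).astype(np.uint8)
img, lab = preprocess(image, label, rng=rng)
print(img.shape, lab.shape)  # (3, 769, 769) (769, 769)
```

The image and the label must go through the same scale, crop, and flip so that every pixel stays aligned with its class. Only the image gets the mean subtraction.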