The configuration file yolov3.cfg defines the structure of the network.

....
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear
.....

The configuration file describes the structure of the model, one block per layer.
yolov3 layers
yolov3 uses the following kinds of layers:
Convolutional
Shortcut
Upsample
Route
YOLO
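Convolutional

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

Shortcut

[shortcut]
from=-3
activation=linear

Similar to ResNet, a shortcut (skip connection) is used so that the network can be made deeper. The configuration above means that the output of the shortcut layer is the element-wise sum of two feature maps: the output of the previous layer and the output of the layer three positions back from the shortcut layer.
For a detailed explanation of ResNet skip connections, see https://zhuanlan.zhihu.com/p/28124810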
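The index arithmetic behind from=-3 can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name shortcut_output and the toy 1-D "feature maps" are hypothetical, and a real implementation would add tensors of identical shape instead of lists.

```python
def shortcut_output(outputs, index, from_offset):
    """Element-wise sum of the previous layer's output and the output of
    the layer at index + from_offset (from_offset is negative, e.g. -3)."""
    prev = outputs[index - 1]
    skip = outputs[index + from_offset]
    if len(prev) != len(skip):
        raise ValueError("shortcut inputs must have the same shape")
    return [a + b for a, b in zip(prev, skip)]


# Toy outputs of layers 0..3, flattened to 1-D lists for brevity.
outputs = [
    [1.0, 2.0],  # layer 0
    [3.0, 4.0],  # layer 1
    [5.0, 6.0],  # layer 2
    [7.0, 8.0],  # layer 3
]

# A shortcut at index 4 with from=-3 adds layer 3 (previous) and layer 1 (4 - 3).
result = shortcut_output(outputs, index=4, from_offset=-3)
print(result)  # [10.0, 12.0]
```

Because the two inputs are added rather than concatenated, the depth of the shortcut output equals the depth of each input.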
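Upsample
Bilinear interpolation turns an N*N feature map into a (stride*N) * (stride*N) feature map. This imitates a feature pyramid and produces feature maps at multiple scales, which improves the detection of small objects.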
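Route

[route]
layers = -4

[route]
layers = -1, 61

Taking the configuration above as an example:
When layers has a single value, the route layer outputs the feature map of the layer 4 positions before the route layer.
When layers has 2 values, the output of the route layer is the feature map of the layer just before the route layer and the feature map of layer 61, concatenated along the depth dimension. (For example, 3*3*100 and 3*3*200 concatenated become 3*3*300.)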
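The depth bookkeeping for a route layer can be sketched as follows. The helper route_out_channels and the toy output_filters values are hypothetical. The sketch assumes that negative entries are relative to the route layer's own index and that non-negative entries are absolute layer indices.

```python
def route_out_channels(layers, index, output_filters):
    """Depth of a route layer's output: the sum of the depths of all
    referenced layers, since their feature maps are concatenated."""
    total = 0
    for layer in layers:
        # Negative values count back from the route layer itself;
        # non-negative values are absolute layer indices (assumption).
        source = index + layer if layer < 0 else layer
        total += output_filters[source]
    return total


# Hypothetical depths of layers 0..3; the route layer sits at index 4.
output_filters = [200, 64, 32, 100]

single = route_out_channels([-4], index=4, output_filters=output_filters)
double = route_out_channels([-1, 0], index=4, output_filters=output_filters)
print(single)  # 200
print(double)  # 300
```

The second call mirrors the example in the text: a depth of 100 concatenated with a depth of 200 gives 300.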
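YOLO
The yolo layer is responsible for prediction. anchors lists 9 anchors, obtained beforehand by clustering, which represent the most likely bounding box shapes.
mask indicates which of the anchors are used. For example, mask=0,1,2 means the anchors 10,13 16,30 33,23 are used. As explained in the theory part, each cell predicts 3 bounding boxes. With three scales, that makes 9 anchors in total.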
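How mask picks anchors can be sketched in a few lines. The anchors string below is an assumption: it is the anchors line as it appears in the standard yolov3.cfg, which this section does not quote in full. The variable names are illustrative only.

```python
# Assumed values, written the way they appear in a cfg file.
anchors_cfg = "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
mask_cfg = "0,1,2"

# Turn the flat list of numbers into (width, height) pairs.
values = [int(v) for v in anchors_cfg.split(",")]
anchors = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

# Keep only the anchors whose indices are listed in mask.
mask = [int(m) for m in mask_cfg.split(",")]
selected = [anchors[m] for m in mask]

print(len(anchors))  # 9
print(selected)      # [(10, 13), (16, 30), (33, 23)]
```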
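The first block of the configuration file describes the network as a whole: it defines the model's input, the batch size, and so on.
Now let's start writing the code: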
Parsing the configuration file
In this step we parse the configuration file and store the settings of each block in a dict.
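To make the target data structure concrete, here is what two of the blocks shown earlier would look like as dicts. The list is written by hand for illustration, not produced by a parser. Note that every value is still a string and has to be converted before use.

```python
# Hand-written illustration of the target structure (not parser output).
blocks = [
    {
        "type": "convolutional",
        "batch_normalize": "1",
        "filters": "64",
        "size": "3",
        "stride": "1",
        "pad": "1",
        "activation": "leaky",
    },
    {"type": "shortcut", "from": "-3", "activation": "linear"},
]

# Values are strings, so numeric fields are converted with int().
filters = int(blocks[0]["filters"])
offset = int(blocks[1]["from"])
print(filters, offset)  # 64 -3
```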
def parse_cfg(cfgfile):
    """
    Takes a configuration file.

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list.
    """
    # read the file and store the lines in a list
    with open(cfgfile, 'r') as file:
        lines = file.read().split('\n')
    lines = [x.strip() for x in lines]           # get rid of fringe whitespace
    lines = [x for x in lines if len(x) > 0]     # get rid of the empty lines
    lines = [x for x in lines if x[0] != '#']    # get rid of comments

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":              # this marks the start of a new block
            # a non-empty block is still holding the values of the previous block
            if len(block) != 0:
                blocks.append(block)    # add it to the blocks list
                block = {}              # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=")
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)                # do not forget the last block

    return blocks

Creating the layers with PyTorch
The layers are created one by one.
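import torch.nn as nn

def create_modules(blocks):
    # Captures the information about the input and pre-processing
    net_info = blocks[0]
    module_list = nn.ModuleList()
    # A convolution needs to know the depth of its kernels. The kernel size is
    # defined in the configuration file; the depth is the depth of the previous
    # layer's output. The input image has 3 channels.
    prev_filters = 3
    # Stores the number of output filters (the depth of the output feature map)
    # of every layer
    output_filters = []

    # index is the position of the current layer in the network
    for index, x in enumerate(blocks[1:]):
        # build the layer for this block here; this step is expected to
        # set both module and filters

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

    return (net_info, module_list)

Convolutional layer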
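[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

Besides the convolution itself, this block also includes batch normalization and a leaky activation. Batch normalization is pretty much standard nowadays. It stabilizes training and helps against the vanishing gradient problem (gradients getting smaller and smaller as they are multiplied during backpropagation). leaky is the Leaky ReLU activation function.
That is why nn.Sequential() is used: it chains these sub-layers into a single module.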
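For reference, the leaky activation can be written out as a scalar function. This is a plain-Python sketch, not the PyTorch implementation. The slope of 0.1 for negative inputs is taken from the nn.LeakyReLU(0.1) call in the layer code of this article, and the function name is illustrative.

```python
def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: pass positive inputs through unchanged and scale
    non-positive inputs by a small slope instead of zeroing them."""
    return x if x > 0 else negative_slope * x


samples = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
print(samples)  # [-0.2, 0.0, 3.0]
```

Unlike a plain ReLU, negative inputs keep a small non-zero gradient, so the corresponding units can still be updated during training.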
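Full code for creating the convolutional layer
This involves the Python built-in enumerate. It pairs every element of a list with its index and yields (index, element) tuples, so the loop gets both the layer's position and its block.
Creating the convolutional layer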
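A small sketch of how enumerate behaves on blocks[1:]. The blocks list here is a hypothetical, shortened stand-in for the parsed configuration. Because the first block is sliced off, index 0 refers to the first real layer.

```python
# Hypothetical, shortened stand-in for the parsed configuration.
blocks = [
    {"type": "net"},
    {"type": "convolutional"},
    {"type": "convolutional"},
    {"type": "shortcut"},
]

# enumerate yields (index, element) pairs; list() materializes them here
# only so they can be inspected.
pairs = [(index, block["type"]) for index, block in enumerate(blocks[1:])]
for index, layer_type in pairs:
    print(index, layer_type)
# 0 convolutional
# 1 convolutional
# 2 shortcut
```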
# index is the position of the current layer in the network
for index, x in enumerate(blocks[1:]):
    module = nn.Sequential()

    # check the type of block
    # create a new module for the block
    # append to module_list

    if x["type"] == "convolutional":
        # Get the info about the layer
        activation = x["activation"]
        try:
            batch_normalize = int(x["batch_normalize"])
            bias = False
        except KeyError:
            # no batch_normalize key: the convolution keeps its own bias
            batch_normalize = 0
            bias = True

        filters = int(x["filters"])
        padding = int(x["pad"])
        kernel_size = int(x["size"])
        stride = int(x["stride"])

        if padding:
            pad = (kernel_size - 1) // 2
        else:
            pad = 0

        # Add the convolutional layer
        # prev_filters is the depth of the feature map output by the previous
        # layer. For example, if the previous layer has 64 kernels, its output
        # is m*n*64.
        conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
        module.add_module("conv_{0}".format(index), conv)

        # Add the Batch Norm Layer
        if batch_normalize:
            bn = nn.BatchNorm2d(filters)
            module.add_module("batch_norm_{0}".format(index), bn)

        # Check the activation.
        # It is either Linear or a Leaky ReLU for YOLO
        if activation == "leaky":
            activn = nn.LeakyReLU(0.1, inplace=True)
            module.add_module("leaky_{0}".format(index), activn)
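The padding rule pad = (kernel_size - 1) // 2 can be checked with the usual output size formula for a convolution without dilation. This is a plain-Python sketch: the helper name conv_output_size is hypothetical, and the 416 input size is only an assumed example value.

```python
def conv_output_size(n, kernel_size, stride, pad_flag):
    """Spatial size of the output of a square convolution on an n*n input,
    using the same padding rule as the layer code: (kernel_size - 1) // 2
    when the cfg's pad flag is set, otherwise 0."""
    pad = (kernel_size - 1) // 2 if pad_flag else 0
    return (n + 2 * pad - kernel_size) // stride + 1


print(conv_output_size(416, kernel_size=3, stride=1, pad_flag=1))  # 416
print(conv_output_size(416, kernel_size=3, stride=2, pad_flag=1))  # 208
print(conv_output_size(416, kernel_size=1, stride=1, pad_flag=1))  # 416
```

With this rule, a stride of 1 keeps the spatial size unchanged and a stride of 2 halves it. Note that pad=1 on a 1x1 convolution still results in zero padding.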