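In the previous post (https://www.cnblogs.com/sdu20112013/p/11099244.html) we implemented the individual layers of the network.
This post implements the network's forward pass.
The forward function overrides the forward method that the network class inherits from nn.Module.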
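To make the overall shape of forward concrete, here is a minimal sketch of the loop that the branches discussed in this post plug into. It is a toy, not the detector itself: the class name TinyNet, the three layers and the block dicts are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Toy stand-in for the detector; the layer choices are illustrative only."""

    def __init__(self):
        super().__init__()
        # Parsed config blocks, one dict per layer (hypothetical values).
        self.blocks = [
            {"type": "convolutional"},
            {"type": "convolutional"},
            {"type": "upsample"},
        ]
        self.module_list = nn.ModuleList([
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1),
            nn.Upsample(scale_factor=2, mode="nearest"),
        ])

    def forward(self, x):
        outputs = {}  # cache each layer's output for later route/shortcut lookups
        for i, module in enumerate(self.blocks):
            module_type = module["type"]
            if module_type == "convolutional" or module_type == "upsample":
                x = self.module_list[i](x)
            outputs[i] = x
        return x


net = TinyNet()
out = net(torch.randn(1, 3, 16, 16))  # calling the module dispatches to forward
print(out.shape)
```

Caching every output in a dict keyed by layer index is what lets the route and shortcut branches look up earlier feature maps by index.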
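Convolutional and Upsample Layers
    if module_type == "convolutional" or module_type == "upsample":
        x = self.module_list[i](x)
Route Layer / Shortcut Layer
As covered in the previous post, the output of a route layer is either the output of one earlier layer, or the outputs of two earlier layers concatenated along the depth (channel) dimension. That is: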
    output[current_layer] = output[previous_layer]
or
    map1 = outputs[i + layers[0]]
    map2 = outputs[i + layers[1]]
    output[current_layer] = torch.cat((map1, map2), 1)
So the route layer code is:
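    elif module_type == "route":
        layers = module["layers"]
        layers = [int(a) for a in layers]
        # A positive value is an absolute layer index; convert it to an offset relative to i.
        if layers[0] > 0:
            layers[0] = layers[0] - i
        if len(layers) == 1:
            x = outputs[i + layers[0]]
        else:
            if layers[1] > 0:
                layers[1] = layers[1] - i
            map1 = outputs[i + layers[0]]
            map2 = outputs[i + layers[1]]
            x = torch.cat((map1, map2), 1)
The output of a shortcut layer is the sum of the previous layer's output and the output of an earlier layer, whose relative position is given by "from" in the config file.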
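    elif module_type == "shortcut":
        from_ = int(module["from"])
        x = outputs[i-1] + outputs[i+from_]
YOLO layer
The output of a yolo layer is an n*n*depth feature map. To access the second bounding box of the cell at (5,6), you would have to index it as map[5,6,(5+C):2*(5+C)]. This form is awkward to work with, so we introduce a predict_transform function that changes the layout of the output.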
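In short, we want to convert the 4-D tensor of shape batch_size*grid_size*grid_size*(B*(5+C)) into a tensor of shape batch_size*(grid_size*grid_size*B)*(5+C), so that each row holds the 5+C attributes of a single bounding box.
For each image, the rows of the resulting 2-D matrix are ordered cell by cell: the B bounding boxes of the first cell come first, then the B boxes of the next cell, and so on.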
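The reshaping code for this step uses view in PyTorch, which is similar to reshape in NumPy. contiguous is typically used together with transpose, permute and view: after a dimension swap the tensor is no longer stored contiguously in memory, while view requires contiguous storage, so contiguous has to be called first. In the end we obtain a tensor of shape batch_size*(grid_size*grid_size*num_anchors)*bbox_attrs.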
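The reshape described above can be sketched as follows. This is a minimal sketch, assuming the layer output arrives in PyTorch's usual batch*channels*height*width layout, i.e. batch_size*(B*(5+C))*grid_size*grid_size. The sizes and the names raw, B and C are illustrative.

```python
import torch

batch_size, grid_size = 1, 13
B, C = 3, 80                  # boxes per cell and number of classes (assumed values)
bbox_attrs = 5 + C

# Stand-in for a yolo layer output, assumed to be batch x (B*(5+C)) x grid x grid.
raw = torch.randn(batch_size, B * bbox_attrs, grid_size, grid_size)

# Flatten the spatial grid, move it in front of the channels, then split the
# channel axis into (box, attribute). transpose returns a non-contiguous view,
# so contiguous() is needed before the final view.
prediction = raw.view(batch_size, B * bbox_attrs, grid_size * grid_size)
prediction = prediction.transpose(1, 2).contiguous()
prediction = prediction.view(batch_size, grid_size * grid_size * B, bbox_attrs)

# The second box (index 1) of the cell in row 5, column 6 is now a single row.
row, col, box = 5, 6, 1
box_attrs = prediction[0, (row * grid_size + col) * B + box]
print(prediction.shape, box_attrs.shape)
```

Under these assumptions, box number box of cell (row, col) always lands in row (row * grid_size + col) * B + box, which is what makes the flattened layout convenient for later thresholding.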
Next, we process the predicted bounding box coordinates.
Note that at this point prediction[:,:,0], prediction[:,:,1], prediction[:,:,2], prediction[:,:,3] and prediction[:,:,4] are tx, ty, tw, th and the objectness score, respectively.
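In YOLOv3 the center offsets and the objectness score are squashed into the range (0, 1) with a sigmoid before any cell offsets are added. Below is a minimal sketch of that step, assuming prediction already has the shape batch_size*(grid_size*grid_size*num_anchors)*bbox_attrs. The random tensor is only a stand-in for real network output.

```python
import torch

num_classes = 80
prediction = torch.randn(1, 13 * 13 * 3, 5 + num_classes)  # stand-in data

# Index 0 and 1 (tx, ty) become center offsets inside the cell;
# index 4 becomes the objectness score.
prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0])
prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1])
prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4])
```

tw and th (indices 2 and 3) are deliberately left untouched here, since they go through an exponential rather than a sigmoid.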
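Next comes the offset relative to the top-left corner of the current cell: the grid coordinates of each cell are added to the predicted center coordinates.
The effect of meshgrid is illustrated below: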
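    import numpy as np
    import torch

    grid_size = 13
    grid = np.arange(grid_size)
    a, b = np.meshgrid(grid, grid)
    print(a)
    print(b)
    x_offset = torch.FloatTensor(a).view(-1,1)
    #print(x_offset)
    y_offset = torch.FloatTensor(b).view(-1,1)
Running this code prints two 13x13 arrays: in a every row is 0, 1, ..., 12 (the column index of each cell), while in b row k is filled with the value k (the row index of each cell).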
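The two offset grids can then be combined and added to the center coordinates. This is a minimal sketch, assuming num_anchors boxes per cell and a prediction tensor whose rows are ordered cell by cell. The zero tensor is a stand-in so that the added offsets are easy to inspect.

```python
import numpy as np
import torch

grid_size, num_anchors, bbox_attrs = 13, 3, 85    # assumed sizes
prediction = torch.zeros(1, grid_size * grid_size * num_anchors, bbox_attrs)

grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)                    # a: column index, b: row index
x_offset = torch.tensor(a, dtype=torch.float32).view(-1, 1)
y_offset = torch.tensor(b, dtype=torch.float32).view(-1, 1)

# Pair up (x, y) per cell, then repeat each pair once per anchor so the result
# lines up with the rows of prediction.
x_y_offset = torch.cat((x_offset, y_offset), 1)
x_y_offset = x_y_offset.repeat(1, num_anchors).view(-1, 2).unsqueeze(0)

prediction[:, :, :2] += x_y_offset
```

With this layout, every box belonging to the cell in row 5, column 6 receives the offset (6, 5), i.e. x first, then y.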
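Next, predict the width and height of the bounding box. Note that the anchor sizes must be rescaled to match the current feature map, because the values in the config file are relative to the model's input size.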
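As a quick numeric check of this scaling, the sketch below works through a single anchor by hand. The stride and the anchor size are illustrative values, not taken from this post.

```python
import math

stride = 32                      # e.g. a 416-pixel input mapped to a 13x13 grid
anchor_w, anchor_h = 116, 90     # anchor size in input-image pixels (illustrative)

# Rescale the anchor to feature-map units.
anchor_w_fm, anchor_h_fm = anchor_w / stride, anchor_h / stride

# With tw = th = 0 the predicted box is exactly the anchor itself.
tw, th = 0.0, 0.0
box_w = math.exp(tw) * anchor_w_fm * stride
box_h = math.exp(th) * anchor_h_fm * stride
print(anchor_w_fm, anchor_h_fm, box_w, box_h)
```

Positive tw or th enlarges the box relative to the anchor and negative values shrink it, which is why the network only has to learn a log-scale correction.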
    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]  # rescale to feature-map units
    # log-space transform of the height and the width
    anchors = torch.FloatTensor(anchors)
    if CUDA:
        anchors = anchors.cuda()
    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors
    # map back to the corresponding coordinates on the original image
    prediction[:,:,:4] *= stride
Predict the class probabilities
    prediction[:,:,5:5+num_classes] = torch.sigmoid(prediction[:,:,5:5+num_classes])
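Because each class score goes through its own sigmoid rather than a shared softmax, the class probabilities of a box are independent of each other and need not sum to 1. A small sketch with made-up scores:

```python
import torch

class_scores = torch.tensor([2.0, 2.0, -3.0])   # made-up raw scores for one box

probs = torch.sigmoid(class_scores)
print(probs)          # each entry lies in (0, 1) independently of the others
print(probs.sum())    # not constrained to equal 1
```

This allows a single box to score highly for more than one class at the same time.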