pytorch实现yolov3(3) 实现forward (2)

predict_transform完整代码如下

#yolo经过不断地卷积得到的feature map size= batch_size*(B*(5+C))*grid_size*grid_size def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True): if CUDA: prediction = prediction.to(torch.device("cuda")) #使用gpu torch0.4不需要 torch1.0需要 batch_size = prediction.size(0) stride = inp_dim // prediction.size(2) grid_size = inp_dim // stride bbox_attrs = 5 + num_classes num_anchors = len(anchors) print("prediction.shape=",prediction.shape) print("batch_size=",batch_size) print("inp_dim=",inp_dim) #print("anchors=",anchors) #print("num_classes=",num_classes) print("grid_size=",grid_size) print("bbox_attrs=",bbox_attrs) prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size) prediction = prediction.transpose(1,2).contiguous() prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs) #Sigmoid the centre_X, centre_Y. and object confidencce prediction[:,:,0] = torch.sigmoid(prediction[:,:,0]) prediction[:,:,1] = torch.sigmoid(prediction[:,:,1]) prediction[:,:,4] = torch.sigmoid(prediction[:,:,4]) #Add the center offsets grid = np.arange(grid_size).astype(np.float32) a,b = np.meshgrid(grid, grid) x_offset = torch.FloatTensor(a).view(-1,1) y_offset = torch.FloatTensor(b).view(-1,1) if CUDA: x_offset = x_offset.cuda() y_offset = y_offset.cuda() x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0) print(type(x_y_offset),type(prediction[:,:,:2])) prediction[:,:,:2] += x_y_offset anchors = [(a[0]/stride, a[1]/stride) for a in anchors] #适配到和feature map大小匹配 #log space transform height and the width anchors = torch.FloatTensor(anchors) if CUDA: anchors = anchors.cuda() anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0) prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes])) prediction[:,:,:4] *= stride #恢复到原始图片上的相应坐标,width,height等 return prediction

助手函数写好了,现在来继续实现Darknet类的forward方法

elif module_type == "yolo": anchors = self.module_list[i][0].anchors inp_dim = int(self.net_info["height"]) num_classes = int (module["classes"]) x = x.data x = predict_transform(x, inp_dim, anchors, num_classes, CUDA) if not write: #if no collector has been intialised. detections = x write = 1 else: detections = torch.cat((detections, x), 1)

在没有写predict_transform之前,不同的feature map矩阵,比如13*13*N1,26*26*N2,52*52*N3是没法直接连接成一个tensor的,现在都变成了xx*(5+C)则可以了.
上面代码里的write flag主要是为了区别detections是否为空,为空则说明是第一个yolo layer做的预测,将yolo层的输出赋值给predictions,不为空则连接当前yolo layer的输出至detections.

测试

下载测试图片wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png

def get_test_input(): img = cv2.imread("dog-cycle-car.png") img = cv2.resize(img, (608,608)) #Resize to the input dimension img_ = img[:,:,::-1].transpose((2,0,1)) # BGR -> RGB | H X W C -> C X H X W img_ = img_[np.newaxis,:,:,:]/255.0 #Add a channel at 0 (for batch) | Normalise img_ = torch.from_numpy(img_).float() #Convert to float img_ = Variable(img_) # Convert to Variable return img_ model = Darknet("cfg/yolov3.cfg") inp = get_test_input() pred = model(inp, torch.cuda.is_available()) print (pred)

cv2.imread()导入图片时是BGR通道顺序,并且是h*w*c,比如416*416*3这种格式,我们要转换为3*416*416这种格式.如果有

RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
在predict_transform开头添加prediction = prediction.to(torch.device("cuda")) #使用gpu

RuntimeError: shape '[1, 255, 3025]' is invalid for input of size 689520
注意检查你的input的img的大小和你模型的输入大小是否匹配. 比如模型是608*608的

最终测试结果如下:

pytorch实现yolov3(3) 实现forward

预测出22743个boundingbox,一共3种feature map,分别为19*19,38*38,76*76 每种尺度下预测出3个box,一共3*(19*19 + 38*38 + 76*76) = 22743个box.

内容版权声明:除非注明,否则皆为本站原创文章。

转载注明出处:https://www.heiqu.com/zyzsww.html