Deep Learning: Implementing a Residual Network (ResNet) by Hand for Simpsons Character Recognition (2)

In practice, because the original model takes 224 * 224 inputs while our images are only 64 * 64, the 7 * 7 kernel size is too large for this task. The standard ResNet18 model reaches roughly 89% accuracy, while my modified model reaches roughly 94%.

The input layer is followed by four intermediate stages built from residual blocks. Each residual block in ResNet consists of two 3 * 3 convolutional layers plus a shortcut connection; when the input and output shapes differ, a 1 * 1 convolution projects the block's input so it can be added directly to the output of the second convolution. Finally, average pooling is applied to the output of the last residual block, and the resulting feature map is fed to the fully connected layer.

In addition, every convolution result in the model is batch-normalized and then passed through a ReLU activation.

Data Transform:

To reduce overfitting, I use image transforms for data augmentation and normalize the input images. This keeps the distribution of all images similar, so training converges more easily, runs faster, and gives better results.

I also tried resizing the images to 224 * 224 and using a 7 * 7 convolution kernel in the input layer, but I found that the images were upscaled too much, blurring the features and degrading model performance.

Normalization: I use the script below to compute the mean and standard deviation over the whole dataset.

data = torchvision.datasets.ImageFolder(root=student.dataset,
                                        transform=student.transform('train'))
trainloader = torch.utils.data.DataLoader(data, batch_size=1, shuffle=True)
mean = torch.zeros(3)
std = torch.zeros(3)
for batch in trainloader:
    images, labels = batch
    for d in range(3):
        mean[d] += images[:, d, :, :].mean()
        std[d] += images[:, d, :, :].std()
mean.div_(len(data))
std.div_(len(data))
print(list(mean.numpy()), list(std.numpy()))

Hyperparameters and other settings

Epochs, batch_size, learning rate:

epochs = 120; if this is too small, training may stop before convergence is reached.
batch_size = 256; if batch_size is too small, convergence may be very slow or the loss may not decrease.
lr = 0.001; if the learning rate is set too large, the gradient may oscillate around the minimum or even fail to converge.
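To show how these settings fit together, here is a minimal training-loop sketch. The report's `Network` class and DataLoader are defined elsewhere, so a tiny stand-in linear model and a small random batch are used here to keep the sketch self-contained; the epoch count and batch size are reduced from the real values (120 and 256) purely for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for the Network class defined later in the report.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 14))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the report

# One fake batch of 64 * 64 RGB images (batch_size=256 in the actual run).
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 14, (8,))

for epoch in range(2):  # epochs=120 in the actual run
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```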

Loss function:

torch.nn.CrossEntropyLoss() is well suited to this image classification task.
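One point worth noting about this loss: it expects raw logits (no softmax layer) and integer class indices as targets, which is why the model's final layer is a plain `nn.Linear`. A minimal illustration with made-up numbers:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Raw logits, shape (batch, num_classes); softmax is applied internally.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 3.0, 0.2]])
targets = torch.tensor([0, 1])  # integer class indices, not one-hot

loss = criterion(logits, targets)  # small, since both predictions are correct
```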

Optimiser:

I tried SGD, RMSprop, Adadelta, and others, but Adam worked best for me.

Dropout and weight_decay: not used.

When I tried setting them to reduce overfitting, the results were poor: with these settings the loss stopped decreasing or the accuracy dropped.
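For reference, this is how the two settings would be enabled. The values here are illustrative only, not taken from the report, since the report leaves both disabled:

```python
import torch
import torch.nn as nn

# Dropout would be inserted before the final classifier layer;
# weight_decay is an argument to the optimizer (L2 regularization).
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(512, 14))
optimizer = torch.optim.Adam(head.parameters(), lr=0.001, weight_decay=1e-4)
```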

Code

Image augmentation code

def transform(mode):
    """
    Called when loading the data. Visit this URL for more information:
    https://pytorch.org/vision/stable/transforms.html
    You may specify different transforms for training and testing
    """
    if mode == 'train':
        return transforms.Compose([
            # transforms.Grayscale(num_output_channels=1),
            transforms.RandomHorizontalFlip(),
            transforms.RandomVerticalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.42988312, 0.42988312, 0.42988312],
                                 [0.17416202, 0.17416202, 0.17416202])
        ])
    elif mode == 'test':
        return transforms.Compose([
            # no random flips at test time, so evaluation is deterministic
            transforms.ToTensor(),
            transforms.Normalize([0.42988312, 0.42988312, 0.42988312],
                                 [0.17416202, 0.17416202, 0.17416202])
        ])

Hand-built ResNet18

import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    def __init__(self, inchannel, outchannel, stride=1):
        super(ResBlock, self).__init__()
        # two conv layers with 3*3 kernel size
        self.left = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride,
                      padding=1, bias=False),
            nn.BatchNorm2d(outchannel),
            nn.ReLU(inplace=True),
            nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1,
                      padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        self.shortcut = nn.Sequential()
        if stride != 1 or inchannel != outchannel:
            # shortcut with a 1*1 kernel: project the input so its shape
            # matches the output of the two conv layers
            self.shortcut = nn.Sequential(
                nn.Conv2d(inchannel, outchannel, kernel_size=1, stride=stride,
                          bias=False),
                nn.BatchNorm2d(outchannel)
            )

    def forward(self, x):
        out = self.left(x)
        # add the output of the two conv layers to the (projected) input x,
        # which is the basic residual structure of ResNet
        out = out + self.shortcut(x)
        out = F.relu(out)
        return out


class Network(nn.Module):
    def __init__(self, num_classes=14):
        super().__init__()
        self.inchannel = 64
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU()
            # nn.MaxPool2d(3, 1, 1)
        )
        self.layer1 = self.make_layer(ResBlock, 64, 2, stride=1)
        self.layer2 = self.make_layer(ResBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResBlock, 512, 2, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def make_layer(self, block, channels, num_blocks, stride):
        layers = []
        for i in range(num_blocks):
            if i == 0:
                layers.append(block(self.inchannel, channels, stride))
            else:
                layers.append(block(channels, channels, 1))
            self.inchannel = channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
