A Wonderful Experience with PyTorch

Dataset

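The Dataset class in PyTorch (torch.utils.data.Dataset) is an abstract base class that represents a dataset. It defines two methods that a subclass is expected to implement: __len__() and __getitem__(). The base class is generic and cannot load any particular kind of data by itself. So when you work with a specific kind of data (images, text, and so on), you create a custom class that inherits from Dataset and implement these two methods according to how your data needs to be loaded and processed.

In the code example you provided, you created a custom dataset class named ImageDataset that inherits from PyTorch's Dataset class. It implements __len__() and __getitem__() to handle image data stored in NumPy format. In this way, a custom dataset class can be adapted to your specific data type and processing needs.

To sum up: PyTorch's Dataset is a generic abstract base class and cannot handle a specific data type directly, so a custom dataset class is needed to implement loading and processing for that data type.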
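The ImageDataset code itself is not shown here, so the following is only a minimal sketch of what such a class might look like. The array layout (N, H, W, C), the uint8 pixel range, the 128 x 128 image size, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ImageDataset(Dataset):
    """Map-style dataset over images held in NumPy arrays (assumed layout)."""

    def __init__(self, images, labels):
        # images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,)
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Convert HWC uint8 to CHW float32 in [0, 1], the layout nn.Conv2d expects
        image = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        label = torch.tensor(int(self.labels[idx]), dtype=torch.long)
        return image, label


# Random stand-in data: 10 RGB images of 128 x 128 pixels and 7 classes
images = np.random.randint(0, 256, size=(10, 128, 128, 3), dtype=np.uint8)
labels = np.random.randint(0, 7, size=(10,))
dataset = ImageDataset(images, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_images, batch_labels = next(iter(loader))
print(len(dataset), batch_images.shape, batch_labels.shape)
```

Wrapping the dataset in a DataLoader gives batching and shuffling without any extra code in the dataset class.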

Network 1

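import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5, stride=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 5, stride=1)
        # 64 * 29 * 29 fits 3-channel inputs of roughly 128 x 128 pixels
        # (inferred from the layer sizes, not stated in the text)
        self.fc1 = nn.Linear(64 * 29 * 29, 128)
        self.fc2 = nn.Linear(128, 7)  # 7 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 29 * 29)  # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # raw logits, no softmax here
        return x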
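The 64 * 29 * 29 input size of fc1 follows from the convolution and pooling arithmetic. Below is a small sketch of that calculation. The 128 x 128 input size is an assumption (other sizes from 128 to 131 would also end at 29), and conv_out is a helper name introduced only for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128                     # assumed input height and width
size = conv_out(size, 5)       # conv1, 5x5 kernel, no padding
size = conv_out(size, 2, 2)    # 2x2 max pooling, stride 2
size = conv_out(size, 5)       # conv2, 5x5 kernel, no padding
size = conv_out(size, 2, 2)    # 2x2 max pooling, stride 2
flat_features = 64 * size * size
print(size, flat_features)
```

Under the same assumption, a 3x3 convolution with padding=1 keeps the spatial size unchanged, so four rounds of 2x2 pooling take 128 down to 8, which is consistent with the 256 * 8 * 8 used by the deeper network later in this post.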

Results:

Epoch 1 loss: 1.622, train acc: 0.522, test acc: 0.522
Epoch 2 loss: 1.012, train acc: 0.740, test acc: 0.736
Epoch 3 loss: 0.707, train acc: 0.811, test acc: 0.814
Epoch 4 loss: 0.581, train acc: 0.852, test acc: 0.854
Epoch 5 loss: 0.501, train acc: 0.840, test acc: 0.845
Epoch 6 loss: 0.416, train acc: 0.886, test acc: 0.884
Epoch 7 loss: 0.385, train acc: 0.863, test acc: 0.852
Epoch 8 loss: 0.334, train acc: 0.902, test acc: 0.884
Epoch 9 loss: 0.315, train acc: 0.922, test acc: 0.902
Epoch 10 loss: 0.255, train acc: 0.907, test acc: 0.868
Epoch 11 loss: 0.247, train acc: 0.937, test acc: 0.916
Epoch 12 loss: 0.193, train acc: 0.956, test acc: 0.924
Epoch 13 loss: 0.158, train acc: 0.959, test acc: 0.921
Epoch 14 loss: 0.149, train acc: 0.967, test acc: 0.931
Epoch 15 loss: 0.132, train acc: 0.961, test acc: 0.914
Epoch 16 loss: 0.117, train acc: 0.976, test acc: 0.935
Epoch 17 loss: 0.091, train acc: 0.968, test acc: 0.927
Epoch 18 loss: 0.084, train acc: 0.980, test acc: 0.933
Epoch 19 loss: 0.073, train acc: 0.975, test acc: 0.925
Epoch 20 loss: 0.060, train acc: 0.990, test acc: 0.938
Epoch 21 loss: 0.060, train acc: 0.978, test acc: 0.922
Epoch 22 loss: 0.063, train acc: 0.989, test acc: 0.938
Epoch 23 loss: 0.048, train acc: 0.991, test acc: 0.938
Epoch 24 loss: 0.042, train acc: 0.990, test acc: 0.935
Epoch 25 loss: 0.035, train acc: 0.994, test acc: 0.941
Epoch 26 loss: 0.036, train acc: 0.991, test acc: 0.941
Epoch 27 loss: 0.029, train acc: 0.993, test acc: 0.938
Epoch 28 loss: 0.033, train acc: 0.998, test acc: 0.943
Epoch 29 loss: 0.038, train acc: 0.993, test acc: 0.942
Epoch 30 loss: 0.026, train acc: 0.998, test acc: 0.945
Epoch 31 loss: 0.017, train acc: 0.997, test acc: 0.944
Epoch 34 loss: 0.021, train acc: 0.996, test acc: 0.939
Epoch 35 loss: 0.017, train acc: 0.990, test acc: 0.936
Epoch 36 loss: 0.020, train acc: 0.997, test acc: 0.950
Epoch 37 loss: 0.014, train acc: 0.997, test acc: 0.951
Epoch 38 loss: 0.011, train acc: 0.998, test acc: 0.945
Epoch 39 loss: 0.007, train acc: 1.000, test acc: 0.938
Epoch 40 loss: 0.011, train acc: 1.000, test acc: 0.944
Epoch 41 loss: 0.007, train acc: 0.995, test acc: 0.940
Epoch 42 loss: 0.012, train acc: 1.000, test acc: 0.940
Epoch 43 loss: 0.008, train acc: 0.999, test acc: 0.945
Epoch 44 loss: 0.009, train acc: 1.000, test acc: 0.946
Epoch 45 loss: 0.007, train acc: 0.997, test acc: 0.943
Epoch 46 loss: 0.007, train acc: 0.999, test acc: 0.948
Epoch 49 loss: 0.009, train acc: 1.000, test acc: 0.948
Epoch 50 loss: 0.004, train acc: 1.000, test acc: 0.948
Finished Training

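A simple architecture like this has some possible weaknesses:

Limited capacity: the network is shallow, with only two convolutional layers, so it may not extract enough features from the input.
Prone to overfitting: the network uses no regularization such as dropout. In the log above, training accuracy reaches 1.000 while test accuracy levels off at about 0.94 to 0.95.
Large kernels: the network uses 5x5 kernels, which may lose some fine detail in the feature maps and lower performance.
No pretrained model: the network is trained from scratch, which may mean longer training and lower accuracy than starting from a pretrained model.

These are possible weaknesses of the network. Adjusting the architecture, adding regularization, or using smaller kernels may improve performance.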

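Simple ways to improve CNN performance

This version adds more convolutional layers, batch normalization layers, and Dropout layers to improve performance: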

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool = nn.MaxPool2d(2, 2)

        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.bn4 = nn.BatchNorm2d(128)
  
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)
        self.bn5 = nn.BatchNorm2d(256)
        self.conv6 = nn.Conv2d(256, 256, 3, padding=1)
        self.bn6 = nn.BatchNorm2d(256)

        self.fc1 = nn.Linear(256 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 7)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
  
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(F.relu(self.bn4(self.conv4(x))))
  
        x = F.relu(self.bn5(self.conv5(x)))
        x = self.pool(F.relu(self.bn6(self.conv6(x))))
  
        x = x.view(-1, 256 * 8 * 8)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

Results:

Epoch 1 loss: 1.625, train acc: 0.466, test acc: 0.465
Epoch 2 loss: 1.146, train acc: 0.707, test acc: 0.712
Epoch 3 loss: 0.603, train acc: 0.895, test acc: 0.883
Epoch 4 loss: 0.313, train acc: 0.938, test acc: 0.937
Epoch 5 loss: 0.175, train acc: 0.963, test acc: 0.951
Epoch 6 loss: 0.135, train acc: 0.970, test acc: 0.956
Epoch 7 loss: 0.092, train acc: 0.982, test acc: 0.967
Epoch 8 loss: 0.071, train acc: 0.986, test acc: 0.968
Epoch 9 loss: 0.055, train acc: 0.988, test acc: 0.971
Epoch 10 loss: 0.043, train acc: 0.990, test acc: 0.971
Epoch 11 loss: 0.036, train acc: 0.995, test acc: 0.968
Epoch 12 loss: 0.028, train acc: 0.994, test acc: 0.979
Epoch 13 loss: 0.024, train acc: 0.996, test acc: 0.977
Epoch 14 loss: 0.016, train acc: 0.998, test acc: 0.977
Epoch 15 loss: 0.017, train acc: 0.997, test acc: 0.978
Epoch 16 loss: 0.017, train acc: 0.997, test acc: 0.978
Epoch 17 loss: 0.015, train acc: 0.998, test acc: 0.981
Epoch 18 loss: 0.013, train acc: 0.998, test acc: 0.979
Epoch 19 loss: 0.010, train acc: 0.998, test acc: 0.976
Epoch 20 loss: 0.008, train acc: 0.999, test acc: 0.978
Epoch 21 loss: 0.008, train acc: 0.999, test acc: 0.980
Epoch 22 loss: 0.007, train acc: 0.998, test acc: 0.980
Epoch 23 loss: 0.006, train acc: 1.000, test acc: 0.979
Epoch 24 loss: 0.005, train acc: 0.999, test acc: 0.978
Epoch 25 loss: 0.006, train acc: 0.999, test acc: 0.977
Epoch 26 loss: 0.005, train acc: 1.000, test acc: 0.979
Epoch 27 loss: 0.005, train acc: 1.000, test acc: 0.977
Epoch 28 loss: 0.004, train acc: 0.999, test acc: 0.978
Epoch 29 loss: 0.005, train acc: 0.999, test acc: 0.976
Epoch 30 loss: 0.004, train acc: 0.999, test acc: 0.982
Epoch 31 loss: 0.004, train acc: 1.000, test acc: 0.977
Epoch 32 loss: 0.006, train acc: 0.999, test acc: 0.979
Epoch 33 loss: 0.005, train acc: 1.000, test acc: 0.978
Epoch 34 loss: 0.004, train acc: 0.999, test acc: 0.976
Epoch 35 loss: 0.003, train acc: 1.000, test acc: 0.982
Epoch 36 loss: 0.003, train acc: 1.000, test acc: 0.977
Epoch 37 loss: 0.003, train acc: 1.000, test acc: 0.979
Epoch 38 loss: 0.003, train acc: 1.000, test acc: 0.976
Epoch 39 loss: 0.003, train acc: 1.000, test acc: 0.979
Epoch 40 loss: 0.002, train acc: 1.000, test acc: 0.982
Epoch 41 loss: 0.002, train acc: 1.000, test acc: 0.980

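Network 2 (ResNet)

A network that fits faster, and often reaches higher accuracy, is the residual network (ResNet).

ResNet is a deep residual network proposed by Microsoft Research. Its main idea is to add residual (shortcut) connections to address the degradation problem, which allows much deeper networks and improves accuracy and generalization. Common variants include ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.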

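Compared with other deep neural networks, ResNet has these advantages:

Faster training: residual connections ease the vanishing-gradient problem, so deeper and wider networks remain trainable and tend to fit the data in fewer epochs.
Better generalization: residual connections pass information directly across layers and reduce information loss, which helps the network learn the features of the data.
Higher accuracy: because residual connections allow deeper and wider networks, the model has more capacity and can fit the data better.

However, a deep ResNet uses more GPU memory and needs more compute to train than a shallow network.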
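To make the idea of a residual connection concrete, here is a minimal sketch. TinyResidual is a hypothetical block written only for this illustration and is much simpler than the BasicBlock used in this post. With the convolution weights set to zero, the residual branch contributes nothing and the block passes its (non-negative) input through unchanged, which is why stacking such blocks does not make the identity mapping harder to learn.

```python
import torch
import torch.nn as nn


class TinyResidual(nn.Module):
    """Hypothetical minimal block: out = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x) + x)  # the shortcut adds the input back


block = TinyResidual(8)
nn.init.zeros_(block.conv.weight)   # make the residual branch output zero
x = torch.rand(2, 8, 16, 16)        # non-negative input, so ReLU leaves it unchanged
y = block(x)
print(y.shape, torch.allclose(y, x))
```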

class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=7):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self.make_layer(block, 64, layers[0])
        self.layer2 = self.make_layer(block, 128, layers[1], 2)
        self.layer3 = self.make_layer(block, 256, layers[2], 2)
        self.layer4 = self.make_layer(block, 512, layers[3], 2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        for _ in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

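ResNet has the following problems:

Depth limits: although ResNet eases the vanishing-gradient problem of deep networks, vanishing and exploding gradients can still appear as the depth keeps growing, which limits how deep a ResNet can usefully be.
Limited feature reuse: features in the residual blocks are not fully reused. Compared with DenseNet, ResNet passes features stage by stage, from one block to the next, rather than between all blocks.
Long training time: deep ResNet variants take a long time to train, especially when the training set is large.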

Epoch 1 loss: 1.333, train acc: 0.757, test acc: 0.734
Epoch 2 loss: 0.492, train acc: 0.938, test acc: 0.904
Epoch 3 loss: 0.171, train acc: 0.982, test acc: 0.951
Epoch 4 loss: 0.061, train acc: 0.996, test acc: 0.956
Epoch 5 loss: 0.026, train acc: 0.999, test acc: 0.960
Epoch 6 loss: 0.012, train acc: 1.000, test acc: 0.960
Epoch 7 loss: 0.006, train acc: 1.000, test acc: 0.962
Epoch 8 loss: 0.005, train acc: 1.000, test acc: 0.964
Epoch 9 loss: 0.004, train acc: 1.000, test acc: 0.963
Epoch 10 loss: 0.003, train acc: 1.000, test acc: 0.964
Epoch 11 loss: 0.003, train acc: 1.000, test acc: 0.964
Epoch 12 loss: 0.002, train acc: 1.000, test acc: 0.964
Epoch 13 loss: 0.002, train acc: 1.000, test acc: 0.964
Epoch 14 loss: 0.002, train acc: 1.000, test acc: 0.963
Epoch 15 loss: 0.002, train acc: 1.000, test acc: 0.964
Epoch 16 loss: 0.002, train acc: 1.000, test acc: 0.963
Epoch 17 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 18 loss: 0.001, train acc: 1.000, test acc: 0.962
Epoch 19 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 20 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 21 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 22 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 23 loss: 0.001, train acc: 1.000, test acc: 0.964
Epoch 24 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 25 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 26 loss: 0.001, train acc: 1.000, test acc: 0.964
Epoch 27 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 28 loss: 0.001, train acc: 1.000, test acc: 0.963
Epoch 29 loss: 0.001, train acc: 1.000, test acc: 0.963

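Network 3 (DenseNet)

In DenseNet, the output of each layer is connected to the input of every later layer in the same dense block, so each layer has direct access to the feature maps of all earlier layers. This increases feature reuse and avoids wasting features. The connections between feature maps are implemented by tensor concatenation.

Specifically, DenseNet consists of several dense blocks, transition layers between them, and a global pooling layer. Each dense block contains several layers, each built from batch normalization, ReLU, and convolution, and the output of each layer is concatenated into the input of all later layers in the block. The output of the global pooling layer is fed to a fully connected layer and a softmax for classification.

Advantages of DenseNet:

High feature reuse: each layer has direct access to the feature maps of all earlier layers, which increases feature reuse and avoids wasting features.
Fewer parameters: because earlier features are reused through concatenation, each layer only needs to add a small number of new feature maps (the growth rate), so the model needs relatively few parameters.
High accuracy: DenseNet performs well on tasks such as image classification and reached state-of-the-art results at the time it was published.

However, DenseNet also has drawbacks, such as high computational cost and a more complex structure.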
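Because every layer concatenates its output onto its input, the channel count grows by the growth rate at each layer. The sketch below reproduces only that bookkeeping (stem of 2 * growth_rate channels, transition layers that halve the channels), following the DenseNet constructor in this section. The function name is made up for this example, and the configuration (6, 12, 24, 16) with growth rate 32 is the standard DenseNet-121 setting, used here as an illustration rather than as the configuration behind the logs in this post.

```python
def densenet_channels(growth_rate, block_config):
    """Return the channel count after each dense block (before any transition)."""
    channels = growth_rate * 2                     # channels after the stem convolution
    after_blocks = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate       # each layer concatenates growth_rate maps
        after_blocks.append(channels)
        if i != len(block_config) - 1:
            channels //= 2                         # the transition layer halves the channels
    return after_blocks


channels = densenet_channels(32, (6, 12, 24, 16))
print(channels)
```

The last entry is the number of input features of the final fully connected layer.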

import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super(DenseLayer, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, growth_rate * 4, kernel_size=1, stride=1, bias=False)
        self.bn2 = nn.BatchNorm2d(growth_rate * 4)
        self.conv2 = nn.Conv2d(growth_rate * 4, growth_rate, kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x):
        out = self.bn1(x)
        out = F.relu(out)
        out = self.conv1(out)
        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = torch.cat((x, out), 1)
        return out


class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super(DenseBlock, self).__init__()
        self.layers = nn.ModuleList([DenseLayer(in_channels + i * growth_rate, growth_rate) for i in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class TransitionLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(TransitionLayer, self).__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)

    def forward(self, x):
        out = self.bn(x)
        out = F.relu(out)
        out = self.conv(out)
        out = F.avg_pool2d(out, 2)
        return out


class DenseNet(nn.Module):
    def __init__(self, growth_rate, block_config, num_classes=7):
        super(DenseNet, self).__init__()
        self.conv1 = nn.Conv2d(3, growth_rate * 2, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(growth_rate * 2)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        in_channels = growth_rate * 2
        self.dense_blocks = nn.ModuleList()
        self.transition_layers = nn.ModuleList()
        for i, num_layers in enumerate(block_config):
            dense_block = DenseBlock(in_channels, growth_rate, num_layers)
            self.dense_blocks.append(dense_block)
            in_channels += num_layers * growth_rate
            if i != len(block_config) - 1:
                transition_layer = TransitionLayer(in_channels, in_channels // 2)
                self.transition_layers.append(transition_layer)
                in_channels = in_channels // 2

        self.bn2 = nn.BatchNorm2d(in_channels)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        for i, dense_block in enumerate(self.dense_blocks):
            x = dense_block(x)
            if i != len(self.dense_blocks) - 1:
                x = self.transition_layers[i](x)

        x = self.bn2(x)
        x = F.relu(x)
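        x = F.adaptive_avg_pool2d(x, (1, 1))  # global average pooling (the original listing is cut off here)
        # The remaining lines are reconstructed from the layers defined in __init__
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x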

Results:

Epoch 1 loss: 1.401, train acc: 0.684, test acc: 0.669
Epoch 2 loss: 0.556, train acc: 0.945, test acc: 0.921
Epoch 3 loss: 0.188, train acc: 0.984, test acc: 0.951
Epoch 4 loss: 0.074, train acc: 0.997, test acc: 0.956
Epoch 5 loss: 0.024, train acc: 1.000, test acc: 0.965
Epoch 6 loss: 0.012, train acc: 1.000, test acc: 0.966
Epoch 7 loss: 0.007, train acc: 1.000, test acc: 0.964
Epoch 8 loss: 0.005, train acc: 1.000, test acc: 0.965
Epoch 9 loss: 0.004, train acc: 1.000, test acc: 0.967
Epoch 10 loss: 0.003, train acc: 1.000, test acc: 0.966
Epoch 11 loss: 0.003, train acc: 1.000, test acc: 0.965
Epoch 12 loss: 0.003, train acc: 1.000, test acc: 0.966
Epoch 13 loss: 0.002, train acc: 1.000, test acc: 0.967
Epoch 14 loss: 0.002, train acc: 1.000, test acc: 0.967
Epoch 15 loss: 0.002, train acc: 1.000, test acc: 0.966
Epoch 16 loss: 0.002, train acc: 1.000, test acc: 0.967
Epoch 17 loss: 0.002, train acc: 1.000, test acc: 0.966
Epoch 18 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 19 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 20 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 21 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 22 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 23 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 24 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 25 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 26 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 27 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 28 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 29 loss: 0.001, train acc: 1.000, test acc: 0.966
Epoch 30 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 31 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 34 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 36 loss: 0.001, train acc: 1.000, test acc: 0.968
Epoch 37 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 38 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 39 loss: 0.001, train acc: 1.000, test acc: 0.967
Epoch 40 loss: 0.001, train acc: 1.000, test acc: 0.968
Epoch 41 loss: 0.001, train acc: 1.000, test acc: 0.968
Epoch 42 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 43 loss: 0.000, train acc: 1.000, test acc: 0.967
Epoch 44 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 45 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 46 loss: 0.000, train acc: 1.000, test acc: 0.967
Epoch 47 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 48 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 49 loss: 0.000, train acc: 1.000, test acc: 0.968
Epoch 50 loss: 0.000, train acc: 1.000, test acc: 0.968
Finished Training

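Ways to improve accuracy:

Tune hyperparameters: try different learning rates, batch sizes, optimizers, and weight decay values. Grid search or random search can help find a good combination. A learning rate scheduler that gradually lowers the learning rate is also worth considering.
Deeper or wider models: try more complex models, such as a deeper or wider ResNet, DenseNet, or another modern architecture. More complex models usually have more representational capacity and can improve performance.
Data augmentation: techniques such as random rotation, flipping, scaling, cropping, and brightness adjustment expand the training set and improve generalization.
Regularization: techniques such as L1 or L2 regularization, Dropout, or Batch Normalization can reduce overfitting and improve generalization.
More data: if possible, collect more training data. More data helps the model learn more features and improves accuracy.
Early stopping: monitor performance on a validation set and stop training once it no longer improves. This helps prevent overfitting.
Pretrained models: start from a pretrained model and fine-tune it on your dataset. This reuses features learned on a large dataset, speeds up convergence, and improves performance.
Ensembles: train several models and combine their outputs, either by simple averaging or with more elaborate techniques such as voting or stacking. This helps improve stability and accuracy.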
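As one concrete case from the list above, early stopping can be sketched in a few lines. The EarlyStopping class, its patience and min_delta parameters, and the accuracy values are all made up for illustration. In a real training loop the value passed to step() would be the validation accuracy of the current epoch.

```python
class EarlyStopping:
    """Stop when the monitored accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch and return True if training should stop."""
        if self.best is None or val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only
history = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.96, 0.95]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(history, start=1):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

With patience=3 the loop stops at epoch 6, before the later improvement to 0.96 is seen, which shows the trade-off: a small patience saves time but can stop too early.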

Data augmentation

Unlike above, in addition to adding Gaussian noise I also added image rotation to improve the model's generalization.
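The augmentation code is not included in this post, so the following is only a rough sketch of the idea, using NumPy since the images are stored as NumPy arrays. The noise level sigma, the (H, W, C) layout with values in [0, 1], and the restriction to rotations by multiples of 90 degrees are assumptions chosen to keep the example dependency-free. The actual rotation used may well have been by arbitrary angles.

```python
import numpy as np


def augment(image, rng, sigma=0.05):
    """Add Gaussian noise, then rotate by a random multiple of 90 degrees.

    image: float array of shape (H, W, C) with values in [0, 1].
    """
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    noisy = np.clip(noisy, 0.0, 1.0)               # keep pixel values in range
    k = int(rng.integers(0, 4))                    # 0, 90, 180 or 270 degrees
    rotated = np.rot90(noisy, k=k, axes=(0, 1))    # rotate in the H-W plane
    return np.ascontiguousarray(rotated, dtype=np.float32)


rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))                  # stand-in for one training image
out = augment(image, rng)
print(out.shape, out.dtype)
```

Augmentation like this is normally applied only to training samples, for example inside __getitem__ of the training dataset, and not to the test set.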

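Loss function

To improve training, you can try other loss functions. Here I use the Label Smoothing Cross Entropy loss, which can improve generalization because it provides extra regularization during training.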

class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, eps=0.1, reduction='mean'):
        super(LabelSmoothingCrossEntropy, self).__init__()
        self.eps = eps
        self.reduction = reduction

    def forward(self, output, target):
        log_preds = F.log_softmax(output, dim=1)
        loss = F.nll_loss(log_preds, target, reduction=self.reduction)
        smooth_loss = -log_preds.mean(dim=1)
        if self.reduction == 'mean':
            smooth_loss = smooth_loss.mean()
        elif self.reduction == 'sum':
            smooth_loss = smooth_loss.sum()
        return loss * (1 - self.eps) + smooth_loss * self.eps
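The returned value, (1 - eps) * NLL + eps * (mean over classes of -log p), is the same as the cross entropy against a smoothed target that puts 1 - eps + eps/C on the true class and eps/C on every other class. The NumPy sketch below checks this numerically with made-up logits and targets. It is a sanity check of the formula, not a replacement for the class above.

```python
import numpy as np


def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 7))      # 4 samples, 7 classes (made-up values)
target = np.array([0, 3, 6, 2])
eps = 0.1
log_p = log_softmax(logits)
n, c = logits.shape

# Form used by the class above: (1 - eps) * NLL + eps * mean over classes of -log p
nll = -log_p[np.arange(n), target].mean()
smooth = -log_p.mean(axis=1).mean()
loss_a = (1 - eps) * nll + eps * smooth

# Equivalent form: cross entropy against the smoothed target distribution
q = np.full((n, c), eps / c)
q[np.arange(n), target] += 1 - eps
loss_b = -(q * log_p).sum(axis=1).mean()

print(np.isclose(loss_a, loss_b))
```

Recent PyTorch versions also accept a label_smoothing argument in nn.CrossEntropyLoss, which as far as I know uses the same smoothed-target definition.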

BATCH_SIZE

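The choice of batch size has a large effect on training results and convergence speed, but there is no fixed rule for finding the best value.

Training stability and convergence speed: a larger batch size makes gradient descent more stable, because the average gradient over each batch is less sensitive to noise. However, a large batch size may slow convergence, because the weights are updated fewer times per epoch. Conversely, a smaller batch size gives more weight updates per epoch and can speed up training, but training may become unstable because the gradients from small batches are noisier.
Generalization: some studies suggest that a smaller batch size may help generalization. This may be because small batches introduce randomness that acts as regularization and helps prevent overfitting.
Training time: a larger batch size reduces the number of iterations per epoch, which cuts synchronization and data transfer overhead and improves hardware utilization. But if the batch size is too large, the GPU may run out of memory, which in turn hurts training speed.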
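The relation between batch size and the number of weight updates per epoch is simple arithmetic. The sketch below uses a hypothetical training set of 10,000 samples, since the size of the dataset used in this post is not stated.

```python
import math

num_samples = 10_000   # hypothetical training-set size
# Number of weight updates (iterations) per epoch for a few batch sizes
updates = {b: math.ceil(num_samples / b) for b in (16, 64, 256)}
for batch_size, n_updates in updates.items():
    print(batch_size, n_updates)
```

Going from a batch size of 16 to 256 cuts the updates per epoch from 625 to 40, so when the batch size is increased the learning rate or the number of epochs often has to be adjusted as well.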