李宏毅2022机器学习HW3 Image Classification

Homework3

数据集下载

在本地环境下进行实验总是令人安心,但是又苦于网上找不到数据集,虽然kaggle上有数据集但是下载存在问题

于是有了一个天才的想法,间接从kaggle上下载(利用output文件夹中的文件是可下载这一机制将数据集从input文件夹拷贝到output文件夹),具体操作如下图

等待数据集拷贝到output后,点击输出的蓝色链接即可下载。

相关代码由下给出

python 复制代码
!python -m zipfile -c /kaggle/working/Dataset.zip /kaggle/input/ml2022spring-hw4/Dataset # copy数据集到output文件夹,此过程可能较慢

import os
os.chdir('/kaggle/working')
print(os.getcwd())
print(os.listdir("/kaggle/working"))
from IPython.display import FileLink
FileLink('mycode.zip')

任务要求

Task1 模型选择

这里对改进sample code中的模型以及引用其他模型进行了尝试,但效果不佳。

如果你想对sample code中的模型进行改进,直接在类Classifier中的self.cnn以及self.fc模块中进行修改即可

如果你想引入其他的模型,可以参考下面的代码

python 复制代码
import torchvision.models as models
alexNet = models.alexnet(weights=None, num_classes=11)
# model = Classifier().to(device)
model = alexNet.to(device)

PS.至于视频中提到的pretrained问题,这个参数以及被废除,使用参数weights代替,而且这里类别数与原模型的类别数不一致,如果使用原模型参数会报错。

Task2 数据增强

指出原代码问题

__getitem__中

python 复制代码
label = int(fname.split("/")[-1].split("_")[0])

应改为

python 复制代码
label = int(fname.split("/")[-1].split("_")[0].split("\\")[-1])

训练数据增强

这里我理解助教的意思应该是对每一个训练数据进行多种transform转换最后仍为一个样本,因此我这里的转换代码如下:

python 复制代码
transform1 = transforms.RandomHorizontalFlip()
transform2 = transforms.RandomRotation(30)
transform3 = transforms.ColorJitter(brightness=0.5)
transform4 = transforms.RandomAffine(degrees=20, translate=(0.2, 0.2), scale=(0.7, 1.3))
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    
    # You may add some transforms here.
    transforms.RandomChoice([transform1, transform2, transform3, transform4]), # 对每个样本随意挑选一种转换
    
    # ToTensor() should be the last one of the transforms.
    transforms.ToTensor(),
])
trans_size = 4

测试数据增强 & Ensemble

这里我理解助教的意思是对一个测试样本进行多种不同的变换得到不同的测试样本进行预测,通过投票或其他方式决定这个测试样本的标签

如上面所说,在transforms.Compose中进行多种组合最后得到的仍为一个样本,因此不能在这里进行操作,这里选择在__getitem__中进行修改

python 复制代码
except:
            label = -1 # test has no label
            # multiple prediction for testing samples
            trans_im1 = train_tfm(im)
            trans_im2 = train_tfm(im)
            trans_im3 = train_tfm(im)
            trans_im4 = train_tfm(im)
            trans_im = torch.stack((trans_im, trans_im1, trans_im2, trans_im3, trans_im4))

这样我们就可以得到原样本以及变换后的四个样本,在测试时,依次取出进行预测并选择合适的机制得到标签即可。

这里我们选择助教说的第一种方式做ensemble

model_best = Classifier().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt"))
model_best.eval()
prediction = []
with torch.no_grad():
    for data,_ in test_loader:

        # multiple prediction
        data = data.to(device)
        # original test batch sample
        test_pred = model_best(data[:, 0, :, :, :])
        test_pred1 = model_best(data[:, 1, :, :, :])
        test_pred2 = model_best(data[:, 2, :, :, :])
        test_pred3 = model_best(data[:, 3, :, :, :])
        test_pred4 = model_best(data[:, 4, :, :, :])

        test_pred = test_pred *0.65 + 0.35*(test_pred1 + test_pred2 + test_pred3 + test_pred4)/4.0

        test_label = np.argmax(test_pred.cpu().data.numpy(), axis=1)
        prediction += test_label.squeeze().tolist()

Mixup数据增强

这里助教所说应该是对训练数据进行Mixup增强,但是效果也是不太行。

下面给出__getitem__完整代码,这里面对train samples进行Mixup增强,对test samples进行了多变换增强。

python 复制代码
def __getitem__(self,idx):

        # original image procession
        fname = self.files[idx]
        im = Image.open(fname)

        # original image transform
        trans_im = self.transform(im)


        #
        # im = self.data[idx]
        try:
            # label = int(fname.split("/")[-1].split("_")[0])
            label = int(fname.split("/")[-1].split("_")[0].split("\\")[-1])
            # mixup augmentation for train samples
            fname1 = self.files[random.randint(0, self.__len__() - 1)]
            im1 = Image.open(fname1)
            trans_mix_im1 = self.transform(im1)
            label1 = int(fname1.split("/")[-1].split("_")[0].split("\\")[-1])

            trans_im = 0.5*trans_im + 0.5*trans_mix_im1

            label = [label, label1]
            
        except:
            label = -1 # test has no label
            # multiple prediction for testing samples
            trans_im1 = train_tfm(im)
            trans_im2 = train_tfm(im)
            trans_im3 = train_tfm(im)
            trans_im4 = train_tfm(im)
            trans_im = torch.stack((trans_im, trans_im1, trans_im2, trans_im3, trans_im4))
            
        return trans_im,label

loss修改

python 复制代码
loss = criterion(logits, labels[0].to(device))
loss1 = criterion(logits, labels[1].to(device))
loss = loss + loss1
准确率acc计算方式修改 复制代码
def mixup_accuracy(output, target1, target2):
    """
    计算 Mixup 样本的准确率
    :param output: 模型的输出,形状为 (batch_size, num_classes)
    :param target1: 第一个样本的标签,形状为 (batch_size,)
    :param target2: 第二个样本的标签,形状为 (batch_size,)
    :return: 准确率
    """
    # 计算模型对混合样本的预测结果
    output = output.to(device)
    target1 = target1.to(device)
    target2 = target2.to(device)
    
    _, pred_indices = output.topk(2, dim=-1)
    # 取出预测结果中最大的两个值对应的索引
    pred1_indices = pred_indices[:, 0]  # 第一个最大值的索引
    pred2_indices = pred_indices[:, 1]  # 第二个最大值的索引
    # 计算混合样本的准确率
    acc1 = (((pred1_indices == target1) | (pred1_indices == target2))).float()
    acc2 = (((pred2_indices == target1) | (pred2_indices == target2))).float()
    acc = (acc1 + acc2 == 2.0).float().mean()
    return acc
###Task3 交叉验证
ChatGpt告诉我这么做
```python
k = 4
kf = KFold(n_splits=k, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(kf.split(dataset)):
    train_set = Subset(dataset, train_idx)
    valid_set = Subset(dataset, valid_idx)

    # Create data loaders for training and validation sets
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, pin_memory=True)
    valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=False, pin_memory=True)

最终结果

这一套组合拳下来,结果糟糕透了,不知道哪里出了问题!