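This article is from 《AI实战90讲》 (AI in Practice: 90 Lessons): 90 hands-on projects to build your competitive edge in AI.
Hello everyone, and welcome to the third project. If you came here in order from Project 02, great: you have already got your first AI project running. Now it is time to go a step further.
1. Project overview
Project 02 used a model that someone else had trained; we only called it. This time is different: we take a ResNet18 that has already been trained on ImageNet and fine-tune it on cat and dog images. Fine-tuning means taking a model that can already recognize 1000 classes of objects and making it better at telling cats from dogs.
You will learn three things: how to load image data from folders with ImageFolder, how to fine-tune a model with transfer learning, and how to evaluate the result.
2. Dataset
We recommend the classic Kaggle dataset Dogs vs Cats (about 25,000 cat and dog images):
- Kaggle download: https://www.kaggle.com/c/dogs-vs-cats/data
- Baidu Netdisk: see the column's resource pack
Dataset structure: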
data/dogs_vs_cats/train/
    cat/    cat.0.jpg, cat.1.jpg ...
    dog/    dog.0.jpg, dog.1.jpg ...
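Don't worry if you have no data locally: the code first runs the whole pipeline on demo data. Once you have the real data, just copy the images into the matching folders; not a single line of code needs to change.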
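If your download unpacks into one flat folder with names like cat.0.jpg and dog.0.jpg (an assumption about how your copy of the archive is laid out), the files first have to be sorted into the cat/ and dog/ subfolders shown above. Here is a minimal sketch of that step. The function name sort_into_class_folders and the ./downloads/train path are hypothetical and only for illustration.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(flat_dir, target_dir, classes=("cat", "dog")):
    """Copy files named '<class>.<n>.jpg' from flat_dir into target_dir/<class>/.

    Returns a dict mapping each class name to the number of files copied.
    """
    flat_dir, target_dir = Path(flat_dir), Path(target_dir)
    counts = {cls_name: 0 for cls_name in classes}
    for src in sorted(flat_dir.glob("*.jpg")):
        prefix = src.name.split(".")[0]  # 'cat.12.jpg' -> 'cat'
        if prefix not in counts:
            continue  # skip files that do not follow the naming scheme
        dst_dir = target_dir / prefix
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst_dir / src.name)
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical source path: adjust to wherever you unpacked the archive.
    flat = Path("./downloads/train")
    if flat.exists():
        print(sort_into_class_folders(flat, "./data/dogs_vs_cats/train"))
```

The sketch copies rather than moves, so the original download stays untouched if something goes wrong.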
3. Step-by-step code
Step 1: Import the packages
python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, random_split
from PIL import Image, ImageDraw
import random
from pathlib import Path
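Step 2: Prepare the data
You might ask: why not use the Kaggle data directly? Because the download requires registration and the network speed can be unreliable. So the code has a two-level fallback: it first checks whether real images exist locally and, if so, loads them with ImageFolder; if not, it generates demo data so you can get the pipeline running first.
ImageFolder automatically labels the images under data/dogs_vs_cats/train/cat/ as the cat class and those under data/dogs_vs_cats/train/dog/ as the dog class. When you switch to real data later, you only replace the folder contents; not a single line of code needs to change: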
python
data_dir = Path('./data/dogs_vs_cats')

def generate_demo_data(data_dir, num_per_class=40):
    # Draw synthetic images: a reddish circle with pointed ears for "cat",
    # a bluish square with round ears for "dog"
    for cls_name in ['cat', 'dog']:
        cls_dir = data_dir / 'train' / cls_name
        cls_dir.mkdir(parents=True, exist_ok=True)
        for i in range(num_per_class):
            img = Image.new('RGB', (224, 224), (random.randint(200,255),)*3)
            draw = ImageDraw.Draw(img)
            if cls_name == 'cat':
                c = (random.randint(180,255), random.randint(80,150), random.randint(80,150))
                draw.ellipse([40, 40, 184, 184], fill=c)
                draw.polygon([(70,50), (80,20), (100,50)], fill=c)
                draw.polygon([(124,50), (144,20), (154,50)], fill=c)
            else:
                c = (random.randint(80,150), random.randint(80,150), random.randint(180,255))
                draw.rectangle([40, 40, 184, 184], fill=c)
                draw.ellipse([60, 20, 100, 60], fill=c)
                draw.ellipse([124, 20, 164, 60], fill=c)
            img.save(cls_dir / f'{cls_name}_{i}.jpg')

train_dir = data_dir / 'train'
# Fall back to demo data when fewer than 10 .jpg files are found
if not train_dir.exists() or len(list(train_dir.rglob('*.jpg'))) < 10:
    generate_demo_data(data_dir)

# ImageNet-style preprocessing: resize, center-crop to 224x224, normalize per channel
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder(str(data_dir / 'train'), transform=transform)
# 80/20 train/validation split
train_ds, val_ds = random_split(dataset, [int(0.8*len(dataset)), len(dataset)-int(0.8*len(dataset))])
train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=8)
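It helps to know how the folder names become numeric labels, because the prediction code later relies on index 0 meaning cat. ImageFolder sorts the subfolder names and numbers them in that order. The sketch below is a standalone re-implementation of that rule for illustration, not a call into torchvision, and the helper name find_classes is my own choice here.

```python
import tempfile
from pathlib import Path


def find_classes(root):
    """Return (classes, class_to_idx) built from the sorted subfolder names of root."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create the folders in "wrong" order to show that sorting decides the index.
        for name in ["dog", "cat"]:
            (Path(tmp) / name).mkdir()
        print(find_classes(tmp))  # (['cat', 'dog'], {'cat': 0, 'dog': 1})
```

With the real dataset object you can read the same mapping from dataset.classes and dataset.class_to_idx.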
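Step 3: Load the pretrained model
There is one key operation here: the last layer of ResNet18 originally outputs 1000 classes (the 1000 ImageNet categories), and we replace it with a layer that outputs 2 classes. This is the core of transfer learning: the earlier feature-extraction layers are general-purpose, so only the classification head at the end needs to be swapped.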
python
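# torchvision >= 0.13 uses weights=...; on older versions use models.resnet18(pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Replace the 1000-class ImageNet head with a 2-class head (cat, dog)
model.fc = nn.Linear(model.fc.in_features, 2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
# All parameters go to the optimizer, so the whole network is fine-tuned, not only the new head
optimizer = optim.Adam(model.parameters(), lr=0.001)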
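To get a feel for how small the swapped-in head is, here is a quick piece of arithmetic on the fully connected layer. It assumes model.fc.in_features is 512, which is the value for torchvision's ResNet18; the helper linear_param_count is a name made up for this sketch.

```python
def linear_param_count(in_features, out_features):
    """Parameters of a fully connected layer: one weight per (input, output) pair plus one bias per output."""
    return in_features * out_features + out_features


# 512 is model.fc.in_features for torchvision's ResNet18 (assumed here, not read from a model).
IN_FEATURES = 512

old_head = linear_param_count(IN_FEATURES, 1000)  # original ImageNet head
new_head = linear_param_count(IN_FEATURES, 2)     # cat/dog head

print(f"old head: {old_head:,} parameters")  # old head: 513,000 parameters
print(f"new head: {new_head:,} parameters")  # new head: 1,026 parameters
```

Only these 1,026 parameters start from random values; everything before the head starts from the ImageNet weights.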
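Step 4: Train the model
Training is the same loop repeated over and over: show the model a batch of images, compute the error, and backpropagate to update the parameters. Five epochs means the model sees the whole dataset five times.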
python
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0; total = 0
    for inputs, labels_batch in train_loader:
        inputs, labels_batch = inputs.to(device), labels_batch.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels_batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels_batch.size(0)
        correct += predicted.eq(labels_batch).sum().item()
    acc = 100.0 * correct / total
    print(f"Epoch {epoch+1}/{num_epochs} | Acc: {acc:.2f}%")
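If the loop above feels like magic, the same rhythm can be written out by hand on a toy problem. The sketch below fits a single number w so that w * x matches y, using plain mini-batch gradient descent. It is a simplification for illustration only: the article's code uses Adam, whose update rule is more elaborate, and the name train_toy_model is hypothetical.

```python
def train_toy_model(xs, ys, lr=0.01, num_epochs=30, batch_size=2):
    """Fit y = w * x with mini-batch gradient descent on mean squared error."""
    w = 0.0
    for epoch in range(num_epochs):
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            # Forward pass: predictions for this batch
            preds = [w * x for x in batch_x]
            # Gradient of the batch's mean squared error with respect to w
            grad = sum(2 * (p - y) * x for p, y, x in zip(preds, batch_y, batch_x)) / len(batch_x)
            # Parameter update: step against the gradient
            w -= lr * grad
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated by the rule y = 2x
    print(f"learned w = {train_toy_model(xs, ys):.4f}")  # ends up very close to 2
```

Each pass of the inner loop corresponds to one iteration over train_loader: forward pass, error, gradient, update.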
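Step 5: Evaluate (see what the model learned and what it did not)
Many people stop at the accuracy number. Here we go one step further and look at the samples the model got wrong. This helps you judge whether the problem lies in the data or in the model.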
python
model.eval()
correct = 0; total = 0
preds_list = []; labels_list = []
with torch.no_grad():
    for inputs, labels_batch in val_loader:
        inputs, labels_batch = inputs.to(device), labels_batch.to(device)
        outputs = model(inputs)
        _, predicted = outputs.max(1)
        total += labels_batch.size(0)
        correct += predicted.eq(labels_batch).sum().item()
        preds_list.extend(predicted.cpu().tolist())
        labels_list.extend(labels_batch.cpu().tolist())
print(f"Validation accuracy: {100.0*correct/total:.2f}%")
# Collect (true label, predicted label) pairs for every misclassified sample
wrong = [(labels_list[i], preds_list[i]) for i in range(total) if labels_list[i] != preds_list[i]]
if wrong:
    print(f"Misclassified: {len(wrong)} images")
    for true_label, pred_label in wrong[:5]:
        print(f'  actual: {dataset.classes[true_label]}, predicted: {dataset.classes[pred_label]}')
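A compact way to summarize the same error information is a confusion matrix, which counts every (true class, predicted class) pair. Here is a minimal sketch in plain Python. The label lists are made up for illustration; in the code above you would pass labels_list and preds_list instead.

```python
def confusion_matrix(labels, preds, num_classes):
    """matrix[t][p] counts the samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        matrix[t][p] += 1
    return matrix


if __name__ == "__main__":
    class_names = ["cat", "dog"]      # same order as dataset.classes
    labels = [0, 0, 0, 1, 1, 1]       # made-up ground truth for illustration
    preds = [0, 0, 1, 1, 1, 0]        # made-up predictions for illustration
    matrix = confusion_matrix(labels, preds, len(class_names))
    for name, row in zip(class_names, matrix):
        print(f"true {name}: {dict(zip(class_names, row))}")
    # true cat: {'cat': 2, 'dog': 1}
    # true dog: {'cat': 1, 'dog': 2}
```

If one row carries most of the errors, the model is biased toward the other class, which usually points to a data problem such as class imbalance.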
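Step 6: Save the model and write a prediction function
Save the trained model so that it can be loaded directly later. You can copy the predict_image function below into other projects.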
python
torch.save(model.state_dict(), "cat_dog_classifier.pth")

def predict_image(image_path, model, device):
    img = Image.open(image_path).convert('RGB')
    # Must match the preprocessing used during training
    transform = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]),
    ])
    img_tensor = transform(img).unsqueeze(0).to(device)
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        _, predicted = output.max(1)
    # ImageFolder sorts class folders alphabetically, so index 0 is cat and 1 is dog
    return "cat" if predicted.item() == 0 else "dog"

test_img = list((train_dir / 'cat').glob('*.jpg'))[0]
print(f"Prediction test: this is a {predict_image(str(test_img), model, device)}")
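predict_image returns only a label. If you also want to know how sure the model is, you can turn the two raw output scores (logits) into probabilities with softmax. The sketch below does this in plain Python with made-up logits; the function names are hypothetical, and in PyTorch the equivalent step would be torch.softmax(output, dim=1).

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, class_names=("cat", "dog")):
    """Return (predicted class name, its probability) for one image's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for illustration; real ones would come from output[0].tolist()
    name, prob = label_with_confidence([2.0, 0.0])
    print(f"{name} ({prob:.1%})")  # cat (88.1%)
```

A prediction near 50% is a good candidate for manual inspection during error analysis.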
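4. Results
Training for 5 epochs on the 80 demo images (output condensed; exact numbers vary from run to run because the demo images and the train/validation split are random):
Training set: 64 images  Validation set: 16 images
Device: cpu
Epoch 1/5 | Acc: 65.62%
Epoch 2/5 | Acc: 79.69%
Epoch 3/5 | Acc: 84.38%
Epoch 4/5 | Acc: 87.50%
Epoch 5/5 | Acc: 89.06%
Validation accuracy: 87.50%
Misclassified: 2 images
  actual: dog, predicted: cat
  actual: cat, predicted: dog
Model saved to cat_dog_classifier.pth
Prediction test: this is a cat
Reaching 87% on demo data does not mean much. Training on the real Kaggle data (25,000 images) works much better, with accuracy typically between 92% and 95%. I recommend downloading the real data and trying it.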
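5. Complete code
"""
Project 03: Cat vs. dog recognition with ResNet
Usage: python code_03_cat_dog_classifier.py
The first run downloads the pretrained model automatically. If no real dataset is found, demo data is generated automatically.
"""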
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, random_split
from PIL import Image, ImageDraw
import random
from pathlib import Path

def generate_sample_images(data_dir, num_per_class=40):
    """Generate demo images; each class gets its own shapes with random colors."""
    for cls_name in ['cat', 'dog']:
        cls_dir = data_dir / 'train' / cls_name
        cls_dir.mkdir(parents=True, exist_ok=True)
        for i in range(num_per_class):
            bg = random.randint(200, 255)
            img = Image.new('RGB', (224, 224), (bg, bg, bg))
            draw = ImageDraw.Draw(img)
            if cls_name == 'cat':
                c = (random.randint(180,255), random.randint(80,150), random.randint(80,150))
                draw.ellipse([40, 40, 184, 184], fill=c)
                draw.polygon([(70,50), (80,20), (100,50)], fill=c)
                draw.polygon([(124,50), (144,20), (154,50)], fill=c)
            else:
                c = (random.randint(80,150), random.randint(80,150), random.randint(180,255))
                draw.rectangle([40, 40, 184, 184], fill=c)
                draw.ellipse([60, 20, 100, 60], fill=c)
                draw.ellipse([124, 20, 164, 60], fill=c)
            img.save(cls_dir / f'{cls_name}_{i}.jpg')
    print(f"Demo data generated: {num_per_class} images per class")

def predict_image(image_path, model, device):
    """Predict the class of a single image."""
    img = Image.open(image_path).convert('RGB')
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img_tensor = transform(img).unsqueeze(0).to(device)
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        _, predicted = output.max(1)
    # ImageFolder sorts class folders alphabetically, so index 0 is cat and 1 is dog
    return "cat" if predicted.item() == 0 else "dog"

def main():
    print("=" * 50)
    print("Project 03: Cat vs. dog recognition with ResNet")
    print("=" * 50)

    # 1. Prepare the data
    data_dir = Path('./data/dogs_vs_cats')
    train_dir = data_dir / 'train'
    if not train_dir.exists() or len(list(train_dir.rglob('*.jpg'))) < 10:
        print('No real dataset found, generating demo data...')
        generate_sample_images(data_dir)
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    dataset = datasets.ImageFolder(str(train_dir), transform=transform)
    classes = dataset.classes
    print(f"Loaded {len(dataset)} images, classes: {classes}")
    train_size = int(0.8 * len(dataset))
    val_size = len(dataset) - train_size
    train_ds, val_ds = random_split(dataset, [train_size, val_size])
    train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=8)
    print(f"Training set: {len(train_ds)} images  Validation set: {len(val_ds)} images")

    # 2. Load the model
    print("\nLoading the pretrained ResNet18 model...")
    # torchvision >= 0.13 uses weights=...; on older versions use models.resnet18(pretrained=True)
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    print(f"Device: {device}")

    # 3. Train
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    num_epochs = 5
    print("\nStarting training...")
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        for inputs, labels_batch in train_loader:
            inputs, labels_batch = inputs.to(device), labels_batch.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels_batch)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels_batch.size(0)
            correct += predicted.eq(labels_batch).sum().item()
        train_acc = 100.0 * correct / total
        print(f"Epoch {epoch+1}/{num_epochs} | Loss: {running_loss/len(train_loader):.4f} | Acc: {train_acc:.2f}%")

    # 4. Evaluate (with detailed error analysis)
    model.eval()
    correct = 0
    total = 0
    all_preds = []; all_labels_list = []
    with torch.no_grad():
        for inputs, labels_batch in val_loader:
            inputs, labels_batch = inputs.to(device), labels_batch.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels_batch.size(0)
            correct += predicted.eq(labels_batch).sum().item()
            all_preds.extend(predicted.cpu().tolist())
            all_labels_list.extend(labels_batch.cpu().tolist())
    val_acc = 100.0 * correct / total
    print("\n=== Validation results ===")
    print(f"Validation accuracy: {val_acc:.2f}%")
    print(f"Total samples: {total}")
    wrong = [(all_labels_list[i], all_preds[i]) for i in range(total)
             if all_labels_list[i] != all_preds[i]]
    if wrong:
        print(f"Misclassified: {len(wrong)} images")
        for true_label, pred_label in wrong[:5]:
            print(f"  true={classes[true_label]}, predicted={classes[pred_label]}")

    # 5. Save the model
    torch.save(model.state_dict(), "cat_dog_classifier.pth")
    print("\nModel saved to cat_dog_classifier.pth")

    # 6. Prediction example
    if val_acc > 0:
        print("\nThe prediction function is ready. Example call:")
        test_img = list((train_dir / 'cat').glob('*.jpg'))[0]
        print(f"  Prediction: {predict_image(str(test_img), model, device)}")

if __name__ == "__main__":
    main()
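In this project you learned three things: loading image data with ImageFolder, fine-tuning a model with transfer learning, and evaluating a model through error analysis. These three steps are fundamentals of computer vision, and later projects such as face recognition and object detection will use them too.
If you downloaded the real data, see how high you can push the accuracy, and tell me in the comments.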