P6周:VGG-16算法-Pytorch实现人脸识别

我的环境

语言环境:Python 3.8.12

编译器:jupyter notebook

深度学习环境:torch 1.12.0+cu113

一、前期准备
1.设置GPU
python 复制代码
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision
from torchvision import transforms, datasets
import os,PIL,pathlib,warnings

warnings.filterwarnings("ignore")             #忽略警告信息

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
复制代码
device(type='cuda')
2.导入数据
python 复制代码
import os,PIL,random,pathlib

data_dir = 'F:/jupyter lab/DL-100-days/datasets/Hollywood_stars_photos/'
data_dir = pathlib.Path(data_dir)

data_paths  = list(data_dir.glob('*'))
classeNames = [str(path).split("\\")[5] for path in data_paths]
classeNames
复制代码
['Angelina Jolie',
 'Brad Pitt',
 'Denzel Washington',
 'Hugh Jackman',
 'Jennifer Lawrence',
 'Johnny Depp',
 'Kate Winslet',
 'Leonardo DiCaprio',
 'Megan Fox',
 'Natalie Portman',
 'Nicole Kidman',
 'Robert Downey Jr',
 'Sandra Bullock',
 'Scarlett Johansson',
 'Tom Cruise',
 'Tom Hanks',
 'Will Smith']
python 复制代码
# 关于transforms.Compose的更多介绍可以参考:https://blog.csdn.net/qq_38251616/article/details/124878863
train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # 将输入图片resize成统一尺寸
    # transforms.RandomHorizontalFlip(), # 随机水平翻转
    transforms.ToTensor(),          # 将PIL Image或numpy.ndarray转换为tensor,并归一化到[0,1]之间
    transforms.Normalize(           # 标准化处理-->转换为标准正太分布(高斯分布),使模型更容易收敛
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225])  # 其中 mean=[0.485,0.456,0.406]与std=[0.229,0.224,0.225] 从数据集中随机抽样计算得到的。
])

total_data = datasets.ImageFolder("F:/jupyter lab/DL-100-days/datasets/Hollywood_stars_photos/",transform=train_transforms)
total_data
复制代码
Dataset ImageFolder
    Number of datapoints: 1800
    Root location: F:/jupyter lab/DL-100-days/datasets/Hollywood_stars_photos/
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=None)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
python 复制代码
total_data.class_to_idx
复制代码
{'Angelina Jolie': 0,
 'Brad Pitt': 1,
 'Denzel Washington': 2,
 'Hugh Jackman': 3,
 'Jennifer Lawrence': 4,
 'Johnny Depp': 5,
 'Kate Winslet': 6,
 'Leonardo DiCaprio': 7,
 'Megan Fox': 8,
 'Natalie Portman': 9,
 'Nicole Kidman': 10,
 'Robert Downey Jr': 11,
 'Sandra Bullock': 12,
 'Scarlett Johansson': 13,
 'Tom Cruise': 14,
 'Tom Hanks': 15,
 'Will Smith': 16}
3.划分数据集
python 复制代码
train_size = int(0.8 * len(total_data))
test_size  = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
train_dataset, test_dataset
复制代码
(<torch.utils.data.dataset.Subset at 0x187391e5a60>,
 <torch.utils.data.dataset.Subset at 0x187391e5b20>)
python 复制代码
batch_size = 32

train_dl = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=1)
test_dl = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=1)
python 复制代码
for X, y in test_dl:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
复制代码
Shape of X [N, C, H, W]:  torch.Size([32, 3, 224, 224])
Shape of y:  torch.Size([32]) torch.int64

二、调用官方的VGG16模型

python 复制代码
from torchvision.models import vgg16

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))
    
# 加载预训练模型,并且对模型进行微调
model = vgg16(pretrained = True).to(device) # 加载预训练的vgg16模型

for param in model.parameters():
    param.requires_grad = False # 冻结模型的参数,这样子在训练的时候只训练最后一层的参数

# 修改classifier模块的第6层(即:(6): Linear(in_features=4096, out_features=2, bias=True))
# 注意查看我们下方打印出来的模型
model.classifier._modules['6'] = nn.Linear(4096,len(classeNames)) # 修改vgg16模型中最后一层全连接层,输出目标类别个数
model.to(device)  
model
复制代码
Using cuda device
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=17, bias=True)
  )
)

三、训练循环

1.编写训练函数
python 复制代码
# 训练循环
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # 训练集的大小
    num_batches = len(dataloader)   # 批次数目, (size/batch_size,向上取整)

    train_loss, train_acc = 0, 0  # 初始化训练损失和正确率
    
    for X, y in dataloader:  # 获取图片及其标签
        X, y = X.to(device), y.to(device)
        
        # 计算预测误差
        pred = model(X)          # 网络输出
        loss = loss_fn(pred, y)  # 计算网络输出和真实值之间的差距,targets为真实值,计算二者差值即为损失
        
        # 反向传播
        optimizer.zero_grad()  # grad属性归零
        loss.backward()        # 反向传播
        optimizer.step()       # 每一步自动更新
        
        # 记录acc与loss
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss
2.编写测试函数
python 复制代码
def test (dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  # 测试集的大小
    num_batches = len(dataloader)          # 批次数目, (size/batch_size,向上取整)
    test_loss, test_acc = 0, 0
    
    # 当不进行训练时,停止梯度更新,节省计算内存消耗
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # 计算loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss
3.设置动态学习率
python 复制代码
# 调用官方动态学习率接口时使用
learn_rate = 1e-3 # 初始学习率
lambda1 = lambda epoch: 0.92 ** (epoch // 4)
optimizer = torch.optim.SGD(model.parameters(), lr=learn_rate)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1) #选定调整方法
4.正式训练
python 复制代码
import copy

loss_fn    = nn.CrossEntropyLoss() # 创建损失函数
epochs     = 40

train_loss = []
train_acc  = []
test_loss  = []
test_acc   = []

best_acc = 0    # 设置一个最佳准确率,作为最佳模型的判别指标

for epoch in range(epochs):
    # 更新学习率(使用自定义学习率时使用)
    # adjust_learning_rate(optimizer, epoch, learn_rate)
    
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    scheduler.step() # 更新学习率(调用官方动态学习率接口时使用)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    # 保存最佳模型到 best_model
    if epoch_test_acc > best_acc:
        best_acc   = epoch_test_acc
        best_model = copy.deepcopy(model)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    # 获取当前的学习率
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, 
                          epoch_test_acc*100, epoch_test_loss, lr))
    
# 保存最佳模型到文件中
PATH = './best_model123.pth'  # 保存的参数文件名
torch.save(model.state_dict(), PATH)

print('Done')
复制代码
Epoch: 1, Train_acc:11.9%, Train_loss:2.810, Test_acc:14.4%, Test_loss:2.639, Lr:1.00E-03
Epoch: 2, Train_acc:17.4%, Train_loss:2.590, Test_acc:15.3%, Test_loss:2.512, Lr:1.00E-03
Epoch: 3, Train_acc:17.8%, Train_loss:2.483, Test_acc:17.5%, Test_loss:2.400, Lr:1.00E-03
Epoch: 4, Train_acc:21.0%, Train_loss:2.409, Test_acc:20.8%, Test_loss:2.344, Lr:9.20E-04
Epoch: 5, Train_acc:22.1%, Train_loss:2.331, Test_acc:23.9%, Test_loss:2.289, Lr:9.20E-04
...........
Epoch:36, Train_acc:42.2%, Train_loss:1.763, Test_acc:38.9%, Test_loss:1.858, Lr:4.72E-04
Epoch:37, Train_acc:43.8%, Train_loss:1.744, Test_acc:39.2%, Test_loss:1.872, Lr:4.72E-04
Epoch:38, Train_acc:44.0%, Train_loss:1.747, Test_acc:39.4%, Test_loss:1.872, Lr:4.72E-04
Epoch:39, Train_acc:43.9%, Train_loss:1.744, Test_acc:39.7%, Test_loss:1.859, Lr:4.72E-04
Epoch:40, Train_acc:44.4%, Train_loss:1.731, Test_acc:39.7%, Test_loss:1.885, Lr:4.34E-04
Done

四、结果可视化

1.Loss与Accuracy图
python 复制代码
import matplotlib.pyplot as plt
#隐藏警告
import warnings
warnings.filterwarnings("ignore")               #忽略警告信息
plt.rcParams['font.sans-serif']    = ['SimHei'] # 用来正常显示中文标签
plt.rcParams['axes.unicode_minus'] = False      # 用来正常显示负号
plt.rcParams['figure.dpi']         = 100        #分辨率

from datetime import datetime
current_time = datetime.now() # 获取当前时间

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.xlabel(current_time) # 打卡请带上时间戳,否则代码截图无效

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
2.对指定图片进行预测
python 复制代码
from PIL import Image 

classes = list(total_data.class_to_idx)

def predict_one_image(image_path, model, transform, classes):
    
    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # 展示预测的图片

    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0)
    
    model.eval()
    output = model(img)

    _,pred = torch.max(output,1)
    pred_class = classes[pred]
    print(f'预测结果是:{pred_class}')
python 复制代码
# 预测训练集中的某张照片
predict_one_image(image_path='F:/jupyter lab/DL-100-days/datasets/Hollywood_stars_photos/Angelina Jolie/001_fe3347c0.jpg', 
                  model=model, 
                  transform=train_transforms, 
                  classes=classes)
复制代码
预测结果是:Angelina Jolie
3.模型评估
python 复制代码
best_model.eval()
epoch_test_acc, epoch_test_loss = test(test_dl, best_model, loss_fn)
python 复制代码
epoch_test_acc, epoch_test_loss
复制代码
(0.3972222222222222, 1.8640953699747722)
python 复制代码
# 查看是否与我们记录的最高准确率一致
epoch_test_acc
复制代码
0.3972222222222222

五、学习心得

1.本次使用pytorch深度学习环境对官方的VGG-16模型进行调用,并且保存最佳模型权重。VGG-16模型的最大特点是深度。除此之外,其卷积层都采用3x3的卷积核和步长为1的卷积操作,同时卷积层后都有ReLU激活函数,从而降低过拟合风险。

2.训练过程中发现训练和测试的acc都过低(约为20%),通过调整动态学习率予以调整,初始学习率增大一个数量级之后,此问题得到一定的解决。

3.下一步将自行搭建VGG-16模型。

相关推荐
Dfreedom.18 小时前
模型剪枝完全指南:从理论到实践,打造高效深度学习模型
人工智能·算法·机器学习·剪枝·模型加速
开始脱发的自然卷18 小时前
用 Excel 手算 LSTM:从四个门到梯度下降的完整过程
人工智能
BU摆烂会噶18 小时前
【LangGraph】House_Agent 实战(五):持久化、流式输出与部署
人工智能·python·架构·langchain·人机交互
txg66618 小时前
机器人领域简报(2026年5月15日—5月21日)
人工智能·机器人
码上滚雪球18 小时前
Flink Agents 深度解读:当实时数据流遇上 AI 智能体
大数据·人工智能·flink·滚雪球
PNP Robotics18 小时前
PNP机器人亮相南京学术论坛,分享具身智能多模态数据采集前沿成果
人工智能·深度学习·学习·机器学习·virtualenv
少年强则国强18 小时前
安装配置Claude
python
threelab18 小时前
Three.js 银河星系效果 | 三维可视化 / AI 提示词
开发语言·javascript·人工智能
机汇五金_18 小时前
深圳电脑机箱厂家
python
想你依然心痛18 小时前
HarmonyOS 6(API 23)实战:基于悬浮导航、沉浸光感与HMAF的“译界智脑“——PC端AI智能体沉浸式智能翻译与跨语言协作工作台
人工智能·华为·ar·harmonyos