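Table of contents
- [1. Get the data](#1. Get the data)
- [2. Create the Dataset and DataLoader](#2. Create the Dataset and DataLoader)
- [3. Get and customize a pretrained model](#3. Get and customize a pretrained model)
- [4. Train the model and track the results](#4. Train the model and track the results)
- [5. View the model's results in TensorBoard](#5. View the model's results in TensorBoard)
- [6. Create a helper function to build SummaryWriter() instances](#6. Create a helper function to build SummaryWriter() instances)
- [7. Set up a series of modelling experiments](#7. Set up a series of modelling experiments)
- [8. View the experiments in TensorBoard](#8. View the experiments in TensorBoard)
- [9. Load the best model and make predictions with it](#9. Load the best model and make predictions with it)
- Supplement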
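Machine learning and deep learning are highly experimental: you need to track the results of many combinations of data, model architectures and training setups.
When you run lots of different experiments, experiment tracking helps you figure out what works and what doesn't.
If you only run a handful of models, tracking their results in printouts and a few dictionaries may be enough. But as the number of experiments grows, this simple approach can quickly get out of hand.
There are as many ways to track machine learning experiments as there are experiments to run.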
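| Method | Setup | Pros | Cons | Cost |
|---|---|---|---|---|
| Python dictionaries, CSV files, print outs | None | Easy to set up, runs in pure Python | Hard to keep track of large numbers of experiments | Free |
| TensorBoard | Minimal, install `tensorboard` | Extension built into PyTorch, widely recognized and used, scales easily | User experience isn't as nice as some other options | Free |
| Weights & Biases Experiment Tracking | Minimal, install `wandb`, make an account | Great user experience, experiments can be made public, tracks almost anything | Requires external resources outside of PyTorch | Free for personal use |
| MLFlow | Minimal, install `mlflow` and start tracking | Fully open-source MLOps lifecycle management, many integrations | Setting up a remote tracking server is a little harder than with other services | Free |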
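For comparison, the first row of the table (plain Python) can be as simple as appending each run's final metrics to a CSV file. This is a minimal sketch. The `log_experiment()` / `read_experiments()` helpers and the file name `experiments.csv` are hypothetical, and the metric values in the demo are placeholders, not real results:

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def log_experiment(csv_path: Path, row: Dict[str, Any]) -> None:
    """Append one experiment's summary to a CSV file, writing the header only once.

    Assumes every row uses the same keys (they become the CSV columns).
    """
    write_header = not csv_path.exists()
    with open(csv_path, "a", newline="") as f:
        csv_writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            csv_writer.writeheader()
        csv_writer.writerow(row)


def read_experiments(csv_path: Path) -> List[Dict[str, str]]:
    """Read all logged experiments back (every value comes back as a string)."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


# Demo in a throwaway directory with placeholder numbers
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "experiments.csv"
    log_experiment(log_path, {"model": "effnetb0", "epochs": 5, "test_acc": 0.5})
    log_experiment(log_path, {"model": "effnetb2", "epochs": 10, "test_acc": 0.6})
    rows = read_experiments(log_path)

print(rows)
```

This works for a few runs. Comparing per-epoch curves across many runs is where it starts to break down.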
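This post focuses on using TensorBoard to track our experiments.
Before starting the experiments, set up the following:
(1) Reuse the Python scripts `data_setup.py` and `engine.py` from the earlier post on PyTorch modularization. `data_setup` creates the Dataset and DataLoader, and `engine` contains the engine functions for training the model. Copy them over directly from that post.
(2) Import some basic libraries.
(3) Set up device-agnostic code.
(4) Create a helper function to set the random seeds.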
```python
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms
from torchinfo import summary

from going_modular.going_modular import data_setup, engine
```
```python
device = "cuda" if torch.cuda.is_available() else "cpu"
device
```
```python
# Set seeds
def set_seeds(seed: int=42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)
```
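To see what seeding buys you, the sketch below checks that re-seeding reproduces the same random numbers. It repeats `set_seeds()` so it runs on its own. The PyTorch docs say `torch.cuda.manual_seed()` is safe to call when CUDA is unavailable (the call is then ignored), so this also runs on CPU-only machines:

```python
import torch


def set_seeds(seed: int = 42) -> None:
    """Set the seeds for the CPU and CUDA random number generators."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)  # silently ignored if CUDA is unavailable


set_seeds(42)
first = torch.rand(3)

set_seeds(42)
second = torch.rand(3)  # same seed -> same numbers as `first`
third = torch.rand(3)   # no re-seed -> the stream continues, so different numbers
```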
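With the setup above done, the experiment-tracking workflow is as follows:
- Get the data: FoodVision Mini, an image classification dataset of pizza, steak and sushi images.
- Create the Dataset and DataLoader: via the imported `data_setup` script.
- Get and customize a pretrained model: download a pretrained model from `torchvision.models` and customize it for our own problem.
- Train the model and track the results.
- View the model's results in TensorBoard.
- Create a helper function to track experiments: a function that helps us save the results of our modelling experiments.
- Set up a series of modelling experiments: write code that runs several experiments in one go, with different models, different amounts of data and different training times.
- View the modelling experiments in TensorBoard.
- Load the best model and make predictions with it.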
1. Get the data
Download the dataset with the code below:
```python
import os
import zipfile

from pathlib import Path

import requests

def download_data(source: str,
                  destination: str,
                  remove_source: bool = True) -> Path:
    """Downloads a zipped dataset from source and unzips to destination.

    Args:
        source (str): A link to a zipped file containing data.
        destination (str): A target directory to unzip data to.
        remove_source (bool): Whether to remove the source after downloading and extracting.

    Returns:
        pathlib.Path to downloaded data.

    Example usage:
        download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                      destination="pizza_steak_sushi")
    """
    # Setup path to data folder
    data_path = Path("data/")
    image_path = data_path / destination

    # If the image folder doesn't exist, download it and prepare it...
    if image_path.is_dir():
        print(f"[INFO] {image_path} directory exists, skipping download.")
    else:
        print(f"[INFO] Did not find {image_path} directory, creating one...")
        image_path.mkdir(parents=True, exist_ok=True)

        # Download pizza, steak, sushi data
        target_file = Path(source).name
        with open(data_path / target_file, "wb") as f:
            request = requests.get(source)
            print(f"[INFO] Downloading {target_file} from {source}...")
            f.write(request.content)

        # Unzip pizza, steak, sushi data
        with zipfile.ZipFile(data_path / target_file, "r") as zip_ref:
            print(f"[INFO] Unzipping {target_file} data...")
            zip_ref.extractall(image_path)

        # Remove .zip file
        if remove_source:
            os.remove(data_path / target_file)

    return image_path

image_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                           destination="pizza_steak_sushi")
image_path
```
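2. Create the Dataset and DataLoader
We can call the `create_dataloaders()` function in `data_setup.py` to create the Dataset and DataLoader directly.
Before calling it, we need to create a `transform` argument that converts the data. The transformed data must match the input format the model expects later on.
Since we'll use transfer learning with a specialized pretrained model from `torchvision.models`, we'll create a transform that prepares our images correctly.
For how to build this transform, see the two approaches described in the earlier post "[PyTorch] Transfer Learning". Below, we create it manually: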
```python
# Setup directories
train_dir = image_path / "train"
test_dir = image_path / "test"

# Setup ImageNet normalization levels (turns all images into similar distribution as ImageNet)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Create transform pipeline manually
manual_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])
print(f"Manually created transforms: {manual_transforms}")

# Create data loaders
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=manual_transforms, # use manually created transforms
    batch_size=32,
    num_workers=1
)

train_dataloader, test_dataloader, class_names
```
Alternatively, create the transforms automatically (you only need one of the two approaches):
```python
# Setup dirs
train_dir = image_path / "train"
test_dir = image_path / "test"

# Setup pretrained weights (plenty of these available in torchvision.models)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

# Get transforms from weights (these are the transforms that were used to obtain the weights)
automatic_transforms = weights.transforms()
print(f"Automatically created transforms: {automatic_transforms}")

# Create data loaders
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=automatic_transforms, # use automatically created transforms
    batch_size=32,
    num_workers=1
)

train_dataloader, test_dataloader, class_names
```
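3. Get and customize a pretrained model
Get a pretrained model, freeze its base layers and change the classifier head.
Download the pretrained weights for the `torchvision.models.efficientnet_b0()` model and get it ready to use with our own data.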
```python
# Note: This is how a pretrained model was created before torchvision 0.13; the `pretrained` argument is deprecated from 0.13 onwards.
# model = torchvision.models.efficientnet_b0(pretrained=True).to(device) # OLD

# Download the pretrained weights for EfficientNet_B0
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # NEW in torchvision 0.13, "DEFAULT" means "best weights available"

# Setup the model with the pretrained weights and send it to the target device
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

# View the output of the model
# model
```
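We'll freeze the model's base layers, which we'll use to extract features from the input images. We'll also change the classifier head (output layer) to match the number of classes we're working with: 3 (pizza, steak, sushi).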
```python
# Freeze all base layers by setting requires_grad attribute to False
for param in model.features.parameters():
    param.requires_grad = False

# Since we're creating a new layer with random weights (torch.nn.Linear),
# let's set the seeds
set_seeds()

# Update the classifier head to suit our problem
model.classifier = torch.nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280,
              out_features=len(class_names),
              bias=True).to(device))
```
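You can check the effect of freezing by counting trainable parameters. The sketch below uses a tiny hypothetical `ToyNet` with the same `features`/`classifier` split as EfficientNet. It is not EfficientNet, so it runs without downloading any weights:

```python
import torch
from torch import nn


class ToyNet(nn.Module):
    """Hypothetical stand-in with EfficientNet's `features` + `classifier` layout."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> feature vector of size 8
        )
        self.classifier = nn.Sequential(nn.Dropout(p=0.2), nn.Linear(8, 1000))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def count_trainable(model: nn.Module) -> int:
    """Number of parameter values that the optimizer would update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


toy = ToyNet()
for param in toy.features.parameters():
    param.requires_grad = False  # freeze the "backbone"

# Replace the head: newly created layers have requires_grad=True
toy.classifier = nn.Sequential(nn.Dropout(p=0.2), nn.Linear(8, 3))

trainable = count_trainable(toy)  # only the new Linear: 8 * 3 weights + 3 biases
logits = toy(torch.randn(2, 3, 32, 32))
```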
With the base layers frozen and the classifier head changed, let's summarize the model with `torchinfo.summary()`:
```python
from torchinfo import summary

# # Get a summary of the model (uncomment for full output)
# summary(model,
#         input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape" (batch_size, color_channels, height, width)
#         verbose=0,
#         col_names=["input_size", "output_size", "num_params", "trainable"],
#         col_width=20,
#         row_settings=["var_names"]
# )

# Print a summary using torchinfo (uncomment for actual output)
summary(model=model,
        input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
        # col_names=["input_size"], # uncomment for smaller output
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)
```
4. Train the model and track the results
Create a loss function and an optimizer:
```python
# Define loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```
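Adjust the `train()` function (from `engine.py`) to track results with `SummaryWriter()`:
We can use PyTorch's `torch.utils.tensorboard.SummaryWriter()` class to save various parts of the model's training progress to file.
By default, `SummaryWriter()` saves information about the model to files in the directory set by the `log_dir` parameter.
The default location of `log_dir` is `runs/CURRENT_DATETIME_HOSTNAME`, where `HOSTNAME` is the name of your computer. You can change where experiments are tracked, and the directory name can be customized to your needs.
The output of `SummaryWriter()` is saved in TensorBoard format.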
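For reference, the default directory name can be reproduced with the standard library. This sketch mirrors how `SummaryWriter` builds its default `log_dir` in recent PyTorch releases: a `%b%d_%H-%M-%S` timestamp, plus the host name, plus the optional `comment` argument. Check the source of your installed version before relying on the exact format. `default_log_dir()` is a hypothetical helper name:

```python
import os
import socket
from datetime import datetime


def default_log_dir(comment: str = "") -> str:
    """Hypothetical helper approximating SummaryWriter's default log_dir."""
    current_time = datetime.now().strftime("%b%d_%H-%M-%S")  # e.g. "Jun04_10-30-00"
    return os.path.join("runs", current_time + "_" + socket.gethostname() + comment)


example_dir = default_log_dir(comment="_effnetb0")
print(example_dir)
```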
Create a default `SummaryWriter()` instance:
```python
from torch.utils.tensorboard import SummaryWriter

# Create a writer with all default settings
writer = SummaryWriter()
```
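Take the `train()` function from `engine.py` and adapt it to use `writer`, so that `train()` also logs the model's training and test loss and accuracy values.
We can do this with `writer.add_scalars(main_tag, tag_scalar_dict)`, where:
- `main_tag` (string): the name of the scalars being tracked (e.g. "Accuracy")
- `tag_scalar_dict` (dict): a dictionary of the values being tracked (e.g. `{"train_loss": 0.3454}`)

We also pass `global_step` (here, the epoch number) so that each value is plotted at the right point on the x-axis.
The method is called `add_scalars()` because our loss and accuracy values are usually scalars (single values).
Once we've finished tracking values, we call `writer.close()` to tell the writer to stop looking for values to track.
To start modifying `train()`, we'll also import `train_step()` and `test_step()` from `engine.py`: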
```python
from typing import Dict, List
from tqdm.auto import tqdm

from going_modular.going_modular.engine import train_step, test_step

# Import train() function from:
# https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch models through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example if training for epochs=2:
                   {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Experiment tracking ###
        # Add loss results to SummaryWriter
        writer.add_scalars(main_tag="Loss",
                           tag_scalar_dict={"train_loss": train_loss,
                                            "test_loss": test_loss},
                           global_step=epoch)

        # Add accuracy results to SummaryWriter
        writer.add_scalars(main_tag="Accuracy",
                           tag_scalar_dict={"train_acc": train_acc,
                                            "test_acc": test_acc},
                           global_step=epoch)

        # Track the PyTorch model architecture
        writer.add_graph(model=model,
                         # Pass in an example input
                         input_to_model=torch.randn(32, 3, 224, 224).to(device))

    # Close the writer
    writer.close()

    ### End new ###

    # Return the filled results at the end of the epochs
    return results
```
Try it out by training for 5 epochs:
```python
# Train model
# Note: Not using engine.train() since the original script isn't updated to use writer
set_seeds()
results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=optimizer,
                loss_fn=loss_fn,
                epochs=5,
                device=device)
```
The model's results are also tracked in a dictionary:
```python
# Check out the model results
results
```
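5. View the model's results in TensorBoard
By default, the `SummaryWriter()` class stores model results in TensorBoard format in a directory called `runs/`.
TensorBoard can be viewed in several ways.
In a Jupyter Notebook, you can do it like this:
make sure TensorBoard is installed, load it with `%load_ext tensorboard`, then view the results with `%tensorboard --logdir DIR_WITH_LOGS`.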
```python
# Example code to run in Jupyter or Google Colab Notebook (uncomment to try it out)
%load_ext tensorboard
%tensorboard --logdir runs
```
This is what it looks like in Colab:
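6. Create a helper function to build SummaryWriter() instances
The `SummaryWriter()` class logs various information to the directory given by the `log_dir` parameter, so let's create a helper function that makes a custom directory for each experiment.
Each experiment gets its own log directory that tracks the following: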
- Experiment date/timestamp - when did the experiment take place?
- Experiment name - is there something we'd like to call the experiment?
- Model name - what model was used?
- Extra - should anything else be tracked?
Let's create a helper function called `create_writer()` that produces a `SummaryWriter()` instance logging to a custom `log_dir`.
Ideally, we want `log_dir` to look like `runs/YYYY-MM-DD/experiment_name/model_name/extra`.
```python
from typing import Optional

from torch.utils.tensorboard import SummaryWriter

# Annotate with the class itself: writing SummaryWriter() in the annotation would
# create a writer (and a runs/ folder) as soon as the function is defined
def create_writer(experiment_name: str,
                  model_name: str,
                  extra: Optional[str] = None) -> SummaryWriter:
    """Creates a torch.utils.tensorboard.writer.SummaryWriter() instance saving to a specific log_dir.

    log_dir is a combination of runs/timestamp/experiment_name/model_name/extra.

    Where timestamp is the current date in YYYY-MM-DD format.

    Args:
        experiment_name (str): Name of experiment.
        model_name (str): Name of model.
        extra (str, optional): Anything extra to add to the directory. Defaults to None.

    Returns:
        torch.utils.tensorboard.writer.SummaryWriter(): Instance of a writer saving to log_dir.

    Example usage:
        # Create a writer saving to "runs/2022-06-04/data_10_percent/effnetb2/5_epochs/"
        writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb2",
                               extra="5_epochs")
        # The above is the same as:
        writer = SummaryWriter(log_dir="runs/2022-06-04/data_10_percent/effnetb2/5_epochs/")
    """
    from datetime import datetime
    import os

    # Get timestamp of current date (all experiments on certain day live in same folder)
    timestamp = datetime.now().strftime("%Y-%m-%d") # returns current date in YYYY-MM-DD format

    if extra:
        # Create log directory path
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name, extra)
    else:
        log_dir = os.path.join("runs", timestamp, experiment_name, model_name)

    print(f"[INFO] Created SummaryWriter, saving to: {log_dir}...")
    return SummaryWriter(log_dir=log_dir)
```
```python
# Create an example writer
example_writer = create_writer(experiment_name="data_10_percent",
                               model_name="effnetb0",
                               extra="5_epochs")
```
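If the path logic lives in a separate pure function, you can check it without creating any event files. This sketch uses a hypothetical `make_log_dir()` helper that follows the `runs/YYYY-MM-DD/experiment_name/model_name/extra` layout used by `create_writer()`. The optional `day` argument is an assumption, added only to make the output deterministic:

```python
import os
from datetime import date
from typing import Optional


def make_log_dir(experiment_name: str,
                 model_name: str,
                 extra: Optional[str] = None,
                 day: Optional[date] = None) -> str:
    """Hypothetical helper: build runs/YYYY-MM-DD/experiment_name/model_name[/extra]."""
    timestamp = (day or date.today()).strftime("%Y-%m-%d")
    parts = ["runs", timestamp, experiment_name, model_name]
    if extra:  # skip the last level when no extra info is given
        parts.append(extra)
    return os.path.join(*parts)


log_dir = make_log_dir("data_10_percent", "effnetb0", extra="5_epochs", day=date(2022, 6, 4))
print(log_dir)
```

With this helper, `create_writer()` could simply `return SummaryWriter(log_dir=make_log_dir(...))`.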
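- Update the `train()` function to include a `writer` parameter
To adapt `train()`, we'll add a `writer` parameter to the function. Then we'll add code that checks whether a writer was passed in and, if so, logs our information to it.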
```python
from typing import Dict, List, Optional
from tqdm.auto import tqdm

from torch.utils.tensorboard import SummaryWriter
from going_modular.going_modular.engine import train_step, test_step

# Add writer parameter to train()
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: Optional[SummaryWriter] = None # new parameter to take in a writer
          ) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Stores metrics to specified writer log_dir if present.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").
      writer: A SummaryWriter() instance to log model results to (optional).

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for
      each epoch.
      In the form: {train_loss: [...],
                    train_acc: [...],
                    test_loss: [...],
                    test_acc: [...]}
      For example if training for epochs=2:
                   {train_loss: [2.0616, 1.0537],
                    train_acc: [0.3945, 0.3945],
                    test_loss: [1.2641, 1.5706],
                    test_acc: [0.3400, 0.2973]}
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

        ### New: Use the writer parameter to track experiments ###
        # See if there's a writer, if so, log to it
        if writer is not None:
            # Add results to SummaryWriter
            writer.add_scalars(main_tag="Loss",
                               tag_scalar_dict={"train_loss": train_loss,
                                                "test_loss": test_loss},
                               global_step=epoch)
            writer.add_scalars(main_tag="Accuracy",
                               tag_scalar_dict={"train_acc": train_acc,
                                                "test_acc": test_acc},
                               global_step=epoch)

    # Close the writer once, after all epochs have been logged
    if writer is not None:
        writer.close()
    ### End new ###

    # Return the filled results at the end of the epochs
    return results
```
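7. Set up a series of modelling experiments
- What kind of experiments to run:
Every hyperparameter is the starting point for a different experiment:
- Change the number of `epochs`
- Change the number of layers/hidden units
- Change the amount of data
- Change the learning rate
- Try different kinds of data augmentation
- Choose a different model architecture

Generally, the bigger your model (more learnable parameters) and the more data you have (more opportunities to learn), the better the performance.
However, start small, and scale up once something works.
- Which experiments to run:
The goal is to improve the model powering FoodVision Mini without it getting too big.
That is, the ideal model reaches a high level of test-set accuracy (90%+) but doesn't take too long to train or to perform inference (make predictions).
Let's try combinations of:
- Different amounts of data (10% vs 20% of pizza, steak, sushi)
- Different models (`torchvision.models.efficientnet_b0` vs `torchvision.models.efficientnet_b2`)
- Different training times (5 epochs vs 10 epochs)

This gives the following experiment combinations: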
| Experiment number | Training Dataset | Model (pretrained on ImageNet) | Number of epochs |
|---|---|---|---|
| 1 | Pizza, Steak, Sushi 10% | EfficientNetB0 | 5 |
| 2 | Pizza, Steak, Sushi 10% | EfficientNetB2 | 5 |
| 3 | Pizza, Steak, Sushi 10% | EfficientNetB0 | 10 |
| 4 | Pizza, Steak, Sushi 10% | EfficientNetB2 | 10 |
| 5 | Pizza, Steak, Sushi 20% | EfficientNetB0 | 5 |
| 6 | Pizza, Steak, Sushi 20% | EfficientNetB2 | 5 |
| 7 | Pizza, Steak, Sushi 20% | EfficientNetB0 | 10 |
| 8 | Pizza, Steak, Sushi 20% | EfficientNetB2 | 10 |
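Note that the experiments above scale up gradually: with each experiment, we slowly increase the amount of data, the model size and the training time. By the end, experiment 8 uses twice the data, double the model size and twice the training length of experiment 1.
The experiments designed here are only a small subset of the options. You can't test everything, so it's best to try a few things first and then follow the ones that work best.
The full dataset these subsets come from: Food101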
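The eight combinations in the table are the Cartesian product of the three option lists, in the order data amount → epochs → model. The stdlib sketch below uses `itertools.product` to reproduce the table's numbering, together with the save filenames used later. The list of DataLoader names is a stand-in for the real DataLoader dictionary:

```python
from itertools import product

dataloader_names = ["data_10_percent", "data_20_percent"]  # stand-in for the DataLoader dict keys
num_epochs = [5, 10]
models = ["effnetb0", "effnetb2"]

experiments = []
for number, (dataloader_name, epochs, model_name) in enumerate(
        product(dataloader_names, num_epochs, models), start=1):
    experiments.append({
        "number": number,
        "data": dataloader_name,
        "model": model_name,
        "epochs": epochs,
        "save_filepath": f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth",
    })

for exp in experiments:
    print(exp["number"], exp["data"], exp["model"], exp["epochs"])
```

Nesting three `for` loops in the same order (data, then epochs, then model) yields exactly the same sequence.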
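- Download the datasets with different proportions
We need two training sets: one with 10% and one with 20% of the data.
For testing, every experiment uses the test set from the 10% dataset (to keep things consistent).
Download code: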
```python
# Download 10 percent and 20 percent training data (if necessary)
data_10_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
                                     destination="pizza_steak_sushi")

data_20_percent_path = download_data(source="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip",
                                     destination="pizza_steak_sushi_20_percent")
```
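Create separate training directory paths, but only one test directory path, since all experiments use the same test dataset (the test set from pizza, steak, sushi 10%):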
```python
# Setup training directory paths
train_dir_10_percent = data_10_percent_path / "train"
train_dir_20_percent = data_20_percent_path / "train"

# Setup testing directory paths (note: use the same test dataset for both to compare the results)
test_dir = data_10_percent_path / "test"

# Check the directories
print(f"Training directory 10%: {train_dir_10_percent}")
print(f"Training directory 20%: {train_dir_20_percent}")
print(f"Testing directory: {test_dir}")
```
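- Transform the datasets and create DataLoaders
We'll create a series of transforms to prepare the images for our models. To keep things consistent, we'll create one transform manually and use it for all datasets. It will:
(1) Resize all images (we'll start with 224, 224, but this can be changed).
(2) Convert them to tensors with values between 0 and 1.
(3) Normalize them so their distribution is in line with the ImageNet dataset. We do this because our models from `torchvision.models` were pretrained on ImageNet.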
```python
from torchvision import transforms

# Create a transform to normalize data distribution to be inline with ImageNet
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], # values per colour channel [red, green, blue]
                                 std=[0.229, 0.224, 0.225]) # values per colour channel [red, green, blue]

# Compose transforms into a pipeline
simple_transform = transforms.Compose([
    transforms.Resize((224, 224)), # 1. Resize the images
    transforms.ToTensor(), # 2. Turn the images into tensors with values between 0 & 1
    normalize # 3. Normalize the images so their distributions match the ImageNet dataset
])
```
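Numerically, step (3) works like this: `transforms.Normalize` subtracts the per-channel mean and divides by the per-channel std. The plain-torch sketch below applies that arithmetic to a dummy image tensor. The tensor of 0.5s is made up for illustration:

```python
import torch

# ImageNet statistics from above, reshaped to [C, 1, 1] so they broadcast over H and W
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Dummy "image" as ToTensor() would produce it: shape [C, H, W], values in [0, 1]
image = torch.full((3, 2, 2), 0.5)

# Per-channel (x - mean) / std, the computation Normalize performs
normalized = (image - mean) / std
print(normalized[:, 0, 0])
```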
Use the `create_dataloaders()` function from `data_setup.py` to create the DataLoaders. Both use the same `test_dataloader`, which keeps comparisons consistent:
```python
BATCH_SIZE = 32

# Create 10% training and test DataLoaders
train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,
    test_dir=test_dir,
    transform=simple_transform,
    batch_size=BATCH_SIZE
)

# Create 20% training and test data DataLoaders
train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,
    test_dir=test_dir,
    transform=simple_transform,
    batch_size=BATCH_SIZE
)

# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)
print(f"Number of batches of size {BATCH_SIZE} in 10 percent training data: {len(train_dataloader_10_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in 20 percent training data: {len(train_dataloader_20_percent)}")
print(f"Number of batches of size {BATCH_SIZE} in testing data: {len(test_dataloader)} (all experiments will use the same test set)")
print(f"Number of classes: {len(class_names)}, class names: {class_names}")
```
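Note that `len()` of a DataLoader is the number of batches, not the number of samples. For a map-style dataset with the default sampler, that is `ceil(num_samples / batch_size)`, or floor division when `drop_last=True`. The sketch below checks that formula against a real `DataLoader` on a dummy `TensorDataset` of 100 samples. The size is arbitrary, not the size of the food datasets, and `num_batches()` is a hypothetical helper:

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Hypothetical helper: expected len(DataLoader) for a map-style dataset."""
    if drop_last:
        return num_samples // batch_size  # the last partial batch is dropped
    return math.ceil(num_samples / batch_size)  # the last partial batch is kept


dummy_dataset = TensorDataset(torch.zeros(100, 1))  # 100 dummy samples
loader = DataLoader(dummy_dataset, batch_size=32)
loader_drop_last = DataLoader(dummy_dataset, batch_size=32, drop_last=True)
```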
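- Create feature extractor models
Create two feature extractor models:
- `torchvision.models.efficientnet_b0()` pretrained backbone + custom classifier head (EffNetB0 for short).
- `torchvision.models.efficientnet_b2()` pretrained backbone + custom classifier head (EffNetB2 for short).

To do this, we'll freeze the base layers (the feature layers) and update the model's classifier head (output layer) to suit our problem.
The `in_features` parameter of the EffNetB0 classifier head is 1280, because the backbone turns an input image into a feature vector of size 1280.
Since EffNetB2 has a different number of layers and parameters, we need to adjust it accordingly.
We can find EffNetB2's input and output shapes with `torchinfo.summary()` by passing in `input_size=(32, 3, 224, 224)`. Here `(32, 3, 224, 224)` corresponds to `(batch_size, color_channels, height, width)`, i.e. an example of what a single batch of data looks like to our model.
To find the input shape required by EffNetB2's final layer, do the following:
(1) Create an instance of `torchvision.models.efficientnet_b2(weights=torchvision.models.EfficientNet_B2_Weights.DEFAULT)`.
(2) Look at the various input and output shapes by running `torchinfo.summary()`.
(3) Print the number of `in_features` by inspecting the `state_dict()` of EffNetB2's classifier and printing the length of the weight matrix. (You could also just inspect the output of `effnetb2.classifier`.)
Why this matters: many modern models can handle input images of different sizes thanks to the `torch.nn.AdaptiveAvgPool2d()` layer, which pools an input of any spatial size down to a fixed `output_size`. You can try this by passing input images of different sizes to `torchinfo.summary()`, or to your own models that use this layer.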
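You can see this quickly in code: `nn.AdaptiveAvgPool2d(1)` always returns a 1x1 spatial map, whatever the input height and width. The flattened feature vector, and therefore the classifier's `in_features`, stays the same. In the sketch below, a single arbitrary conv layer stands in for a backbone:

```python
import torch
from torch import nn

# Arbitrary stand-in "backbone": one conv layer followed by adaptive pooling
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
pool = nn.AdaptiveAvgPool2d(output_size=1)

output_shapes = {}
for height, width in [(224, 224), (288, 288), (160, 320)]:
    x = torch.randn(1, 3, height, width)
    output_shapes[(height, width)] = tuple(pool(conv(x)).shape)

print(output_shapes)  # every entry is (1, 16, 1, 1)
```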
```python
import torchvision
from torchinfo import summary

# 1. Create an instance of EffNetB2 with pretrained weights
effnetb2_weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT # "DEFAULT" means best available weights
effnetb2 = torchvision.models.efficientnet_b2(weights=effnetb2_weights)

# # 2. Get a summary of standard EffNetB2 from torchvision.models (uncomment for full output)
# summary(model=effnetb2,
#         input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
#         # col_names=["input_size"], # uncomment for smaller output
#         col_names=["input_size", "output_size", "num_params", "trainable"],
#         col_width=20,
#         row_settings=["var_names"]
# )

# 3. Get the number of in_features of the EfficientNetB2 classifier layer
print(f"Number of in_features to final layer of EfficientNetB2: {len(effnetb2.classifier.state_dict()['1.weight'][0])}")
```
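Why does `len(...['1.weight'][0])` give `in_features`? An `nn.Linear` stores its weight with shape `[out_features, in_features]`, and inside an `nn.Sequential` the key `1.weight` refers to the second layer. The sketch below builds a head shaped like the EffNetB2 head used in this post (1408 inputs, 1000 ImageNet outputs) directly, instead of downloading the model:

```python
import torch
from torch import nn

# Stand-in head: Dropout at index 0, Linear at index 1 (built here, not downloaded)
classifier = nn.Sequential(
    nn.Dropout(p=0.3, inplace=True),
    nn.Linear(in_features=1408, out_features=1000),
)

weight = classifier.state_dict()["1.weight"]  # shape [out_features, in_features]
in_features_from_state_dict = len(weight[0])  # length of one row = in_features
in_features_from_attribute = classifier[1].in_features  # the more direct route
print(weight.shape, in_features_from_state_dict, in_features_from_attribute)
```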
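Model summary of the EffNetB2 feature extractor model, with all layers unfrozen (trainable) and the default classifier head from ImageNet pretraining.
Now that we know how many `in_features` the EffNetB2 model needs, let's create a couple of helper functions to set up the EffNetB0 and EffNetB2 feature extractor models.
The functions should:
(1) Get the base model from `torchvision.models`.
(2) Freeze the base layers in the model (set `requires_grad=False`).
(3) Set the random seeds.
(4) Change the classifier head (to suit our problem).
(5) Give the model a name (e.g. "effnetb0" for EffNetB0).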
```python
import torchvision
from torch import nn

# Get num out features (one for each class pizza, steak, sushi)
OUT_FEATURES = len(class_names)

# Create an EffNetB0 feature extractor
def create_effnetb0():
    # 1. Get the base model with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
    model = torchvision.models.efficientnet_b0(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(in_features=1280, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb0"
    print(f"[INFO] Created new {model.name} model.")
    return model

# Create an EffNetB2 feature extractor
def create_effnetb2():
    # 1. Get the base model with pretrained weights and send to target device
    weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT
    model = torchvision.models.efficientnet_b2(weights=weights).to(device)

    # 2. Freeze the base model layers
    for param in model.features.parameters():
        param.requires_grad = False

    # 3. Set the seeds
    set_seeds()

    # 4. Change the classifier head
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.3),
        nn.Linear(in_features=1408, out_features=OUT_FEATURES)
    ).to(device)

    # 5. Give the model a name
    model.name = "effnetb2"
    print(f"[INFO] Created new {model.name} model.")
    return model
```
Create instances of EffNetB0 and EffNetB2 and test them by checking their `summary()`:
```python
effnetb0 = create_effnetb0()

# Get an output summary of the layers in our EffNetB0 feature extractor model (uncomment to view full output)
summary(model=effnetb0,
        input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
        # col_names=["input_size"], # uncomment for smaller output
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)
```
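Model summary of the EffNetB0 model, with the base layers frozen (not trainable) and an updated classifier head (suited to pizza, steak, sushi image classification).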
```python
effnetb2 = create_effnetb2()

# Get an output summary of the layers in our EffNetB2 feature extractor model (uncomment to view full output)
summary(model=effnetb2,
        input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
        # col_names=["input_size"], # uncomment for smaller output
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)
```
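Model summary of the EffNetB2 model, with the base layers frozen (not trainable) and an updated classifier head (suited to pizza, steak, sushi image classification).
Judging from the summary outputs, the EffNetB2 backbone has almost twice as many parameters as EffNetB0.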
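- Create the experiments and set up the training code
First, create two lists and a dictionary:
- a list of epochs (`[5, 10]`)
- a list of models to test (`["effnetb0", "effnetb2"]`)
- a dictionary of the different training DataLoaders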
```python
# 1. Create epochs list
num_epochs = [5, 10]

# 2. Create models list (need to create a new model for each experiment)
models = ["effnetb0", "effnetb2"]

# 3. Create dataloaders dictionary for various dataloaders
train_dataloaders = {"data_10_percent": train_dataloader_10_percent,
                     "data_20_percent": train_dataloader_20_percent}
```
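Now write code that iterates through every option and tries each combination. We save the model at the end of each experiment so that we can later load back the best model and make predictions with it. The code will:
(1) Set the random seeds.
(2) Keep track of the experiment number (to make printing results easier).
(3) Loop through the items of the `train_dataloaders` dictionary, one per training DataLoader.
(4) Loop through the list of epoch numbers.
(5) Loop through the list of model names.
(6) Print information about the current experiment so we know what's happening.
(7) Check which model is the target and create a new EffNetB0 or EffNetB2 instance. We create a new model instance for every experiment, so all models start from the same point.
(8) Create a new loss function (`torch.nn.CrossEntropyLoss()`) and optimizer (`torch.optim.Adam(params=model.parameters(), lr=0.001)`) for each new experiment.
(9) Train the model with the modified `train()` function, passing the appropriate details to the `writer` parameter.
(10) Save the trained model to file with an appropriate filename, using the `save_model()` function from `utils.py`.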
```python
%%time
from going_modular.going_modular.utils import save_model

# 1. Set the random seeds
set_seeds(seed=42)

# 2. Keep track of experiment numbers
experiment_number = 0

# 3. Loop through each DataLoader
for dataloader_name, train_dataloader in train_dataloaders.items():

    # 4. Loop through each number of epochs
    for epochs in num_epochs:

        # 5. Loop through each model name and create a new model based on the name
        for model_name in models:

            # 6. Create information print outs
            experiment_number += 1
            print(f"[INFO] Experiment number: {experiment_number}")
            print(f"[INFO] Model: {model_name}")
            print(f"[INFO] DataLoader: {dataloader_name}")
            print(f"[INFO] Number of epochs: {epochs}")

            # 7. Select the model
            if model_name == "effnetb0":
                model = create_effnetb0() # creates a new model each time (important because we want each experiment to start from scratch)
            else:
                model = create_effnetb2() # creates a new model each time (important because we want each experiment to start from scratch)

            # 8. Create a new loss and optimizer for every model
            loss_fn = nn.CrossEntropyLoss()
            optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

            # 9. Train target model with target dataloaders and track experiments
            train(model=model,
                  train_dataloader=train_dataloader,
                  test_dataloader=test_dataloader,
                  optimizer=optimizer,
                  loss_fn=loss_fn,
                  epochs=epochs,
                  device=device,
                  writer=create_writer(experiment_name=dataloader_name,
                                       model_name=model_name,
                                       extra=f"{epochs}_epochs"))

            # 10. Save the model to file so we can get back the best model
            save_filepath = f"07_{model_name}_{dataloader_name}_{epochs}_epochs.pth"
            save_model(model=model,
                       target_dir="models",
                       model_name=save_filepath)
            print("-"*50 + "\n")
```
8. View the experiments in TensorBoard
```python
# Viewing TensorBoard in Jupyter and Google Colab Notebooks (uncomment to view full TensorBoard instance)
%load_ext tensorboard
%tensorboard --logdir runs
```
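What matters most is the trend: where your numbers are heading. If your numbers differ a lot from the ones shown here, something may have gone wrong, and it's worth going back to check your code. If they differ only slightly (say, in the last few decimal places), that's fine.
Visualizing the test loss values of the different modelling experiments in TensorBoard, you can see that the EffNetB0 model trained for 10 epochs on 20% of the data achieves the lowest loss. This fits the overall trend of the experiments: more data, a bigger model and longer training are generally better.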
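9. Load the best model and make predictions with it
Overall, the largest model (experiment 8: EffNetB2, 20% of the data, 10 epochs) achieved the best results. To import the best saved model, create a new instance of EffNetB2 with the `create_effnetb2()` function, then load the saved `state_dict()` with `torch.load()`.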
```python
# Setup the best model filepath
best_model_path = "models/07_effnetb2_data_20_percent_10_epochs.pth"

# Instantiate a new instance of EffNetB2 (to load the saved state_dict() to)
best_model = create_effnetb2()

# Load the saved best model state_dict()
best_model.load_state_dict(torch.load(best_model_path))
```
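The save-then-reload pattern stores only the parameters (the `state_dict()`), which is why a fresh model with the same architecture has to be created first. The self-contained sketch below does the round trip with an in-memory buffer instead of a file, and a tiny stand-in head rather than EffNetB2. `map_location="cpu"` is an addition here so that weights saved on a GPU can also be loaded on a CPU-only machine:

```python
import io

import torch
from torch import nn


def make_model() -> nn.Module:
    """Tiny stand-in architecture (both instances must share it for loading to work)."""
    return nn.Sequential(nn.Dropout(p=0.3), nn.Linear(8, 3))


torch.manual_seed(0)
trained_model = make_model()

# Save only the learned parameters (here into memory instead of a .pth file)
buffer = io.BytesIO()
torch.save(trained_model.state_dict(), buffer)
buffer.seek(0)

torch.manual_seed(1)
restored_model = make_model()  # fresh instance with different random weights

# Load the saved parameters into the new instance
load_result = restored_model.load_state_dict(torch.load(buffer, map_location="cpu"))
```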
Check the size of the model file (a model that's too large is hard to deploy):
```python
# Check the model file size
from pathlib import Path

# Get the model size in bytes then convert to megabytes
effnetb2_model_size = Path(best_model_path).stat().st_size // (1024*1024)
print(f"EfficientNetB2 feature extractor model size: {effnetb2_model_size} MB")
```
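Note that `// (1024*1024)` rounds the size down to whole megabytes. The stdlib sketch below shows the difference on a throwaway dummy file of exactly 3.5 MiB:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    dummy_file = Path(tmp_dir) / "dummy_model.pth"
    dummy_file.write_bytes(b"\0" * int(3.5 * 1024 * 1024))  # 3.5 MiB of zero bytes
    size_bytes = dummy_file.stat().st_size

size_mb_floor = size_bytes // (1024 * 1024)  # integer division, as in the code above
size_mb = size_bytes / (1024 * 1024)         # true division keeps the fraction
print(size_mb_floor, round(size_mb, 2))
```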
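Make some predictions and visualize them.
We created a `pred_and_plot_image()` function to make predictions on images with a trained model.
The `pred_and_plot_image()` function lives in `predictions.py` and can be called directly. Here is the `predictions.py` code: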
```python
"""
Utility functions to make predictions.

Main reference for code creation: https://www.learnpytorch.io/06_pytorch_transfer_learning/#6-make-predictions-on-images-from-the-test-set
"""
import torch
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt

from typing import List, Optional, Tuple

from PIL import Image

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Predict on a target image with a target model
# Function created in: https://www.learnpytorch.io/06_pytorch_transfer_learning/#6-make-predictions-on-images-from-the-test-set
def pred_and_plot_image(
    model: torch.nn.Module,
    class_names: List[str],
    image_path: str,
    image_size: Tuple[int, int] = (224, 224),
    transform: Optional[torchvision.transforms.Compose] = None,
    device: torch.device = device,
):
    """Predicts on a target image with a target model.

    Args:
        model (torch.nn.Module): A trained (or untrained) PyTorch model to predict on an image.
        class_names (List[str]): A list of target classes to map predictions to.
        image_path (str): Filepath to target image to predict on.
        image_size (Tuple[int, int], optional): Size to transform target image to. Defaults to (224, 224).
        transform (torchvision.transforms, optional): Transform to perform on image. Defaults to None which uses ImageNet normalization.
        device (torch.device, optional): Target device to perform prediction on. Defaults to device.
    """
    # Open image and convert it to RGB (PNGs with an alpha channel or grayscale
    # images would otherwise not match the 3-channel normalization below)
    img = Image.open(image_path).convert("RGB")

    # Create transformation for image (if one doesn't exist)
    if transform is not None:
        image_transform = transform
    else:
        image_transform = transforms.Compose(
            [
                transforms.Resize(image_size),
                transforms.ToTensor(),
                transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )

    ### Predict on image ###

    # Make sure the model is on the target device
    model.to(device)

    # Turn on model evaluation mode and inference mode
    model.eval()
    with torch.inference_mode():
        # Transform and add an extra dimension to image (model requires samples in [batch_size, color_channels, height, width])
        transformed_image = image_transform(img).unsqueeze(dim=0)

        # Make a prediction on image with an extra dimension and send it to the target device
        target_image_pred = model(transformed_image.to(device))

    # Convert logits -> prediction probabilities (using torch.softmax() for multi-class classification)
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

    # Convert prediction probabilities -> prediction labels
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)

    # Plot image with predicted label and probability
    plt.figure()
    plt.imshow(img)
    plt.title(
        f"Pred: {class_names[target_image_pred_label.item()]} | Prob: {target_image_pred_probs.max():.3f}"
    )
    plt.axis(False)
```
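The core of `pred_and_plot_image()` is the shape handling plus the logits -> probabilities -> label chain. A minimal sketch of just those steps, using a random image tensor and hand-written logits (both made up for illustration; no trained model is involved):

```python
import torch

class_names = ["pizza", "steak", "sushi"]

# A single image after ToTensor(): [color_channels, height, width]
image = torch.rand(3, 224, 224)
# Models expect a batch dimension: [batch_size, color_channels, height, width]
batch = image.unsqueeze(dim=0)
print(batch.shape)  # torch.Size([1, 3, 224, 224])

# Made-up logits standing in for model(batch): one row per image, one column per class
logits = torch.tensor([[1.0, 3.0, 0.5]])
# Softmax over the class dimension (dim=1) turns logits into probabilities that sum to 1
probs = torch.softmax(logits, dim=1)
# The predicted label is the index of the highest probability
label = torch.argmax(probs, dim=1)

print(f"Pred: {class_names[label.item()]} | Prob: {probs.max():.3f}")  # Pred: steak | Prob: 0.821
```

Because softmax is monotonic, `argmax` over the probabilities gives the same label as `argmax` over the raw logits; the softmax step is only needed to report a probability in the plot title.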
Make predictions on a few randomly selected test images:
```python
# Import function to make predictions on images and plot them
# See the function previously created in section: https://www.learnpytorch.io/06_pytorch_transfer_learning/#6-make-predictions-on-images-from-the-test-set
from going_modular.going_modular.predictions import pred_and_plot_image

# Get a random list of 3 images from 20% test set
import random
num_images_to_plot = 3
test_image_path_list = list(Path(data_20_percent_path / "test").glob("*/*.jpg")) # get all test image paths from 20% dataset
test_image_path_sample = random.sample(population=test_image_path_list,
                                       k=num_images_to_plot) # randomly select k number of images

# Iterate through random test image paths, make predictions on them and plot them
for image_path in test_image_path_sample:
    pred_and_plot_image(model=best_model,
                        image_path=image_path,
                        class_names=class_names,
                        image_size=(224, 224))
```
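`random.sample()` draws a different set of images on every run, so two models evaluated in separate runs will not see the same pictures. A small sketch, assuming you want repeatable picks: give a local `random.Random` its own seed so the sample is fixed without touching the global random state. The file names below are made-up placeholders for `test_image_path_list`, and `sample_paths` is a hypothetical helper.

```python
import random
from pathlib import Path
from typing import List


def sample_paths(paths: List[Path], k: int, seed: int = 42) -> List[Path]:
    """Return the same k paths for the same seed and the same set of input paths."""
    rng = random.Random(seed)  # local generator, independent of the global random.seed()
    # Sort first: Path.glob() order depends on the filesystem, so sorting makes the input order stable
    return rng.sample(sorted(paths), k=k)


# Placeholder paths shaped like .../test/<class_name>/<image>.jpg
paths = [Path(f"test/{c}/{i}.jpg") for c in ["pizza", "steak", "sushi"] for i in range(5)]

first = sample_paths(paths, k=3)
second = sample_paths(list(reversed(paths)), k=3)  # same files, different listing order
print(first == second)
```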
Finally, use the best model to predict on a custom image:
```python
# Download custom image
import requests

# Setup custom image path
custom_image_path = Path("data/04-pizza-dad.jpeg")

# Download the image if it doesn't already exist
if not custom_image_path.is_file():
    # Make sure the target directory exists before writing to it
    custom_image_path.parent.mkdir(parents=True, exist_ok=True)
    print(f"Downloading {custom_image_path}...")
    # When downloading from GitHub, need to use the "raw" file link
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg")
    request.raise_for_status() # fail loudly instead of saving an error page as the image
    with open(custom_image_path, "wb") as f:
        f.write(request.content)
else:
    print(f"{custom_image_path} already exists, skipping download.")

# Predict on custom image with the best model
pred_and_plot_image(model=best_model,
                    image_path=custom_image_path,
                    class_names=class_names)
```
Supplement
Add data augmentation to the list of experiments, using the 20% pizza, steak, sushi training and test datasets:
```python
# ImageNet mean/std, which torchvision's pretrained weights expect
# (redefined here so this block runs on its own)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Note: Data augmentation transform like this should only be performed on training data
train_transform_data_aug = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor(),
    normalize
])

# Helper function to view images in a DataLoader (works with data augmentation transforms or not)
def view_dataloader_images(dataloader, n=10):
    if n > 10:
        print("Having n higher than 10 will create messy plots, lowering to 10.")
        n = 10
    imgs, labels = next(iter(dataloader))
    plt.figure(figsize=(16, 8))
    for i in range(n):
        # Min max scale the image for display purposes
        targ_image = imgs[i]
        sample_min, sample_max = targ_image.min(), targ_image.max()
        sample_scaled = (targ_image - sample_min)/(sample_max - sample_min)

        # Plot images with appropriate axes information
        plt.subplot(1, n, i+1)
        plt.imshow(sample_scaled.permute(1, 2, 0)) # [C, H, W] -> [H, W, C] for Matplotlib
        plt.title(class_names[labels[i]])
        plt.axis(False)

# Have to update `create_dataloaders()` to handle different augmentations
import os
from torch.utils.data import DataLoader
from torchvision import datasets

NUM_WORKERS = os.cpu_count() # use maximum number of CPUs for workers to load data

# Note: this is an updated version of data_setup.create_dataloaders to handle
# different train and test transforms.
def create_dataloaders(
    train_dir,
    test_dir,
    train_transform, # add parameter for train transform (transforms on train dataset)
    test_transform,  # add parameter for test transform (transforms on test dataset)
    batch_size=32, num_workers=NUM_WORKERS
):
    # Use ImageFolder to create dataset(s)
    train_data = datasets.ImageFolder(train_dir, transform=train_transform)
    test_data = datasets.ImageFolder(test_dir, transform=test_transform)

    # Get class names
    class_names = train_data.classes

    # Turn images into data loaders
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    test_dataloader = DataLoader(
        test_data,
        batch_size=batch_size,
        shuffle=False, # no need to shuffle the evaluation data
        num_workers=num_workers,
        pin_memory=True,
    )

    return train_dataloader, test_dataloader, class_names
```