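Data Augmentation with the transforms Library (torchvision)

The transforms module of torchvision provides many image-processing operations. It is used mainly during data loading and preprocessing, and appears in pipelines for image classification, object detection, semantic segmentation, and similar tasks. Some of the operations below (random cropping, flipping) are data augmentation in the strict sense, while others (ToTensor, Normalize) are standard preprocessing steps. The most commonly used operations are described below.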

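1. Image Resizing

Image resizing is implemented by the Resize class, which changes the spatial size of an image.

The parameters of Resize are:

**1. size:** an integer, tuple, or list. If it is an integer, the shorter side of the image is resized to this value and the aspect ratio is preserved; if it is a tuple or list (h, w), the image is resized to exactly that height and width.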

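**2. interpolation:** the interpolation method used to resample pixel values during resizing. The default is bilinear interpolation. Recent torchvision versions expect a member of torchvision.transforms.InterpolationMode (default InterpolationMode.BILINEAR); older versions accepted PIL constants such as PIL.Image.BILINEAR.

Commonly used interpolation methods are:

InterpolationMode.NEAREST (formerly PIL.Image.NEAREST): nearest-neighbor interpolation.

InterpolationMode.BILINEAR (formerly PIL.Image.BILINEAR): bilinear interpolation.

InterpolationMode.BICUBIC (formerly PIL.Image.BICUBIC): bicubic interpolation.

InterpolationMode.LANCZOS (formerly PIL.Image.LANCZOS): Lanczos interpolation (supported for PIL images only, not for tensors).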

```python
from torchvision import transforms
from PIL import Image

# Image path
image_path = 'image.jpg'

# Create a Resize transform that resizes the image to 256x256 (height x width)
resize = transforms.Resize((256, 256))

# Load the image and apply the Resize transform
img = Image.open(image_path)
resized_img = resize(img)
```

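2. Random Cropping

Random cropping is implemented by the RandomCrop class, which cuts out a patch of the given size at a random location in the image. Its parameters are:

**1. size:** the output size of the crop. Either an integer (a square crop) or a tuple (height, width).

**2. padding:** (optional) if set, the image is first padded on its borders by the given number of pixels and then cropped. An integer pads all four sides equally; a sequence of 2 values pads left/right and top/bottom, and a sequence of 4 values pads left, top, right, and bottom separately. Padding lets the crop window extend past the original borders, which is itself a common augmentation.

**3. pad_if_needed:** (optional) if True, an image smaller than size is padded (using fill and padding_mode) so that the crop can still be taken; otherwise such an image raises an error. Default False.

**4. fill:** (optional) the pixel value used for padding when padding_mode is 'constant'; default 0. A tuple of 3 values fills the R, G, and B channels separately.

**5. padding_mode:** (optional) the padding mode, one of 'constant', 'edge', 'reflect', and 'symmetric'; default 'constant'.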

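```python
from torchvision import transforms
from PIL import Image

# Image path
image_path = 'image.jpg'

# Create a RandomCrop transform that takes a random 224x224 crop
crop = transforms.RandomCrop((224, 224))

# Load the image and apply the RandomCrop transform
# (the image must be at least 224x224 unless padding or pad_if_needed is used)
img = Image.open(image_path)
cropped_img = crop(img)
```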

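3. Image Flipping

The RandomHorizontalFlip and RandomVerticalFlip classes flip an image horizontally (left-right) and vertically (top-bottom), respectively. Each class takes a single parameter:

**p:** the probability of flipping; default 0.5, i.e. each image has a 50% chance of being flipped.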

```python
from torchvision import transforms
from PIL import Image

# Image path
image_path = 'image.jpg'

# Create a RandomHorizontalFlip transform: flips left-right with probability 0.5
horizontal_flip = transforms.RandomHorizontalFlip(p=0.5)

# Create a RandomVerticalFlip transform: flips top-bottom with probability 0.5
vertical_flip = transforms.RandomVerticalFlip(p=0.5)

# Load the image
img = Image.open(image_path)

# Apply the horizontal flip
flipped_img_horizontal = horizontal_flip(img)

# Apply the vertical flip
flipped_img_vertical = vertical_flip(img)
```

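4. Image Normalization

Image normalization is a common preprocessing step that rescales pixel values so that each channel has a comparable distribution, which helps training stability and convergence speed. In the transforms library it is implemented by the Normalize class, which standardizes each channel c of a tensor image as output[c] = (input[c] - mean[c]) / std[c]. Normalize operates on tensors, not PIL images, so it is normally applied after ToTensor. Its parameters are:

**1. mean:** the per-channel means used for normalization, given as a list or tuple with one value per channel.

**2. std:** the per-channel standard deviations used for normalization, given as a list or tuple with one value per channel.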

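```python
from torchvision import transforms
from PIL import Image

# Image path
image_path = 'image.jpg'

# ImageNet statistics; mean and std usually need to be adjusted for your own dataset
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Create the Normalize transform
normalize = transforms.Normalize(mean=mean, std=std)

# Load the image as 3-channel RGB (to match the 3 mean/std values) and convert it to a
# tensor first: Normalize accepts tensors, not PIL images
img = Image.open(image_path).convert('RGB')
img_tensor = transforms.ToTensor()(img)
normalized_img = normalize(img_tensor)
```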

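5. Converting PIL Images or NumPy Arrays to Tensors

The ToTensor class converts a PIL image or a NumPy array with shape (H, W, C) into a PyTorch tensor with shape (C, H, W).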

```python
from torchvision import transforms
from PIL import Image
import numpy as np

# Image path
image_path = 'image.jpg'

# Create the ToTensor transform
to_tensor = transforms.ToTensor()

# Read the image and apply ToTensor
img_pil = Image.open(image_path)        # read a PIL image
img_tensor = to_tensor(img_pil)         # PIL image -> tensor

img_np = np.array(img_pil)              # PIL image -> NumPy array of shape (H, W, C)
img_tensor_from_np = to_tensor(img_np)  # NumPy array -> tensor

# PIL input (e.g. an RGB image): values in [0, 255] are scaled to a float tensor in
# [0.0, 1.0], and the layout changes from (H, W, C) to (C, H, W).
# NumPy input: the layout also changes from (H, W, C) to (C, H, W); values are scaled to
# [0.0, 1.0] only when the array's dtype is np.uint8, otherwise values and dtype are kept.
```

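These are some of the most commonly used operations in the transforms library. They can be chained with the Compose class, which applies the listed transforms one after another in list order, to build a complete preprocessing and augmentation pipeline, as shown below.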

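```python
from torchvision import transforms
from PIL import Image

# Build a composed transform; the transforms are applied in list order
composed_transforms = transforms.Compose([
    transforms.Resize((256, 256)),      # resize to 256x256
    transforms.RandomCrop(224),         # random 224x224 crop
    transforms.RandomHorizontalFlip(),  # horizontal flip with p=0.5
    transforms.ToTensor(),              # PIL image -> float tensor in [0, 1], (C, H, W)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the image as 3-channel RGB (to match mean/std) and apply the composed transform
img = Image.open('image.jpg').convert('RGB')
transformed_img = composed_transforms(img)
```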