pytorch单机多卡训练_数据并行DataParallel

1.单机多卡概述

单卡多级的模型训练,即并行训练,可分为数据并行和模型并行两种.

数据并行是指,多张 GPUs 使用相同的模型副本,但采用不同 batch 的数据进行训练.

模型并行是指,多张 GPUs 使用同一 batch 的数据,分别训练模型的不同部分.

2.DataParallel源码

2.1 需要传入的参数

module(Module):被并行运算的模型

device_ids=None: CUDA devices

output_device=None:输出设备位置

2.2 forward

复制代码
检查设备是否合理;
如果合理遍历模型参数和其缓存区, 检查参数和缓冲区的设备是否与src_device_obj相同如果不同抛RuntimeError。
                源码说模型和参数必须放到device_ids[0]
                将输入数据根据设备数量分发到不同的设备上。

3.DataParallel案例

通过 PyTorch 使用 GPU 非常简单。您可以将模型放在 GPU 上

复制代码
device = torch.device("cuda:0")
model.to(device)

然后,您可以将所有张量复制到 GPU:

复制代码
mytensor = my_tensor.to(device)

--注意:mytensor.to(device)等所有tensor操作都是copy数据然后重载,不改变原tensor

。但是,Pytorch 默认情况下仅使用一个 GPU。通过使用以下命令使模型并行运行,您可以轻松地在多个 GPU 上运行操作 DataParallel :

复制代码
model = nn.DataParallel(model)

导入Pytorch模块并导入参数

复制代码
import torch
from torch.utils.data import Dataset,DataLoader
import torch.nn as nn


# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device

复制代码
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

设计一个dummy数据集

复制代码
class RandomDataset(Dataset):
	def __init__(self, size, length):
    	self.len = length
  	  	self.data = torch.randn(length, size)

	def __getitem__(self, index):
	    return self.data[index]
	
	def __len__(self):
 		return self.len
    
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

Our model

复制代码
class Model(nn.Module):
	def __init__(self, input_size, output_size):
    	super(Model, self).__init__()
    	self.fc = nn.Linear(input_size, output_size)

	def forward(self, input):
    	output = self.fc(input)
    	print("\tIn Model: input size", input.size(),
          "output size", output.size())

    	return output

创建模型和DataParallel

复制代码
model = Model(input_size,output_size)       #初始化参数对应,不着急进行设备关联
if torch.cuda.device_count() >1:            #判别一下是否多GPU
    print("可以进行数据并行训练")
    model = nn.DataParallel(model)          #是的话可以进行初始化操作

model = model.to(device)                    #pytorch一般而已都是重新赋值操作而非修改源数据

数据并行加载数据

复制代码
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("\tIn Model: input size", input.size())

运行模型

复制代码
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py:116: UserWarning:

Attempting to run cuBLAS, but there was no current CUDA context! Attempting to set the primary context... (Triggered internally at .../aten/src/ATen/cuda/CublasHandlePool.cpp:135.)

复制代码
    In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
    In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
    In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])
    In Model: input size torch.Size([6, 5]) output size torch.Size([6, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

In Model: input size torch.Size([6, 5]) output size torch.Size([6, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

In Model: input size torch.Size([6, 5]) output size torch.Size([6, 2])

In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([3, 5]) output size torch.Size([3, 2])

In Model: input size torch.Size([3, 5]) output size torch.Size([3, 2])

In Model: input size torch.Size([3, 5]) output size torch.Size([3, 2])

In Model: input size torch.Size([1, 5]) output size torch.Size([1, 2])

Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

如果您没有 GPU 或只有一个 GPU,则当我们批量处理 30 个输入和 30 个输出时,模型将按预期获得 30 个输入和 30 个输出。但如果你有多个 GPU,那么你可以获得这样的结 果。

Let's use 2 GPUs!

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])

In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])

Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

Let's use 3 GPUs!

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

Let's use 8 GPUs!

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])

Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

相关推荐
一 铭24 分钟前
AI领域新趋势:从提示(Prompt)工程到上下文(Context)工程
人工智能·语言模型·大模型·llm·prompt
云泽野3 小时前
【Java|集合类】list遍历的6种方式
java·python·list
麻雀无能为力4 小时前
CAU数据挖掘实验 表分析数据插件
人工智能·数据挖掘·中国农业大学
时序之心4 小时前
时空数据挖掘五大革新方向详解篇!
人工智能·数据挖掘·论文·时间序列
IMPYLH4 小时前
Python 的内置函数 reversed
笔记·python
.30-06Springfield5 小时前
人工智能概念之七:集成学习思想(Bagging、Boosting、Stacking)
人工智能·算法·机器学习·集成学习
说私域6 小时前
基于开源AI智能名片链动2+1模式S2B2C商城小程序的超级文化符号构建路径研究
人工智能·小程序·开源
永洪科技6 小时前
永洪科技荣获商业智能品牌影响力奖,全力打造”AI+决策”引擎
大数据·人工智能·科技·数据分析·数据可视化·bi
shangyingying_16 小时前
关于小波降噪、小波增强、小波去雾的原理区分
人工智能·深度学习·计算机视觉
小赖同学啊6 小时前
物联网数据安全区块链服务
开发语言·python·区块链