Using GPUs for Machine Learning

Psycho_MrZhang 2025-10-17 17:52

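Using the GPU

Use the following command to check the GPU status (the leading `!` runs a shell command from a notebook cell):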

!nvidia-smi

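Field	Meaning
Memory-Usage	GPU memory used / total
GPU-Util	GPU utilization (while a job is running)
CUDA Version	CUDA version (the highest the installed driver supports); the framework build must be compatible with it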
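
The same facts can also be read from inside Python. This is a minimal sketch, assuming PyTorch is installed; `describe_cuda` is a hypothetical helper name introduced here for illustration, not something from the original post. Note that `torch.version.cuda` is the CUDA version PyTorch was built against, and is `None` on CPU-only builds.

```python
import torch

def describe_cuda():
    """Collect basic CUDA facts as a dict (hypothetical helper for illustration)."""
    info = {
        "available": torch.cuda.is_available(),
        "built_with_cuda": torch.version.cuda,  # None on CPU-only builds
        "device_count": torch.cuda.device_count(),
        "devices": [],
    }
    for i in range(info["device_count"]):
        props = torch.cuda.get_device_properties(i)
        info["devices"].append({
            "index": i,
            "name": props.name,
            "total_memory_gib": props.total_memory / 1024**3,
        })
    return info

print(describe_cuda())
```

On a machine without a GPU, the device list is simply empty and `available` is `False`.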

Computing on the GPU

Tensors

import torch

torch.device('cpu')     # use the CPU
torch.device('cuda')    # use the GPU (the current CUDA device)
torch.device('cuda:1')  # the second GPU (indices start at 0)

Check how many GPUs are available

torch.cuda.device_count()

Testing the GPU environment

def try_gpu(i=0):
    """Return GPU i if it exists, otherwise the CPU."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpu():
    """Return every available GPU, or [CPU] if there are none."""
    devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]
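
When only a single device is needed, a common one-line idiom gives the same CPU fallback. This is a small sketch rather than a replacement for the helpers above; the variable names are chosen here for illustration.

```python
import torch

# Pick the default GPU when CUDA is usable, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create the tensor directly on the chosen device.
x = torch.ones(2, 3, device=device)
print(x.device)
```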

Query which device a tensor is on

x = torch.tensor([1, 2, 3])
x.device # device(type='cpu')

Storing a tensor on the GPU

X = torch.ones(2, 3, device=try_gpu())
X # tensor(..., device='cuda:0')

Creating a tensor on the second GPU

Y = torch.ones(2, 3, device=try_gpu(1))
Y # tensor(..., device='cuda:1') when a second GPU exists

To compute with X and Y, make sure both tensors are on the same GPU before running the operation

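Z = X.cuda(1)  # move X to cuda:1 (a copy is made if it lives elsewhere)
Z # tensor(..., device='cuda:1')

Z.cuda(1) is Z  # True: a tensor already on the target GPU is returned as-is, with no copy and no extra cost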
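
The same-device requirement can be sketched as a small helper that moves one operand explicitly before adding. This is an illustration only: `add_on_same_device` is a hypothetical name, and the device selection falls back to fewer GPUs (or the CPU) so that the snippet also runs on machines without two GPUs.

```python
import torch

def add_on_same_device(a, b):
    """Return a + b, first moving b to a's device if needed (illustrative helper)."""
    if a.device != b.device:
        b = b.to(a.device)  # explicit copy across devices
    return a + b

# Use two different GPUs when they exist; otherwise reuse whatever is available.
n_gpus = torch.cuda.device_count()
dev0 = torch.device('cuda:0') if n_gpus >= 1 else torch.device('cpu')
dev1 = torch.device('cuda:1') if n_gpus >= 2 else dev0

X = torch.ones(2, 3, device=dev0)
Y = torch.ones(2, 3, device=dev1)

result = add_on_same_device(X, Y)
print(result.device)
```

Making the move explicit keeps the cost of the transfer visible at the call site instead of hiding it inside the arithmetic.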

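Scattering different layers across the CPU and GPU causes heavy transfer overhead and performance problems at compute time, and these problems are hard to track down. It is therefore best to pick one device when initializing and avoid copying data back and forth.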
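
One way to follow this advice is to choose the device once and create both the model and its inputs on it. The sketch below assumes a small made-up network (the layer sizes are arbitrary and not from the original post).

```python
import torch
from torch import nn

# Choose the device once, up front.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move the whole model in one call instead of placing layers individually.
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1)).to(device)

# Create inputs directly on the target device to avoid a later copy.
X = torch.ones(2, 3, device=device)

out = net(X)
print(out.shape, out.device)
```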

Neural networks

Putting a neural network on the GPU

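from torch import nn

net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())  # move all parameters to the chosen device

net(X)  # X must be on the same device as net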

Confirm the model parameters are stored on the same GPU

net[0].weight.data.device # device(type='cuda',index=0)
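
For a model with more than one layer, checking a single weight is not enough. A possible sketch is to collect the devices of all parameters into a set; `parameter_devices` is a hypothetical helper name used here for illustration, and a set of size one means everything lives on the same device.

```python
import torch
from torch import nn

def parameter_devices(module):
    """Return the set of devices holding the module's parameters (illustrative helper)."""
    return {p.device for p in module.parameters()}

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
net = nn.Sequential(nn.Linear(3, 1)).to(device)

devices = parameter_devices(net)
print(devices)
```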