import torch

# Inputs live on the GPU, so the addition is launched asynchronously
x = torch.randn(1000, 1000, device="cuda")
y = torch.randn(1000, 1000, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
z = x + y
end.record()

# Wait for all queued GPU work to finish before reading the events
torch.cuda.synchronize()
print(start.elapsed_time(end))  # elapsed time in milliseconds
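The event-based pattern above can be wrapped in a small reusable helper. The sketch below assumes a hypothetical `time_gpu` function (not part of PyTorch); it times a callable with CUDA events when a GPU is present and falls back to `time.perf_counter` on CPU, which also shows why `torch.cuda.synchronize()` is required before reading the events:

```python
import time
import torch

def time_gpu(fn, iters=100):
    """Hypothetical helper: return elapsed milliseconds for `iters` calls to fn."""
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()  # drain any pending work before timing
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        # Without this sync, elapsed_time could be read before the kernels finish
        torch.cuda.synchronize()
        return start.elapsed_time(end)
    # CPU fallback: plain wall-clock timing
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000.0

ms = time_gpu(lambda: torch.randn(8, 8) @ torch.randn(8, 8), iters=10)
print(f"{ms:.3f} ms")
```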
Fortunately, PyTorch ships with a built-in profiler for measuring how long each part of a model takes, and it can record both CPU and GPU time.
import torch

x = torch.randn((1, 1), requires_grad=True)
with torch.autograd.profiler.profile(enabled=True) as prof:
    for _ in range(100):  # any normal Python code, really!
        y = x ** 2
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
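To also record GPU time, the legacy autograd profiler accepts a `use_cuda` flag (newer PyTorch versions expose `torch.profiler.profile` with an `activities` argument instead). A minimal sketch that falls back to CPU-only profiling when no GPU is present:

```python
import torch

use_cuda = torch.cuda.is_available()  # fall back to CPU-only profiling otherwise
x = torch.randn((1, 1), requires_grad=True)
if use_cuda:
    x = x.cuda()

with torch.autograd.profiler.profile(enabled=True, use_cuda=use_cuda) as prof:
    for _ in range(100):
        y = x ** 2

# Sort by GPU time when available, otherwise by CPU time
sort_key = "self_cuda_time_total" if use_cuda else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key))
```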