I. Basic Errors
1. UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
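Cause: operations such as transpose(), permute(), or einops.rearrange(***) return tensors whose memory is no longer contiguous, so the gradients flowing back through them can end up with strides that differ from the ones DDP recorded for its buckets. view() on its own keeps a contiguous tensor contiguous, but it often appears right next to these operations.
Fix: call .contiguous() right after these operations.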
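
The following is a minimal sketch of the cause and the fix, assuming only that PyTorch is installed. The tensor shape and variable names are made up for illustration, and the example runs on a single process without DDP: it only shows how transpose() and permute() change the strides and how .contiguous() restores a dense layout.

```python
import torch

# Hypothetical tensor used only for illustration.
x = torch.randn(2, 3, 4)

# transpose() and permute() return views with reordered strides,
# so the results are no longer contiguous in memory.
y = x.transpose(1, 2)
z = x.permute(2, 0, 1)
print(y.is_contiguous(), y.stride())
print(z.is_contiguous(), z.stride())

# .contiguous() copies the data into a densely packed layout
# that matches the new shape.
y_fixed = y.contiguous()
print(y_fixed.is_contiguous(), y_fixed.stride())

# The values are unchanged; only the memory layout differs.
print(torch.equal(y, y_fixed))
```

In a model, the same pattern would be written inline, for example `x = x.transpose(1, 2).contiguous()`. Note that .contiguous() makes a copy when the input is not already contiguous, so it costs some extra memory and time.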