PyTorch Grid Sample


  • 1. torch.nn.functional.grid_sample
    • 1.1. Parameters
    • 1.2. Returns
  • 2. Grid Sample Native Functions
    • 2.1. align_corners (bool, optional)
  • References

1. torch.nn.functional.grid_sample

torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)

https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html

Compute grid sample.

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.


Currently, only spatial (4-D) and volumetric (5-D) input are supported.

In the spatial (4-D) case, for input with shape $(N, C, H_\text{in}, W_\text{in})$ and grid with shape $(N, H_\text{out}, W_\text{out}, 2)$, the output will have shape $(N, C, H_\text{out}, W_\text{out})$.
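A quick shape check, as a sketch with random tensors and arbitrarily chosen sizes: a 4-D input with a grid of shape (N, H_out, W_out, 2) gives an output of shape (N, C, H_out, W_out), and the 5-D case works the same way with a 3-vector per output location.

```python
import torch
import torch.nn.functional as F

# 4-D case: batch of 2 feature maps, 3 channels, H_in = 4, W_in = 5.
inp = torch.randn(2, 3, 4, 5)
# One (x, y) location per output pixel: H_out = 6, W_out = 7, values in [-1, 1].
grid = torch.rand(2, 6, 7, 2) * 2 - 1
out = F.grid_sample(inp, grid, mode="bilinear", padding_mode="zeros", align_corners=False)
print(out.shape)   # torch.Size([2, 3, 6, 7])

# 5-D case: (N, C, D_in, H_in, W_in) input, (N, D_out, H_out, W_out, 3) grid.
inp5 = torch.randn(1, 2, 3, 4, 5)
grid5 = torch.zeros(1, 6, 7, 8, 3)   # every location points at the volume center
out5 = F.grid_sample(inp5, grid5, align_corners=False)
print(out5.shape)  # torch.Size([1, 2, 6, 7, 8])
```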

For each output location output[n, :, h, w], the size-2 vector grid[n, h, w] specifies input pixel locations x and y, which are used to interpolate the output value output[n, :, h, w].


In the case of 5D inputs, grid[n, d, h, w] specifies the x, y, z pixel locations for interpolating output[n, :, d, h, w]. mode argument specifies nearest or bilinear interpolation method to sample the input pixels.
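To make the per-location rule concrete, here is a minimal pure-Python sketch of 2-D bilinear sampling for a single-channel image. It assumes padding_mode="zeros" and ignores the batch and channel dimensions, and it uses the unnormalization formulas derived in section 2.1. The function names (unnormalize, bilinear_sample, grid_sample_2d) are hypothetical; this is an illustration, not PyTorch's implementation.

```python
import math

def unnormalize(coord, size, align_corners):
    # Map a normalized coordinate in [-1, 1] to a (fractional) pixel index.
    if align_corners:
        return (coord + 1) / 2 * (size - 1)
    return ((coord + 1) * size - 1) / 2

def bilinear_sample(img, x, y, align_corners=False):
    """Sample a single-channel image (list of H rows of W values) at normalized (x, y)."""
    h, w = len(img), len(img[0])
    ix = unnormalize(x, w, align_corners)  # x runs along the width (columns)
    iy = unnormalize(y, h, align_corners)  # y runs along the height (rows)
    x0, y0 = math.floor(ix), math.floor(iy)
    tx, ty = ix - x0, iy - y0              # fractional offsets inside the cell

    def pixel(r, c):
        # padding_mode="zeros": taps outside the image contribute 0
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - tx) * (1 - ty) * pixel(y0, x0)
            + tx * (1 - ty) * pixel(y0, x0 + 1)
            + (1 - tx) * ty * pixel(y0 + 1, x0)
            + tx * ty * pixel(y0 + 1, x0 + 1))

def grid_sample_2d(img, grid, align_corners=False):
    # grid[h][w] = (x, y); output[h][w] is interpolated at that location.
    return [[bilinear_sample(img, x, y, align_corners) for (x, y) in row] for row in grid]

img = [[0, 1, 2],
       [3, 4, 5]]
print(grid_sample_2d(img, [[(-1, -1), (0, 0), (1, 1)]], align_corners=True))
# [[0.0, 2.5, 5.0]]
```

Working on plain nested lists keeps every interpolation weight visible; a real implementation would vectorize over N, C and all output locations.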


grid specifies the sampling pixel locations normalized by the input spatial dimensions. Therefore, it should have most values in the range of [-1, 1]. For example, x = -1, y = -1 is the left-top pixel of input, and x = 1, y = 1 is the right-bottom pixel of input.
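A small check of the corner convention, assuming align_corners=True. With the default False, ±1 lands on the outer edges of the corner pixels rather than their centers, so bilinear sampling with zero padding would mix in zeros. Note that grid[..., 0] is x, which runs along the width, and grid[..., 1] is y, which runs along the height:

```python
import torch
import torch.nn.functional as F

# Input of shape (1, 1, 2, 3) holding 0..5, so each value identifies its pixel:
# [[0, 1, 2],
#  [3, 4, 5]]
inp = torch.arange(6, dtype=torch.float32).reshape(1, 1, 2, 3)

# Four (x, y) points; grid shape is (N=1, H_out=1, W_out=4, 2).
points = torch.tensor([[-1.0, -1.0],   # left-top
                       [ 1.0, -1.0],   # right-top
                       [-1.0,  1.0],   # left-bottom
                       [ 1.0,  1.0]])  # right-bottom
grid = points.reshape(1, 1, 4, 2)

out = F.grid_sample(inp, grid, mode="bilinear", align_corners=True)
print(out.flatten().tolist())  # [0.0, 2.0, 3.0, 5.0]
```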

If grid has values outside the range of [-1, 1], the corresponding outputs are handled as defined by padding_mode.


Options are

  • padding_mode="zeros": use 0 for out-of-bound grid locations
  • padding_mode="border": use border values for out-of-bound grid locations
  • padding_mode="reflection": use values at locations reflected by the border for out-of-bound grid locations. For locations far away from the border, the location keeps being reflected until it is in bound, e.g., (normalized) pixel location x = -3.5 reflects by border -1 and becomes x' = 1.5, then reflects by border 1 and becomes x'' = 0.5.
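The reflection rule can be sketched in pure Python as below, working directly in normalized coordinates with borders -1 and 1. For x = -3.5, mirroring across -1 gives 1.5, and mirroring 1.5 across 1 gives 0.5. PyTorch applies reflection after converting to pixel coordinates; since that conversion is linear, reflecting across ±1 first should land on the same location, but treat this equivalence as an assumption of the sketch. The helpers reflect and border are hypothetical names.

```python
def reflect(x, lo=-1.0, hi=1.0):
    """Fold x back into [lo, hi] by repeated mirroring at the borders."""
    span = hi - lo
    # Repeated mirroring is periodic with period 2 * span, so reduce x into one period.
    t = (x - lo) % (2 * span)
    if t > span:      # second half of the period is the mirrored copy
        t = 2 * span - t
    return lo + t

def border(x, lo=-1.0, hi=1.0):
    """padding_mode="border" behaviour: clamp to the nearest border."""
    return min(max(x, lo), hi)

print(reflect(-3.5))  # 0.5  (-3.5 -> 1.5 via border -1, then 1.5 -> 0.5 via border 1)
print(reflect(2.5))   # -0.5 (2.5 -> -0.5 via border 1)
print(border(-3.5))   # -1.0
```

The modulo form avoids an explicit loop, but it gives the same result as reflecting one border at a time until the location is in bound.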

Note

This function is often used in conjunction with affine_grid to build Spatial Transformer Networks.
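As a sketch of that pairing (input values arbitrary): affine_grid turns a batch of 2 x 3 affine matrices theta into a normalized sampling grid, and grid_sample applies it. With the identity matrix and the same align_corners setting in both calls, the output should reproduce the input up to floating-point error.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
inp = torch.randn(1, 3, 8, 8)

# One 2 x 3 affine matrix per batch item; this one is the identity transform.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# affine_grid builds a grid of shape (N, H_out, W_out, 2) for the requested
# output size (here: the same size as the input).
grid = F.affine_grid(theta, inp.size(), align_corners=False)
out = F.grid_sample(inp, grid, mode="bilinear", align_corners=False)

print(grid.shape)                           # torch.Size([1, 8, 8, 2])
print(torch.allclose(out, inp, atol=1e-5))  # True
```

In a Spatial Transformer Network, theta would come from a small localization network instead of being fixed.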

Spatial Transformer Networks
https://arxiv.org/abs/1506.02025

Note

When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on Reproducibility for background.

Note

NaN values in grid would be interpreted as -1.

1.1. Parameters

  • input (Tensor): input of shape $(N, C, H_\text{in}, W_\text{in})$ (4-D case) or $(N, C, D_\text{in}, H_\text{in}, W_\text{in})$ (5-D case)

  • grid (Tensor): flow-field of shape $(N, H_\text{out}, W_\text{out}, 2)$ (4-D case) or $(N, D_\text{out}, H_\text{out}, W_\text{out}, 3)$ (5-D case)

  • mode (str): interpolation mode to calculate output values 'bilinear' | 'nearest' | 'bicubic'. Default: 'bilinear'. Note: mode='bicubic' supports only 4-D input. When mode='bilinear' and the input is 5-D, the interpolation mode used internally will actually be trilinear. However, when the input is 4-D, the interpolation mode will legitimately be bilinear.

  • padding_mode (str): padding mode for outside grid values 'zeros' | 'border' | 'reflection'. Default: 'zeros'

  • align_corners (bool, optional): Geometrically, we consider the pixels of the input as squares rather than points.
    If set to True, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels.
    If set to False, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
    This option parallels the align_corners option in interpolate, and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: False

    legitimately [lɪ'dʒɪtɪmətlɪ]
    adv. properly, justifiably
    geometrically [ˌdʒi:ə'metrɪklɪ]
    adv. in terms of geometry
    agnostic [æɡˈnɒstɪk]
    adj. agnostic (relating to agnosticism)
    n. an agnostic
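A tiny example of how mode changes the result, assuming align_corners=True so the arithmetic is easy to follow: on a 1 x 2 image [0, 10], x = 0.2 maps to pixel coordinate (0.2 + 1) / 2 * (2 - 1) = 0.6, i.e. 60% of the way from pixel 0 to pixel 1.

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[[[0.0, 10.0]]]])   # shape (1, 1, 1, 2): one row, two pixels
grid = torch.tensor([[[[0.2, 0.0]]]])   # shape (1, 1, 1, 2): a single (x, y) location

results = {}
for mode in ("bilinear", "nearest"):
    out = F.grid_sample(inp, grid, mode=mode, padding_mode="zeros", align_corners=True)
    results[mode] = out.item()

print(results)
# bilinear: about 6.0 (0.4 * 0 + 0.6 * 10); nearest: 10.0 (0.6 rounds to pixel 1)
```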

1.2. Returns

  • output (Tensor): output Tensor

Warning

When align_corners = True, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by grid_sample() will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). The default behavior up to version 1.2.0 was align_corners = True. Since then, the default behavior has been changed to align_corners = False, in order to bring it in line with the default for interpolate().
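To see the resolution dependence numerically, the sketch below converts a normalized x into a fraction of the image width, treating pixel i as the interval [i - 0.5, i + 0.5]. relative_position is a hypothetical helper built from the two unnormalization formulas in section 2.1. With align_corners=False the same x always lands at the same fraction; with align_corners=True it drifts with the input size.

```python
def relative_position(x, size, align_corners):
    """Fraction of the image extent (0 = left edge, 1 = right edge) where x lands."""
    if align_corners:
        pixel = (x + 1) / 2 * (size - 1)
    else:
        pixel = ((x + 1) * size - 1) / 2
    # The image covers [-0.5, size - 0.5] in pixel coordinates.
    return (pixel + 0.5) / size

for size in (4, 8, 16):
    print(size,
          relative_position(0.5, size, align_corners=True),
          relative_position(0.5, size, align_corners=False))
# 4 0.6875 0.75
# 8 0.71875 0.75
# 16 0.734375 0.75
```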
Note
mode='bicubic' is implemented using the cubic convolution algorithm with $\alpha = -0.75$. The constant $\alpha$ might differ between packages; for example, PIL and OpenCV use -0.5 and -0.75 respectively. This algorithm may "overshoot" the range of values it is interpolating. For example, it may produce negative values or values greater than 255 when interpolating input in [0, 255]. Clamp the results with torch.clamp() to ensure they are within the valid range.
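The cubic convolution weights can be sketched with the standard Keys kernel and parameter a (the α above). This is a 1-D illustration, assuming bicubic sampling applies the same kernel separably along x and y over a 4 x 4 neighbourhood. cubic_weight and cubic_interp are hypothetical names. Interpolating halfway across a simple bump [0, 255, 255, 0] shows the overshoot:

```python
A = -0.75  # the alpha used for mode='bicubic' according to the note above

def cubic_weight(d, a=A):
    """Keys cubic convolution kernel evaluated at distance d."""
    d = abs(d)
    if d <= 1:
        return ((a + 2) * d - (a + 3)) * d * d + 1
    if d < 2:
        return ((a * d - 5 * a) * d + 8 * a) * d - 4 * a
    return 0.0

def cubic_interp(p0, p1, p2, p3, t, a=A):
    """Interpolate between p1 and p2 at fraction t in [0, 1) from 4 neighbours."""
    weights = [cubic_weight(t + 1, a), cubic_weight(t, a),
               cubic_weight(1 - t, a), cubic_weight(2 - t, a)]
    return sum(w * p for w, p in zip(weights, (p0, p1, p2, p3)))

overshoot = cubic_interp(0, 255, 255, 0, 0.5)               # a = -0.75
overshoot_pil = cubic_interp(0, 255, 255, 0, 0.5, a=-0.5)   # a = -0.5
clamped = min(max(overshoot, 0.0), 255.0)
print(overshoot, overshoot_pil, clamped)  # 302.8125 286.875 255.0
```

The four weights always sum to 1, so flat regions are reproduced exactly; the negative outer weights are what push the result past 255 near edges. On a PyTorch tensor the same clamp would be torch.clamp(out, 0, 255).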

2. Grid Sample Native Functions

aten/src/ATen/native/GridSampler.h
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GridSampler.h

aten/src/ATen/native/GridSampler.cpp
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GridSampler.cpp

2.1. align_corners (bool, optional)

Geometrically, we consider the pixels of the input as squares rather than points.


If set to True, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels.


If set to False, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.


This option parallels the align_corners option in interpolate(), and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: False


Unnormalizes a coordinate from the -1 to +1 scale to its pixel index value, where we view each pixel as an area between (idx - 0.5) and (idx + 0.5).

if align_corners: -1 and +1 get sent to the center points of the input's corner pixels

     -1 --> 0
     +1 --> (size - 1)
     scale_factor = ((size - 1) - 0) / ((+1) - (-1)) = (size - 1) / 2

if not align_corners: -1 and +1 get sent to the corner points of the input's corner pixels

     -1 --> -0.5
     +1 --> (size - 1) + 0.5 == size - 0.5
     scale_factor = ((size - 0.5) - (-0.5)) / ((+1) - (-1)) = size / 2
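The two mappings can be written as one small function. This is a Python transcription of the formulas above (the name unnormalize is ours, not necessarily the one used in GridSampler.h). It checks both endpoints and the image center, which lands at (size - 1) / 2 for either setting.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel-index coordinate."""
    if align_corners:
        # -1 -> 0, +1 -> size - 1   (scale_factor = (size - 1) / 2)
        return (coord + 1) / 2 * (size - 1)
    # -1 -> -0.5, +1 -> size - 0.5  (scale_factor = size / 2)
    return ((coord + 1) * size - 1) / 2

size = 4
for ac in (True, False):
    print(ac, [unnormalize(c, size, ac) for c in (-1, 0, 1)])
# True [0.0, 1.5, 3.0]
# False [-0.5, 1.5, 3.5]
```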

References

1 Yongqiang Cheng (程永强), https://yongqiang.blog.csdn.net/

2 torch.nn.functional.grid_sample, https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html

3 torch.nn.functional.grid_sample, https://docs.pytorch.ac.cn/docs/stable/generated/torch.nn.functional.grid_sample.html
