PyTorch Grid Sample
- [1. `torch.nn.functional.grid_sample`](#1. torch.nn.functional.grid_sample)
  - [1.1. Parameters](#1.1. Parameters)
  - [1.2. Returns](#1.2. Returns)
- [2. Grid Sample Native Functions](#2. Grid Sample Native Functions)
  - [2.1. align_corners (bool, optional)](#2.1. align_corners (bool, optional))
- References
1. torch.nn.functional.grid_sample
torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)
https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html
Compute grid sample.
Given an `input` and a flow-field `grid`, computes the `output` using `input` values and pixel locations from `grid`.
Currently, only spatial (4-D) and volumetric (5-D) input are supported.
In the spatial (4-D) case, for `input` with shape $(N, C, H_\text{in}, W_\text{in})$ and `grid` with shape $(N, H_\text{out}, W_\text{out}, 2)$, the output will have shape $(N, C, H_\text{out}, W_\text{out})$.
For each output location `output[n, :, h, w]`, the size-2 vector `grid[n, h, w]` specifies the `input` pixel locations `x` and `y`, which are used to interpolate the output value `output[n, :, h, w]`.
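As a quick shape check of the 4-D case, here is a minimal sketch that samples a random input with a random grid. The sizes are arbitrary values chosen for illustration only.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, picked only for this illustration.
N, C, H_in, W_in = 2, 3, 5, 7
H_out, W_out = 4, 6

inp = torch.randn(N, C, H_in, W_in)
# Random normalized coordinates in [-1, 1]; the last dimension holds (x, y).
grid = torch.rand(N, H_out, W_out, 2) * 2 - 1

out = F.grid_sample(inp, grid, mode='bilinear',
                    padding_mode='zeros', align_corners=False)
print(out.shape)  # torch.Size([2, 3, 4, 6])
```

The spatial size of the output follows `grid`, while the batch and channel sizes follow `input`.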
In the case of 5-D inputs, `grid[n, d, h, w]` specifies the `x`, `y`, `z` pixel locations for interpolating `output[n, :, d, h, w]`. The `mode` argument specifies the `nearest` or `bilinear` interpolation method used to sample the input pixels.
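To make the difference between the two interpolation methods concrete, the sketch below samples a single location that lies a quarter of the way between two pixels. The input values and the use of `align_corners=True` (so that the normalized coordinate maps to an easy pixel index) are choices made for this example.

```python
import torch
import torch.nn.functional as F

# One row of five pixels; shape (N, C, H_in, W_in) = (1, 1, 1, 5).
inp = torch.tensor([[[[10., 20., 30., 40., 50.]]]])

# With align_corners=True and W_in = 5, x = -0.875 maps to pixel index 0.25.
# With H_in = 1, every y maps to row 0.
grid = torch.tensor([[[[-0.875, 0.0]]]])  # shape (N, H_out, W_out, 2)

nearest = F.grid_sample(inp, grid, mode='nearest', align_corners=True)
bilinear = F.grid_sample(inp, grid, mode='bilinear', align_corners=True)

print(nearest.item())   # 10.0 (index 0.25 rounds to pixel 0)
print(bilinear.item())  # 12.5 (0.75 * 10 + 0.25 * 20)
```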
`grid` specifies the sampling pixel locations normalized by the `input` spatial dimensions. Therefore, most of its values should lie in the range `[-1, 1]`. For example, `x = -1, y = -1` is the top-left pixel of `input`, and `x = 1, y = 1` is the bottom-right pixel of `input`.
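The normalized corners can be checked on a small tensor. This sketch assumes `align_corners=True`, so that `-1` and `1` land exactly on the centers of the corner pixels; with the default `align_corners=False` they refer to the outer corners of those pixels, and the sampled values would be blended with padding.

```python
import torch
import torch.nn.functional as F

# A 3x4 image whose pixel values are 0..11 in row-major order.
inp = torch.arange(12.).reshape(1, 1, 3, 4)

# Three sampling locations (x, y): top-left, image center, bottom-right.
grid = torch.tensor([[[[-1., -1.], [0., 0.], [1., 1.]]]])  # shape (1, 1, 3, 2)

out = F.grid_sample(inp, grid, mode='bilinear',
                    padding_mode='zeros', align_corners=True)
print(out.flatten().tolist())  # [0.0, 5.5, 11.0]
```

The center value 5.5 is the average of pixels 5 and 6, because `x = 0` falls halfway between columns 1 and 2 of the middle row.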
If `grid` has values outside the range `[-1, 1]`, the corresponding outputs are handled as defined by `padding_mode`.
Options are:
- `padding_mode="zeros"`: use `0` for out-of-bound grid locations.
- `padding_mode="border"`: use border values for out-of-bound grid locations.
- `padding_mode="reflection"`: use values at locations reflected by the border for out-of-bound grid locations. A location far away from the border keeps being reflected until it is in bounds, e.g., the (normalized) pixel location `x = -3.5` reflects by border `-1` and becomes `x' = 1.5`, then reflects by border `1` and becomes `x'' = 0.5`.
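The three padding modes can be compared on one out-of-bound location. In the sketch below, `reflect_into_range` is a hypothetical helper written for this note (it is not a PyTorch function) that repeats the mirroring step described above; mirroring 1.5 about the border 1 gives 2 · 1 − 1.5 = 0.5. The tensor values and `align_corners=True` are also choices made for illustration.

```python
import torch
import torch.nn.functional as F

def reflect_into_range(x, low=-1.0, high=1.0):
    """Hypothetical helper: mirror x about the borders until it lies in [low, high]."""
    steps = [x]
    while x < low or x > high:
        x = 2 * low - x if x < low else 2 * high - x
        steps.append(x)
    return steps

print(reflect_into_range(-3.5))  # [-3.5, 1.5, 0.5]

# One row of three pixels; x = 2.0 lies one pixel to the right of the image
# when align_corners=True (pixel index 3, valid indices are 0..2).
inp = torch.tensor([[[[10., 20., 30.]]]])   # shape (1, 1, 1, 3)
grid = torch.tensor([[[[2.0, 0.0]]]])       # shape (1, 1, 1, 2)

results = {
    pm: F.grid_sample(inp, grid, mode='bilinear',
                      padding_mode=pm, align_corners=True).item()
    for pm in ('zeros', 'border', 'reflection')
}
print(results)  # {'zeros': 0.0, 'border': 30.0, 'reflection': 20.0}
```

With `reflection`, `x = 2.0` mirrors about the border `1` to `x = 0.0`, which is the center pixel.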
Note
This function is often used in conjunction with `affine_grid` to build Spatial Transformer Networks.
Spatial Transformer Networks
https://arxiv.org/abs/1506.02025
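A minimal sketch of the `affine_grid` plus `grid_sample` pairing follows. The two 2x3 matrices (identity and horizontal flip) and the helper name `warp` are assumptions for this example; in a Spatial Transformer Network the matrix would be predicted by a small localization network instead of being fixed.

```python
import torch
import torch.nn.functional as F

inp = torch.arange(12.).reshape(1, 1, 3, 4)

# Affine matrices of shape (N, 2, 3), mapping output coordinates to input coordinates.
identity = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])
flip_x = torch.tensor([[[-1., 0., 0.], [0., 1., 0.]]])

def warp(theta):
    # Hypothetical helper: build the sampling grid, then sample the input with it.
    grid = F.affine_grid(theta, size=list(inp.shape), align_corners=False)
    return F.grid_sample(inp, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=False)

same = warp(identity)
flipped = warp(flip_x)

print(torch.allclose(same, inp, atol=1e-4))              # True
print(torch.allclose(flipped, inp.flip(-1), atol=1e-4))  # True
```

The same `align_corners` value is passed to both functions so that the grid and the sampler agree on where the pixel centers are.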
Note
When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on Reproducibility for background.
Note
NaN values in `grid` would be interpreted as `-1`.
1.1. Parameters
- input (Tensor): input of shape $(N, C, H_\text{in}, W_\text{in})$ (4-D case) or $(N, C, D_\text{in}, H_\text{in}, W_\text{in})$ (5-D case)
- grid (Tensor): flow-field of shape $(N, H_\text{out}, W_\text{out}, 2)$ (4-D case) or $(N, D_\text{out}, H_\text{out}, W_\text{out}, 3)$ (5-D case)
- mode (str): interpolation mode to calculate output values: `'bilinear'` | `'nearest'` | `'bicubic'`. Default: `'bilinear'`. Note: `mode='bicubic'` supports only 4-D input. When `mode='bilinear'` and the input is 5-D, the interpolation mode used internally will actually be trilinear. However, when the input is 4-D, the interpolation mode will legitimately be bilinear.
- padding_mode (str): padding mode for outside grid values: `'zeros'` | `'border'` | `'reflection'`. Default: `'zeros'`
- align_corners (bool, optional): Geometrically, we consider the pixels of the input as squares rather than points. If set to `True`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If set to `False`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. This option parallels the `align_corners` option in `interpolate`, and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: `False`
legitimately [lɪ'dʒɪtɪmətlɪ]
adv. properly; rightfully
geometrically [ˌdʒi:ə'metrɪklɪ]
adv. in geometric terms
agnostic [æɡˈnɒstɪk]
adj. agnostic (here: independent of, as in "resolution agnostic")
n. an agnostic
1.2. Returns
- output (Tensor): output Tensor
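For the volumetric (5-D) case listed under Parameters, the call looks the same; only the shapes change and the last dimension of `grid` holds `(x, y, z)`. The sizes below are arbitrary values for illustration.

```python
import torch
import torch.nn.functional as F

# input: (N, C, D_in, H_in, W_in); grid: (N, D_out, H_out, W_out, 3)
inp = torch.randn(1, 2, 3, 4, 5)
grid = torch.rand(1, 2, 3, 4, 3) * 2 - 1  # normalized (x, y, z) in [-1, 1]

# mode='bilinear' on a 5-D input is carried out as trilinear interpolation.
out = F.grid_sample(inp, grid, mode='bilinear',
                    padding_mode='zeros', align_corners=False)
print(out.shape)  # torch.Size([1, 2, 2, 3, 4])
```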
Warning
When `align_corners = True`, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by `grid_sample()` will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). The default behavior up to version 1.2.0 was `align_corners = True`. Since then, the default behavior has been changed to `align_corners = False`, in order to bring it in line with the default for `interpolate()`.
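The resolution dependence can be observed by sampling the same normalized location from one "image" stored at two widths. In this sketch the image is a horizontal ramp whose pixel `i` holds the value `(i + 0.5) / width`; the helper `sample_ramp` and the choice `x = -0.5` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def sample_ramp(width, x, align_corners):
    # Pixel i stores the value of a continuous ramp over [0, 1] at its center.
    ramp = (torch.arange(width, dtype=torch.float32) + 0.5) / width
    inp = ramp.reshape(1, 1, 1, width)
    grid = torch.tensor([[[[x, 0.0]]]])  # a single (x, y) location
    out = F.grid_sample(inp, grid, mode='bilinear',
                        padding_mode='border', align_corners=align_corners)
    return out.item()

for align in (False, True):
    print(align, sample_ramp(4, -0.5, align), sample_ramp(8, -0.5, align))
# align_corners=False gives 0.25 at both widths;
# align_corners=True gives 0.3125 at width 4 and 0.28125 at width 8.
```

With `align_corners=False` the normalized coordinate refers to the same fraction of the image at every resolution, which is what "resolution agnostic" means here.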
Note
`mode='bicubic'` is implemented using the cubic convolution algorithm with $\alpha = -0.75$. The constant $\alpha$ may differ from package to package. For example, PIL and OpenCV use -0.5 and -0.75 respectively. This algorithm may "overshoot" the range of values it is interpolating. For example, it may produce negative values or values greater than 255 when interpolating input in [0, 255]. Clamp the results with `torch.clamp()` to ensure they are within the valid range.
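The overshoot is easy to trigger with a sharp edge. The sketch below upsamples a 4x4 step image (values 0 and 255) to 8x8 with bicubic sampling and then clamps the result; the image, the output size, and the use of an identity `affine_grid` are assumptions for this example.

```python
import torch
import torch.nn.functional as F

# Every row is a hard step from 0 to 255.
row = torch.tensor([0., 0., 255., 255.])
inp = row.repeat(4, 1).reshape(1, 1, 4, 4)

# Identity transform evaluated on a finer 8x8 output grid.
theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])
grid = F.affine_grid(theta, size=[1, 1, 8, 8], align_corners=False)

raw = F.grid_sample(inp, grid, mode='bicubic',
                    padding_mode='border', align_corners=False)
clamped = torch.clamp(raw, min=0.0, max=255.0)

print(raw.min().item() < 0.0, raw.max().item() > 255.0)  # True True
print(clamped.min().item(), clamped.max().item())        # 0.0 255.0
```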
2. Grid Sample Native Functions
aten/src/ATen/native/GridSampler.h
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GridSampler.h
aten/src/ATen/native/GridSampler.cpp
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GridSampler.cpp
2.1. align_corners (bool, optional)
Geometrically, we consider the pixels of the input as squares rather than points.
If set to `True`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels.
If set to `False`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
This option parallels the `align_corners` option in `interpolate()`, and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: `False`
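One way to see the parallel with `interpolate()` is to resize an image twice: once with `interpolate()`, and once by sampling it on an identity `affine_grid` of the target size. The sketch below assumes bilinear mode and `padding_mode='border'` (so that locations just outside the outermost pixel centers reuse the edge value), and compares the two results for each `align_corners` setting.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
inp = torch.randn(1, 1, 4, 4)
theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])  # identity transform

matches = {}
for align in (True, False):
    resized = F.interpolate(inp, size=(8, 8), mode='bilinear',
                            align_corners=align)
    grid = F.affine_grid(theta, size=[1, 1, 8, 8], align_corners=align)
    sampled = F.grid_sample(inp, grid, mode='bilinear',
                            padding_mode='border', align_corners=align)
    # Compare the two resizing routes up to floating-point tolerance.
    matches[align] = torch.allclose(resized, sampled, atol=1e-4)

print(matches)
```

Mixing the settings (for example `align_corners=True` in one call and `False` in the other) places the sampling locations differently, which is why the text recommends using the same option in both places.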
Unnormalizes a coordinate from the -1 to +1 scale to its pixel index value, where we view each pixel as an area between (idx - 0.5) and (idx + 0.5).
if align_corners: -1 and +1 get sent to the center points of the input's corner pixels
-1 --> 0
+1 --> (size - 1)
scale_factor = ((size - 1) - 0) / ((+1) - (-1)) = (size - 1) / 2
if not align_corners: -1 and +1 get sent to the corner points of the input's corner pixels
-1 --> -0.5
+1 --> (size - 1) + 0.5 == size - 0.5
scale_factor = ((size - 0.5) - (-0.5)) / ((+1) - (-1)) = size / 2
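The two mappings above can be written out directly. The function below is a plain-Python sketch of the arithmetic in that comment, not the ATen implementation itself; the name `unnormalize` and the example size 4 are chosen for this note.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a (fractional) pixel index."""
    if align_corners:
        # -1 -> 0 and +1 -> size - 1, so the scale factor is (size - 1) / 2.
        return (coord + 1) / 2 * (size - 1)
    # -1 -> -0.5 and +1 -> size - 0.5, so the scale factor is size / 2:
    # -0.5 + (coord + 1) * size / 2 == ((coord + 1) * size - 1) / 2
    return ((coord + 1) * size - 1) / 2

size = 4
print(unnormalize(-1.0, size, True), unnormalize(1.0, size, True))    # 0.0 3.0
print(unnormalize(-1.0, size, False), unnormalize(1.0, size, False))  # -0.5 3.5
```

Each pixel `idx` covers the interval from `idx - 0.5` to `idx + 0.5`, so with `align_corners=False` the range `[-1, 1]` spans the full extent of the image from `-0.5` to `size - 0.5`.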
References
[1] Yongqiang Cheng (程永强),