# Visual Explanation of Torch Pad

For example, to pad only the last dimension of the input tensor, `pad` has the form `(padding_left, padding_right)`; to pad the last 2 dimensions, use `(padding_left, padding_right, padding_top, padding_bottom)`; to pad the last 3 dimensions, use `(padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back)`.
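For concreteness, here is a small sketch (using `torch.nn.functional.pad` with its default constant/zero padding) showing how each of the three forms changes the shape of a 3D tensor:

```python
import torch
import torch.nn.functional as F

x = torch.ones(2, 3, 4)  # shape (2, 3, 4)

# pad only the last dimension: (left, right)
print(F.pad(x, (1, 2)).shape)              # torch.Size([2, 3, 7])

# pad the last 2 dimensions: (left, right, top, bottom)
print(F.pad(x, (1, 2, 3, 4)).shape)        # torch.Size([2, 10, 7])

# pad the last 3 dimensions: (left, right, top, bottom, front, back)
print(F.pad(x, (1, 2, 3, 4, 5, 6)).shape)  # torch.Size([13, 10, 7])
```

Note that the first pair in the tuple always applies to the *last* dimension, and each additional pair walks one dimension toward the front.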

However, my main request is for an intuitive explanation of how `pad` works. A visualization would be nice, but any intuition you can impart about `pad` would be great as well!

```python
def local_response_norm(input, size, alpha=1e-4, beta=0.75, k=1.):
    # type: (Tensor, int, float, float, float) -> Tensor
    r"""Applies local response normalization over an input signal composed of
    several input planes, where channels occupy the second dimension.
    Applies normalization across channels.

    See :class:`~torch.nn.LocalResponseNorm` for details.
    """
    if not torch.jit.is_scripting():
        if type(input) is not Tensor and has_torch_function((input,)):
            return handle_torch_function(
                local_response_norm, (input,), input, size, alpha=alpha, beta=beta, k=k)
    dim = input.dim()
    if dim < 3:
        raise ValueError('Expected 3D or higher dimensionality '
                         'input (got {} dimensions)'.format(dim))
    div = input.mul(input).unsqueeze(1)
    if dim == 3:
        div = pad(div, (0, 0, size // 2, (size - 1) // 2))
        div = avg_pool2d(div, (size, 1), stride=1).squeeze(1)
    else:
        sizes = input.size()
        div = div.view(sizes[0], 1, sizes[1], sizes[2], -1)
        div = pad(div, (0, 0, 0, 0, size // 2, (size - 1) // 2))
        div = avg_pool3d(div, (size, 1, 1), stride=1).squeeze(1)
        div = div.view(sizes)
    div = div.mul(alpha).add(k).pow(beta)
    return input / div
```

If anyone could explain why padding is done in this way for the above function, I would be very grateful! (Apologies for the loaded question, I’m a noob)

The “left”, “right”, “top”, and “bottom” terms are easiest to understand if you think of an image tensor.
For higher-dimensional tensors you can instead think of the “front” and the “end” of each dimension.
Each dimension uses two values, one for the “front” and the other for the “end” of that dimension.

Here is a small code snippet to show the behavior:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 1, 1)
print(x)

# the last pair in the tuple pads dim0 (the first dimension);
# (0, 1) pads it "at the end" -> shape (2, 1, 1, 1)
x1 = F.pad(x, (0, 0, 0, 0, 0, 0, 0, 1))
print(x1)

# (1, 0) pads dim0 "at the front" -> shape (2, 1, 1, 1)
x2 = F.pad(x, (0, 0, 0, 0, 0, 0, 1, 0))
print(x2)

# pad dim0 on both sides with two zeros -> shape (5, 1, 1, 1)
x3 = F.pad(x, (0, 0, 0, 0, 0, 0, 2, 2))
print(x3)

# the second pair pads the second-to-last dimension (dim2);
# (4, 0) pads it "at the front" with 4 values -> shape (1, 1, 5, 1)
x4 = F.pad(x, (0, 0, 4, 0, 0, 0, 0, 0))
print(x4)
```
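To connect this back to the `local_response_norm` question: the asymmetric pad `(size // 2, (size - 1) // 2)` adds zeros before and after the channel dimension so that the subsequent average-pooling window of height `size` is centered on each channel and the output keeps the same number of channels as the input. A simplified sketch of the 3D branch (not the full function, and using hypothetical example shapes):

```python
import torch
import torch.nn.functional as F

size = 3                  # number of neighboring channels to average over
x = torch.randn(2, 5, 7)  # (batch, channels, length), i.e. dim == 3

div = x.mul(x).unsqueeze(1)  # squared input, shape (2, 1, 5, 7)

# zero-pad the channel axis: size // 2 channels before and
# (size - 1) // 2 channels after, keeping the pooling window centered
div = F.pad(div, (0, 0, size // 2, (size - 1) // 2))  # shape (2, 1, 7, 7)

# averaging over `size` channels shrinks that axis back to 5
div = F.avg_pool2d(div, (size, 1), stride=1).squeeze(1)

print(div.shape)  # torch.Size([2, 5, 7]) -- same shape as x
```

Without the padding, pooling over `size` channels would shrink the channel dimension, and `input / div` would no longer broadcast.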

Ah, that’s cool! The “front” and “end” intuition for high dimensional tensors is exactly what I needed. Thanks a lot for this!