MaxPool1d input gradient shape different from input


(Gabriel Tseng) #1

Hello,

I’m having some trouble with a backward hook on the MaxPool1d module. From my understanding, grad_input should have the same shape as the input to the module. This seems to be the case for MaxPool2d:

>>> import torch
>>> from torch import nn
>>> test_2d_input = torch.ones(40, 10, 10, 13, requires_grad=True)
>>> test_2d_output = torch.ones(40, 10, 5, 6, requires_grad=True)
>>> maxpool2d = nn.MaxPool2d(kernel_size=2)
>>> def print_grads(module, grad_input, grad_output):
...     print(module)
...     print([g.shape for g in grad_input])
...     print([g.shape for g in grad_output])
...
>>> maxpool2d.register_backward_hook(print_grads)
<torch.utils.hooks.RemovableHandle object at 0x109059860>
>>> o_2d = maxpool2d(test_2d_input)
>>> o_2d.backward(test_2d_output)
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
[torch.Size([40, 10, 10, 13])]
[torch.Size([40, 10, 5, 6])]

However, for MaxPool1d, the shape of the input ([40, 1, 13] in this example) is different from the shape of grad_input ([40, 1, 1, 6]):

>>> test_1d_input = torch.ones(40, 1, 13, requires_grad=True)
>>> test_1d_output = torch.ones(40, 1, 6, requires_grad=True)
>>> maxpool1d = nn.MaxPool1d(kernel_size=2)
>>> def print_grads(module, grad_input, grad_output):
...     print(module)
...     print([g.shape for g in grad_input])
...     print([g.shape for g in grad_output])
...
>>> maxpool1d.register_backward_hook(print_grads)
<torch.utils.hooks.RemovableHandle at 0x11c505ef0>
>>> o_1d = maxpool1d(test_1d_input)
>>> o_1d.backward(test_1d_output)
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
[torch.Size([40, 1, 1, 6])]
[torch.Size([40, 1, 6])]

How is this grad_input then manipulated to match the actual size of the input tensor (and why does this happen)?

Thank you!

Gabi


(Gabriel Tseng) #2

I’ve worked out that for MaxPool1d, grad_input corresponds only to the input elements that the module passes through to its output (i.e. it contains just the nonzero gradients of the input).

My follow-up question: is there a way to use the backward hook to manipulate the gradients of the other elements (i.e. the input elements whose gradient should be 0, which do not appear in grad_input at all)?


#3

Hi @gabrieltseng ,

It is an interesting idea!
Have you found a proper way to manipulate the gradients that are 0 after MaxPool1d?


(Gabriel Tseng) #4

Hi! Yes: I registered a backward hook on the input tensor of the MaxPool1d module (rather than on the module itself) and modified the gradient there:

https://pytorch.org/docs/stable/autograd.html#torch.Tensor.register_hook
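
A minimal sketch of that approach, assuming the same input shape as in the first post. The hook name fill_zero_grads and the fill value 0.5 are made up for illustration; only Tensor.register_hook itself comes from the linked docs. The hook receives the gradient with the full shape of the input tensor, so the elements that max pooling did not select (gradient 0) can be changed as well:

```python
import torch
from torch import nn

# Same shapes as the MaxPool1d example in the first post.
test_1d_input = torch.ones(40, 1, 13, requires_grad=True)
maxpool1d = nn.MaxPool1d(kernel_size=2)

seen_shapes = []  # records the gradient shapes the hook receives


def fill_zero_grads(grad):
    # Hypothetical manipulation: replace every zero gradient with 0.5
    # and leave the nonzero gradients untouched.
    seen_shapes.append(tuple(grad.shape))
    return torch.where(grad == 0, torch.full_like(grad, 0.5), grad)


# Hook on the input tensor, not on the module.
handle = test_1d_input.register_hook(fill_zero_grads)

o_1d = maxpool1d(test_1d_input)
o_1d.backward(torch.ones_like(o_1d))
handle.remove()

print(seen_shapes)               # gradient shape seen inside the hook
print(test_1d_input.grad.shape)  # shape of the stored, modified gradient
```

Because the hook returns a tensor, the returned value replaces the gradient that ends up in test_1d_input.grad. The hook has to return a tensor of the same shape as the one it receives.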