Is there a network layer which requires a 3D input and gives a 2D output?

Hi all,
I am setting up a workflow for training with DenseNet-121. My data is too small in the depth direction, so I have to adjust the network. My input is (288 × 288 × 16), so I want to change the 3D average pooling layer (2×2×2) in my 3rd transition layer to a 2D average pooling layer (2×2).
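As a rough shape check (assuming the usual DenseNet-121 downsampling: a stride-2 stem conv, a stride-2 stem pool and 2×2×2 pooling in the first two transition layers), the depth dimension already collapses to 1 before transition3:

# hypothetical shape trace; the exact stem of my 3D DenseNet may differ
d, h = 16, 288
for stage in ["stem conv", "stem pool", "transition1 pool", "transition2 pool"]:
    d, h = d // 2, h // 2
    print(stage, (h, h, d))
# stem conv (144, 144, 8)
# stem pool (72, 72, 4)
# transition1 pool (36, 36, 2)
# transition2 pool (18, 18, 1)  <- no depth left for a 2x2x2 pool in transition3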

This is the code I am aiming for in the 3rd transition layer:
(transition3): _Transition(
  (norm): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv): Conv3d(1024, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
  (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)
)

The problem I am facing is that the input of the AvgPool2d is (18 × 18 × 1 × 512), i.e. the spatial part is still three-dimensional, while AvgPool2d expects a 2D spatial input. So I want to ‘squeeze’ (18 × 18 × 1 × 512) to (18 × 18 × 512) before using it as input for the pooling layer. However, I do not know how to integrate this squeeze operation into my transition layer. Does anyone have suggestions on how to fix this problem?

Thanks in advance!

You could write a custom nn.Module and apply the squeeze operation there.
Something like this should work:

import torch
import torch.nn as nn


class Squeeze(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        # remove the size-1 dimension at position self.dim
        return x.squeeze(dim=self.dim)


squeeze = Squeeze(dim=2)
x = torch.randn(18, 18, 1, 512)
out = squeeze(x)
print(out.shape)
> torch.Size([18, 18, 512])
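Here is also a minimal sketch of how the module could be placed between the conv and the 2D pooling of the transition block. It assumes PyTorch's usual (N, C, D, H, W) layout, so the size-1 depth dimension is dim 2; alternatively you could just replace the pool submodule of your existing transition3 with nn.Sequential(Squeeze(dim=2), nn.AvgPool2d(2)):

transition3 = nn.Sequential(
    nn.BatchNorm3d(1024),
    nn.ReLU(inplace=True),
    nn.Conv3d(1024, 512, kernel_size=1, stride=1, bias=False),
    Squeeze(dim=2),  # (N, 512, 1, 18, 18) -> (N, 512, 18, 18)
    nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
)

x = torch.randn(2, 1024, 1, 18, 18)
out = transition3(x)
print(out.shape)
> torch.Size([2, 512, 9, 9])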

How do we figure out whether backpropagation can work with such a layer if we insert it into our network? In this particular case, the documentation for torch.squeeze() doesn’t say whether PyTorch knows how to compute gradients for this operation. Is there a master list of PyTorch functions for which PyTorch knows how to compute gradients?

More generally, how can we ascertain if a custom nn.Module plays well with autograd?

You could run a quick test and check whether the gradients are calculated:

squeeze = Squeeze(dim=2)
x = torch.randn(18, 18, 1, 512, requires_grad=True)
out = squeeze(x)
# backpropagate a gradient of ones; if x.grad is populated afterwards,
# Autograd was able to backpropagate through the Squeeze module
out.backward(torch.ones_like(out))
print(x.grad)
tensor([[[[1., 1., 1.,  ..., 1., 1., 1.]],

         [[1., 1., 1.,  ..., 1., 1., 1.]],

         [[1., 1., 1.,  ..., 1., 1., 1.]],

         ...,
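For a stricter check, you could also run torch.autograd.gradcheck on the module (a small sketch; gradcheck compares the analytical gradients against numerical estimates and expects double precision inputs with requires_grad=True):

from torch.autograd import gradcheck

squeeze = Squeeze(dim=2)
x = torch.randn(4, 4, 1, 8, dtype=torch.double, requires_grad=True)
print(gradcheck(squeeze, (x,)))
> True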

Thank you.

Just to be clear, this output means that PyTorch does not know how to compute gradients for the Squeeze module, right? Because x.grad remained at its initial all-ones state? Or am I reading this wrong?

Also, what would be a good resource for getting a fairly complete picture of PyTorch autograd? The current picture I have in my head is a mish-mash of graph, leaves, detach, … which is hit-or-miss when it comes to slightly non-trivial scenarios.

No, Autograd knows how to propagate the gradients through squeeze. The ones are expected; which other values would you expect when squeezing/unsqueezing a dimension of size 1?

This tutorial might be a good starter.

Thank you.

What would have been the outcome if Autograd did not know how to do this? An exception/error message saying something about gradients not being computable?

I cannot figure this out: what is the derivative with respect to merely rearranging data? I am not sure there is a way to define this. But perhaps, since there is no change in the values, we can take this derivative to be zero? Is that what Autograd is doing here: adding zeros to the original ones, so the ones stay where they are?

(Please don’t bother answering this question if it is too silly.)

I have skimmed this once or twice. I guess it is time to bite the bullet and go through it more carefully!

Yes, Autograd would raise an error if the backward method is not available.
This would happen e.g. if you are using a non-differentiable operation:

x = torch.randn(10, 10, requires_grad=True)
# argmax returns integer indices and is not differentiable, so its output
# is detached from the computation graph
out = torch.argmax(x, dim=1)
out.backward(torch.ones_like(out))
> RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

No, squeeze and unsqueeze do not change any values, and the number of elements of the tensor stays the same, so the gradient is just passed through unchanged to the input tensor.
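A small sketch to see the pass-through behaviour: if you scale the squeezed output, that scale shows up directly in x.grad, i.e. the ones in the example above are not some initial state of x.grad but the actual gradient:

x = torch.randn(2, 3, 1, 4, requires_grad=True)
out = x.squeeze(dim=2) * 3.
out.backward(torch.ones_like(out))
print(x.grad)  # all values are 3., the gradient just flowed through the squeeze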


Thanks for the reply. This worked!