Understanding Conv2d and ConvTranspose2d

Hi, I am trying to use ConvTranspose2d to reverse the operation performed by Conv2d, by initializing the ConvTranspose2d with the weight from the Conv2d.

I am reading "A guide to convolution arithmetic for deep learning" and came up with the following code to test my hypothesis about Conv2d and ConvTranspose2d. I thought that the code below should let me downsample an image and then upsample it to get the original image back. However, that does not seem to be the case. Can anyone share some insights about understanding Conv2d and ConvTranspose2d?

import torch
from torch.nn import Conv2d, ConvTranspose2d

img = torch.rand(1, 1, 3, 3, requires_grad=False)
downsample = Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0, bias=False)
upsample = ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0, bias=False)

# should I use .t() ? 
upsample.weight.data = downsample.weight.data

out = downsample(img)
inv_img = upsample(out)

print(torch.allclose(img, inv_img))  # prints False: the original image is not recovered

Answering this old thread for the sake of completeness:

Your kernel needs extra properties for this to hold. Let's split the answer into two parts: one about what conv transpose actually computes, and one about computing the inverse of a linear map.

A convolution can be written as a specific fully connected layer with sparse, repeated entries: flattening the input, conv(x) = W @ x for some matrix W. Conv transpose efficiently computes multiplication by the transpose, W.t(). This is not in general equivalent to transposing (or flipping) the kernel itself, because of padding, stride, and dilation.
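Here is a minimal sketch to make this concrete (the sizes are my own choice, a 1x4x4 input and a 3x3 kernel, so W fits in a small dense matrix). It materializes W column by column by pushing basis images through the conv, then checks that ConvTranspose2d with the same (copied, not .t()-ed) kernel computes W.t() @ y:

import torch
from torch.nn import Conv2d, ConvTranspose2d

torch.manual_seed(0)

# small sizes so the dense matrix W stays readable: 4x4 input, 3x3 kernel -> 2x2 output
x = torch.rand(1, 1, 4, 4)
conv = Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0, bias=False)

# column j of W is the conv applied to the j-th canonical basis "image"
cols = [conv(torch.eye(16)[j].view(1, 1, 4, 4)).reshape(-1) for j in range(16)]
W = torch.stack(cols, dim=1)  # shape (4, 16): 4 output pixels, 16 input pixels

y = conv(x)
print(torch.allclose(y.reshape(-1), W @ x.reshape(-1), atol=1e-6))  # True: conv is W @ x

deconv = ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0, bias=False)
with torch.no_grad():
    deconv.weight.copy_(conv.weight)  # same kernel, no transpose of the weight tensor needed here
print(torch.allclose(deconv(y).reshape(-1), W.t() @ y.reshape(-1), atol=1e-6))  # True: conv transpose is W.t() @ y

This also answers the ".t()?" comment in the original snippet: with in_channels = out_channels = 1 the weight tensors of Conv2d and ConvTranspose2d have the same layout, so copying the weight directly is right; for multi-channel convs the two channel dimensions are swapped, so you would need weight.transpose(0, 1) instead.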

Then having convtranspose(conv(x)) = x means having W.t() @ W @ x = x, i.e. W.t() @ W = identity, which is not the case in general. That would require W.t() = W^-1, but W^-1 only exists for square matrices, i.e. convolutions with as many outputs as inputs (otherwise you can at best fall back on the Moore-Penrose pseudo-inverse; in your snippet the conv maps a 3x3 image to a single value, so W is 1x9 and cannot be inverted at all). Also note that inverting the kernel itself is not equivalent to computing W^-1, and there is no guarantee that W^-1 still has the structure of a convolution (with the sparsity and repeated entries).
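To illustrate the square case (again a sketch with sizes I picked): with padding=1 a 3x3 conv maps a 4x4 image to a 4x4 image, so W is 16x16 and, for a random kernel, almost always invertible. Solving with W recovers the input exactly, but the inverse is a dense matrix that is neither W.t() nor convolution-structured:

import torch
from torch.nn import Conv2d

torch.manual_seed(0)

# padding=1 keeps the spatial size, so W is square (16x16) and generically invertible
x = torch.rand(1, 1, 4, 4, dtype=torch.float64)
conv = Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False).double()

# materialize W column by column, as before
cols = [conv(torch.eye(16, dtype=torch.float64)[j].view(1, 1, 4, 4)).reshape(-1) for j in range(16)]
W = torch.stack(cols, dim=1)  # (16, 16)

# the input can be recovered by solving W @ x = conv(x), i.e. applying W^-1
x_rec = torch.linalg.solve(W, conv(x).reshape(-1))
print(torch.allclose(x_rec, x.reshape(-1)))  # True (up to numerical precision)

# but W^-1 is dense and is not W.t(): the transpose only inverts an orthogonal W
print(torch.allclose(torch.linalg.inv(W), W.t()))  # False for a random kernel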

That said, this does not mean that there is no kernel such that convtranspose(conv(x)) = x, just that finding such a kernel is non-trivial. One quick and cheap thing you could try is to add a regularization loss that pushes the kernels towards this property, as in the sketch below.
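A minimal sketch of that idea (my own toy setup, not a recipe): both kernels are trained with an MSE reconstruction loss on random probe images; in a real model you would add this term to your main training loss instead of training it in isolation.

import torch
from torch.nn import Conv2d, ConvTranspose2d

torch.manual_seed(0)

# padding=1 keeps the spatial size, so exact inversion is at least possible in principle
down = Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
up = ConvTranspose2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
opt = torch.optim.Adam(list(down.parameters()) + list(up.parameters()), lr=1e-2)

for step in range(2000):
    x = torch.rand(16, 1, 8, 8)              # random probe images
    loss = ((up(down(x)) - x) ** 2).mean()   # penalize convtranspose(conv(x)) != x
    opt.zero_grad()
    loss.backward()
    opt.step()

x = torch.rand(1, 1, 8, 8)
print(((up(down(x)) - x) ** 2).mean().item())  # should end up small, though not exactly zero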