Hello,

I have also been needing to do this exact thing myself (for various reasons) for a while and struggling to figure it out. Honestly, it’s much harder to find answers for this than it should be. I read a lot of material on convolution, PyTorch unfold, conv_transpose2d, and CNN gradients before it finally clicked.

Fortunately, the answer is actually very simple; it’s just that everyone seems to approach this operation from a different angle (upsampling, CNN gradients, deconvolution, etc.) without quite explaining the whole picture, so the answer isn’t very “googleable”. Note that this is the inefficient way of doing things: to use unfold, we have to pad every side of the input. More efficient implementations exist, but this is the best vectorized one I could come up with.

```
import torch
import torch.nn.functional as F

img = torch.randn(1, 50, 28, 28)
kernel = torch.randn(30, 50, 3, 3)

# reference result; conv_transpose2d expects weight shape (in_channels, out_channels, kH, kW)
true_convt2d = F.conv_transpose2d(img, kernel.transpose(0, 1))

# conv_transpose2d is a "full" convolution: pad each side by kernel_size - 1,
# then correlate with the 180-degree-rotated kernel
pad0 = 3 - 1  # to explicitly show the calculation of conv_transpose2d padding
pad1 = 3 - 1
inp_unf = F.unfold(img, (3, 3), padding=(pad0, pad1))
w = torch.rot90(kernel, 2, [2, 3])

# from here on it is done the same way as a forward convolution via unfold + matmul
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = out_unf.view(true_convt2d.shape)

print((true_convt2d - out).abs().max())
print(true_convt2d.abs().max())
```

Running this code prints a small maximum absolute error, around 1e-5, while the magnitude of the output is around 90, so the relative error is tiny. I believe the difference comes from floating-point effects of optimizations in the backend.
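One way to make “close enough” precise is to look at the relative rather than absolute error. A self-contained sketch repeating the construction above and normalizing the error by the output magnitude (the seed and the 1e-4 tolerance here are my own choices, not from the original):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.randn(1, 50, 28, 28)
kernel = torch.randn(30, 50, 3, 3)

# reference result from the built-in op
true_convt2d = F.conv_transpose2d(img, kernel.transpose(0, 1))

# unfold-based equivalent: full padding plus a 180-degree-rotated kernel
pad = 3 - 1
inp_unf = F.unfold(img, (3, 3), padding=(pad, pad))
w = torch.rot90(kernel, 2, [2, 3])
out = (inp_unf.transpose(1, 2)
       .matmul(w.view(w.size(0), -1).t())
       .transpose(1, 2)
       .view(true_convt2d.shape))

# scale-independent measure of agreement between the two results
rel_err = (true_convt2d - out).abs().max() / true_convt2d.abs().max()
print(rel_err.item())  # tiny, on the order of floating-point round-off
```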

I hope this answer becomes “googleable” for others looking for this information. I am a new user, so I can apparently only put 2 links in a post; if you want more, please DM me. There is a formula for calculating padding on Data Science Stack Exchange (though it’s not too hard to figure out) if you search “how-to-calculate-the-output-shape-of-conv2d-transpose”.
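For reference, the output-size formula from the `torch.nn.ConvTranspose2d` documentation can be written as a one-liner (the helper name is my own, just for illustration):

```python
def convt2d_out_size(size_in, kernel_size, stride=1, padding=0,
                     dilation=1, output_padding=0):
    # output-size formula from the torch.nn.ConvTranspose2d docs
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

# for the example above: 28x28 input, 3x3 kernel, default arguments
print(convt2d_out_size(28, 3))  # 30, matching the 30x30 output of the code above
```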

Disclaimer: I am fairly sure this is correct, but it could still be wrong. Also, it doesn’t take into account padding, strides, dilation, or groups.

Sources:

Thorough descriptions of convolution

Visualization that explains rotation