Hi - I was experimenting with the ConvTranspose2d operation. I took a [2 x 2] random tensor and applied a transposed convolution to it with and without padding, with both the kernel size and the stride set to 2. When I checked the size of the tensor after the operation, I found that the output without padding is bigger than the output with padding.
Code snippet with padding:
import torch
import torch.nn as nn
d = torch.randn(1, 1, 2, 2)
deconv2 = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, padding=1)
d1 = deconv2(d)
d1.shape  # returns torch.Size([1, 1, 2, 2])
Code snippet without padding:
deconv2 = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, padding=0)
d1 = deconv2(d)
d1.shape  # returns torch.Size([1, 1, 4, 4])
I thought that the output tensor with padding would be bigger than the one without padding, but that’s not the case. Can anyone please explain this?
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.
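To make the shape arithmetic concrete, here is a minimal sketch of the output-size formula given in the nn.ConvTranspose2d documentation, applied to the two cases from the question. The helper name conv_transpose_out_size is made up for illustration, and it is plain Python (no torch), so it only checks sizes, not values.

```python
def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output size along one spatial dimension, per the formula in the
    nn.ConvTranspose2d docs. The function name is hypothetical."""
    return ((size_in - 1) * stride
            - 2 * padding
            + dilation * (kernel_size - 1)
            + output_padding
            + 1)


# 2x2 input, kernel_size=2, stride=2, as in the question
print(conv_transpose_out_size(2, kernel_size=2, stride=2, padding=1))  # 2
print(conv_transpose_out_size(2, kernel_size=2, stride=2, padding=0))  # 4
```

The "- 2 * padding" term is why a larger padding gives a smaller output: in a transposed convolution, padding removes rows and columns from the result instead of adding them to the input.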
Hi @ptrblck - Thanks for sharing the “conv arithmetic” tutorial. I checked this tutorial before as well. I am still confused about padding. Consider the first case, where I used padding=1. Note I didn’t use dilation. So, according to the PyTorch documentation (“dilation * (kernel_size - 1) - padding”), this equals (0*(2-1)-1) = -1 amount of padding to be added. This does not make any sense to me. Am I making a mistake?
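One point that may help with this calculation: the dilation argument of nn.ConvTranspose2d defaults to 1, not 0, so "no dilation" means dilation=1. The sketch below evaluates the quoted expression under that default and then derives the output size by treating the layer as a stride-1 convolution over the zero-stretched input. Both helper names are hypothetical, and this is plain size arithmetic, not PyTorch code.

```python
def effective_padding(kernel_size, padding=0, dilation=1):
    """Zero padding per side in the equivalent direct convolution,
    i.e. dilation * (kernel_size - 1) - padding. Hypothetical helper."""
    return dilation * (kernel_size - 1) - padding


def out_size_via_equivalent_conv(size_in, kernel_size, stride=1,
                                 padding=0, dilation=1):
    """Output size obtained by stretching the input with (stride - 1)
    zeros between elements, padding it, then sliding a stride-1 kernel."""
    stretched = (size_in - 1) * stride + 1
    padded = stretched + 2 * effective_padding(kernel_size, padding, dilation)
    return padded - dilation * (kernel_size - 1)


# kernel_size=2, stride=2 on a size-2 input, with dilation left at its default
print(effective_padding(2, padding=1))                        # 0
print(effective_padding(2, padding=0))                        # 1
print(out_size_via_equivalent_conv(2, 2, stride=2, padding=1))  # 2
print(out_size_via_equivalent_conv(2, 2, stride=2, padding=0))  # 4
```

So with padding=1 the effective padding is 0 rather than -1, and with padding=0 it is 1, which matches the observed 2x2 and 4x4 outputs.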
@ptrblck - I also observed that when the stride is > 1 (say 2), the transposed conv can’t reconstruct the original image size, but with unit stride it reconstructs the exact image size. See below:
Code snippet for perfect reconstruction:
import torch
import torch.nn as nn
D = torch.randn(1, 1, 28, 28)
s, k, p = 1, 5, 0
conv1 = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p)
deconv1 = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s, padding=p)
D1 = conv1(D)
D1_t = deconv1(D1)
D1_t.shape  # torch.Size([1, 1, 28, 28]) ==> size of the reconstructed image
Code snippet where the reconstruction is not perfect, using stride=2:
s, k, p = 2, 5, 0
conv1 = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p)
deconv1 = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s, padding=p)
D1 = conv1(D)
D1_t = deconv1(D1)
D.shape  # torch.Size([1, 1, 28, 28])
D1_t.shape  # torch.Size([1, 1, 27, 27]) ==> size of the reconstructed image
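The 27 vs. 28 mismatch looks like the stride > 1 ambiguity described in the documentation. As a rough sketch using the size formulas from the nn.Conv2d and nn.ConvTranspose2d docs (the helper names are made up, and this is plain Python, not torch), inputs of size 27 and 28 both map to 12 under the strided convolution, so the transposed layer cannot tell which one to restore unless output_padding is given:

```python
def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Conv2d output size along one dimension (hypothetical helper)."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """ConvTranspose2d output size along one dimension (hypothetical helper)."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


k, s, p = 5, 2, 0
print(conv_out_size(27, k, s, p))  # 12
print(conv_out_size(28, k, s, p))  # 12, same as for 27

# Default output_padding=0 restores the smaller candidate
print(conv_transpose_out_size(12, k, s, p))                    # 27
# output_padding=1 selects the larger one
print(conv_transpose_out_size(12, k, s, p, output_padding=1))  # 28
```

Under these formulas, constructing the layer as nn.ConvTranspose2d(1, 1, kernel_size=5, stride=2, padding=0, output_padding=1) should give a 28 x 28 output for this example. With stride=1 the floor division never discards anything, which is why that case round-trips exactly.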
Hi
I want to understand how transposed convolution is generally implemented for Generative Adversarial Networks in PyTorch, for example in the DCGAN Tutorial (PyTorch Tutorials 1.11.0+cu102 documentation), which is where this code is taken from. Is a transposed convolution a combination of an upsampling layer and a convolution layer, or is some other approach used? I really appreciate your help.
The Generator uses nn.ConvTranspose2d layers, i.e. transposed convolutions directly. You can think of them as “reversed” conv layers: the forward pass of a transposed conv equals the backward pass of a vanilla conv layer and vice versa. For more information about these layers, check this repo.
Thank you very much for your quick response. So there won’t be any upsampling layer before the convolution is applied in this implementation? Based on Fig. 4 of the paper https://arxiv.org/pdf/1806.01107.pdf, I thought transposed convolution = upsampling layer + convolution layer.
I’m not familiar with this paper and don’t know if the authors call an upsampling + conv block a “transposed convolution”. You can see in the model architecture of the tutorial that nn.ConvTranspose2d is used directly. Please check the posted link to see how this layer works and how it increases the spatial size of the input activation.
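As a toy illustration of how the spatial size grows without a separate upsampling layer, the sketch below computes a 1D transposed convolution in two ways: by scattering each input element times the kernel into the output, and by inserting zeros between input elements and running an ordinary stride-1 convolution with the flipped kernel. This is plain Python with made-up function names, it assumes no padding or dilation, and it shows only the arithmetic, not PyTorch's backend code. The zero insertion here is also not the same thing as an interpolation-based nn.Upsample layer.

```python
def conv_transpose1d_scatter(x, w, stride):
    """Each input element adds a scaled copy of the kernel to the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out


def conv_transpose1d_zero_insert(x, w, stride):
    """Same result via zero insertion followed by a stride-1 convolution."""
    k = len(w)
    stretched = []
    for i, xi in enumerate(x):
        stretched.append(xi)
        if i < len(x) - 1:
            stretched.extend([0.0] * (stride - 1))  # zeros between elements
    padded = [0.0] * (k - 1) + stretched + [0.0] * (k - 1)
    flipped = w[::-1]  # flip so the sliding window matches the scatter form
    return [sum(padded[t + j] * flipped[j] for j in range(k))
            for t in range(len(padded) - k + 1)]


x = [1.0, 2.0]
w = [1.0, 10.0]
print(conv_transpose1d_scatter(x, w, stride=2))      # [1.0, 10.0, 2.0, 20.0]
print(conv_transpose1d_zero_insert(x, w, stride=2))  # [1.0, 10.0, 2.0, 20.0]
```

The scatter form never materializes the zero-stretched input, so it avoids multiplying by the inserted zeros, while the zero-insert form is the one usually drawn in convolution arithmetic diagrams.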
Hi,
Thanks. I’m working on optimizing the transposed convolution layer by avoiding the increase in input spatial size before the convolution is applied, while still producing the same output. May I please know how the backend code works?