What is the difference between ConvTranspose2d and Upsample in Pytorch?
To implement UNet in PyTorch based on the model in this paper, for the first upsampling layer some people used:
self.upSample1 = nn.Upsample(scale_factor=2, mode="bilinear")  # size and scale_factor cannot both be given
self.up1 = nn.Sequential(
    ConvRelu2d(1024, 512, kernel_size=(3, 3), stride=1, padding=0),
    ConvRelu2d(512, 512, kernel_size=(3, 3), stride=1, padding=0),
)
while some people used:
self.up = nn.ConvTranspose2d(in_size, out_size, 2, stride=2)
self.conv = nn.Conv2d(in_size, out_size, kernel_size)
self.conv2 = nn.Conv2d(out_size, out_size, kernel_size)
self.activation = F.relu
I am confused: do Upsample and ConvTranspose2d both do the same thing?
No, they don’t.
ConvTranspose2d is a convolution and has trainable kernels, while
Upsample is simple interpolation (bilinear, nearest, etc.) with no learnable parameters.
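To illustrate the difference, here is a minimal comparison sketch (the input shape and channel counts are just example values, not from the thread):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)  # (N, C, H, W)

# Upsample: pure interpolation, no learnable parameters
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
y1 = up(x)

# ConvTranspose2d: learned upsampling; kernel 2, stride 2 doubles H and W
deconv = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)
y2 = deconv(x)

print(y1.shape, y2.shape)  # both torch.Size([1, 16, 64, 64])
print(sum(p.numel() for p in up.parameters()))      # 0
print(sum(p.numel() for p in deconv.parameters()))  # 16*16*2*2 weights + 16 biases = 1040
```

Both produce the same output shape here, but only ConvTranspose2d contributes weights to training.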
@justusschock, you mean ConvTranspose2d is a deconvolution?
Yes, if you want to call it that.
It is not really an inverse operation, though, but the same kind of operation with a different kernel.
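A quick sketch of that point, under assumed toy shapes: even if you tie the transposed convolution to the same kernel, applying it after the convolution restores the shape but not the values.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)
conv = nn.Conv2d(1, 1, kernel_size=2, stride=2, bias=False)
deconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)

# Tie the kernels so the transpose uses the "same" weights
with torch.no_grad():
    deconv.weight.copy_(conv.weight)

y = deconv(conv(x))
print(y.shape)               # torch.Size([1, 1, 8, 8]): shape is restored
print(torch.allclose(y, x))  # False: the input values are not recovered
```

So "deconvolution" is a misnomer in the strict mathematical sense; "transposed convolution" is the more accurate name.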
ConvTranspose2d has learnable parameters, while Upsample has none. Using Upsample gives faster inference and training because it does not require updating weights or computing gradients for the upsampling step.
So according to you, in the UNet paper, what did the authors use for upsampling: nn.Upsample or ConvTranspose2d?
And can you also explain which of the two to use, and under what conditions?
In the UNet paper, they’ve stated that “up-convolutions” were used. I would assume that this means they would have implemented/used nn.ConvTranspose2d.
As for which one to use, it really depends on the network you’re designing. If you are sure of the kind of upsampling that needs to be done (bilinear, etc.), then you can use nn.Upsample. However, if you think it would be important to “learn” how to upsample instead of using a hardcoded method, then the trainable parameters in ConvTranspose2d would be useful.
Actually, in the U-net paper, they specify what they mean by up-convolutions:
Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”)
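Following that description, an up-convolution block could be sketched as an Upsample followed by a 2x2 convolution. The class name, channel counts, and interpolation mode below are my assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class UpConv(nn.Module):
    """Upsampling of the feature map followed by a 2x2 convolution,
    per the U-Net wording (a sketch, not the reference implementation)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # 2x2 convolution; with no padding it shrinks H and W by 1
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=2)

    def forward(self, x):
        return self.conv(self.up(x))

block = UpConv(1024, 512)
x = torch.randn(1, 1024, 28, 28)
y = block(x)
print(y.shape)  # torch.Size([1, 512, 55, 55]): 28 doubles to 56, then the 2x2 conv loses one pixel
```

This also halves the channel count, matching the expansive path of the U-Net diagram.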
Some of the recent architectures, like BigGAN, use this scheme (upsampling followed by convolution) to avoid the checkerboard artifacts.