torch.nn.ConvTranspose2d vs torch.nn.Upsample

What is the difference between ConvTranspose2d and Upsample in Pytorch?
To implement UNet in Pytorch based on the model in this paper for the first upsampling layer some people used

self.upSample1 = nn.Upsample(scale_factor=2, mode="bilinear")

        self.up1 = nn.Sequential(
            ConvRelu2d(1024, 512, kernel_size=(3, 3), stride=1, padding=0),
            ConvRelu2d(512, 512, kernel_size=(3, 3), stride=1, padding=0),
        )

while some people used

        self.up = nn.ConvTranspose2d(in_size, out_size, 2, stride=2)
        self.conv = nn.Conv2d(in_size, out_size, kernel_size)
        self.conv2 = nn.Conv2d(out_size, out_size, kernel_size)
        self.activation = F.relu

I am confused: do Upsample and ConvTranspose2d do the same thing?


No, they don’t.
ConvTranspose2d is a convolution and has trainable kernels, while Upsample is a simple interpolation (bilinear, nearest, etc.).
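The difference is easy to see on a tiny tensor (the shapes and module arguments below are just for illustration):

```python
import torch
import torch.nn as nn

x = torch.arange(4.0).reshape(1, 1, 2, 2)  # a tiny 2x2 "image"

# Interpolation: output values are a fixed function of the input.
up = nn.Upsample(scale_factor=2, mode="nearest")
print(up(x))
# tensor([[[[0., 0., 1., 1.],
#           [0., 0., 1., 1.],
#           [2., 2., 3., 3.],
#           [2., 2., 3., 3.]]]])

# Transposed convolution: same output size, but the values depend on a learned kernel.
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
print(tconv(x).shape)  # torch.Size([1, 1, 4, 4])
```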


@justusschock you mean ConvTranspose2d is a deconvolution?

Yes, if you want to call it that.
It is not really an inverse operation, though; it is the same kind of operation with a different kernel.
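A quick sketch of that point: even if you give a ConvTranspose2d the very same kernel as a Conv2d, chaining the two restores the spatial shape but not the values, so it is not an inverse (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)

conv = nn.Conv2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv.weight.data = conv.weight.data  # share the same kernel

y = tconv(conv(x))
print(y.shape)                # torch.Size([1, 1, 8, 8]) -- shape comes back
print(torch.allclose(y, x))   # False -- the values do not
```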


ConvTranspose2d has learnable parameters, while Upsample has none. Upsample gives faster inference and training because it has no weights to update and no weight gradients to compute.


So, according to you, what did the authors of the U-Net paper use for upsampling, nn.Upsample or ConvTranspose2d?
And can you also explain under what conditions to use each of the two?

In the U-Net paper, they state that “up-convolutions” were used. I would assume this means they implemented/used ConvTranspose2d.

As for which one to use, it really depends on the network you’re designing. If you are sure of the kind of upsampling that needs to be done (bilinear, etc.), then you can use nn.Upsample. However, if you think it would be important to “learn” how to upsample instead of using a hardcoded method, then the trainable parameters in ConvTranspose2d would be useful.
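As a sketch, the two choices could look like this in a U-Net-style up-step (the module names and channel sizes below are hypothetical, not taken from the paper):

```python
import torch
import torch.nn as nn

class UpTransposed(nn.Module):
    """Learned upsampling via a 2x2 transposed convolution (stride 2 doubles H and W)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)

class UpResize(nn.Module):
    """Fixed bilinear resize, followed by a 3x3 convolution to mix channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))

x = torch.randn(1, 1024, 28, 28)
print(UpTransposed(1024, 512)(x).shape)  # torch.Size([1, 512, 56, 56])
print(UpResize(1024, 512)(x).shape)      # torch.Size([1, 512, 56, 56])
```

Both produce the same output shape; the first learns the upsampling kernel itself, while the second only learns the convolution applied after a fixed interpolation.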


Actually, in the U-net paper, they specify what they mean by up-convolutions:

Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”)

Some recent architectures, like BigGAN, use this scheme to avoid checkerboard artifacts.
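The resize-then-convolve scheme referred to above can be sketched as follows (the channel count and kernel size are just for illustration):

```python
import torch
import torch.nn as nn

# Resize-convolution: interpolate first, then apply an ordinary convolution.
# Every output pixel then sees the same amount of kernel overlap, which avoids
# the checkerboard pattern that stride-2 transposed convolutions can produce.
resize_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

x = torch.randn(1, 64, 16, 16)
print(resize_conv(x).shape)  # torch.Size([1, 64, 32, 32])
```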