ConvTranspose2d much slower than Conv2d

I am trying to figure out why my convolutional decoder is much slower than the encoder.
Here are the architectures:

(model): Sequential(
  (0): Conv2d(3, 32, kernel_size=(4, 4), stride=(2, 2))
  (1): ReLU()
  (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
  (3): ReLU()
  (4): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2))
  (5): ReLU()
  (6): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2))
  (7): ReLU()
)

(model): Sequential(
  (0): Linear(in_features=230, out_features=1024, bias=True)
  (1): Reshape()
  (2): ConvTranspose2d(1024, 128, kernel_size=(5, 5), stride=(2, 2))
  (3): ReLU()
  (4): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(2, 2))
  (5): ReLU()
  (6): ConvTranspose2d(64, 32, kernel_size=(6, 6), stride=(2, 2))
  (7): ReLU()
  (8): ConvTranspose2d(32, 3, kernel_size=(6, 6), stride=(2, 2))
)

The batch shape is (50, 50, 3, 64, 64).
The encoder takes about 16ms and the decoder 130ms on a GTX 1080TI.

I am using torch.cuda.synchronize() for the timings.
What is the reason for these very different times?
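
For reference, a minimal sketch of this kind of measurement. The helper name `time_forward`, the warm-up and run counts, and the batch size of 8 are assumptions for illustration, not the exact setup that produced the numbers above. The decoder's Linear and Reshape layers are left out, so the transposed convolutions are fed a 1024x1x1 input directly, which is also an assumption about what Reshape produces.

```python
import time

import torch
import torch.nn as nn


def time_forward(module, x, n_warmup=3, n_runs=10):
    """Average forward time in milliseconds (hypothetical helper)."""
    with torch.no_grad():
        for _ in range(n_warmup):      # warm-up runs are not timed
            module(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # wait for queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(n_runs):
            module(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # wait for the timed kernels to finish
    return (time.perf_counter() - start) * 1000 / n_runs


device = "cuda" if torch.cuda.is_available() else "cpu"

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
    nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
).to(device)

# Transposed-convolution part of the decoder only (no Linear / Reshape).
decoder = nn.Sequential(
    nn.ConvTranspose2d(1024, 128, 5, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 5, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 6, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 6, stride=2),
).to(device)

images = torch.randn(8, 3, 64, 64, device=device)    # small batch, assumed
latents = torch.randn(8, 1024, 1, 1, device=device)  # assumed Reshape output

enc_ms = time_forward(encoder, images)
dec_ms = time_forward(decoder, latents)
print(f"encoder: {enc_ms:.2f} ms, decoder: {dec_ms:.2f} ms")
```

Without the synchronize calls, the timer on a GPU would mostly measure how long it takes to queue the kernels, not how long they take to run.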

There are many reasons.

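  1. The decoder works with more channels: its first transposed convolution takes 1024 input channels, while the widest encoder layer has 256 output channels.
  2. The kernel sizes of the decoder are bigger (5x5 and 6x6 versus 4x4 in the encoder).
  3. Probably the transposed convolution itself is slower. I guess this because you need to allocate more memory (you are upsampling) and to work with sparse kernels.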
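
To get a rough feel for how much points 1 and 2 can contribute, here is a small sketch that counts multiply-accumulate operations (MACs) per sample for the conv layers in the printouts. It assumes no padding and no dilation (none is shown in the printouts), assumes the Reshape produces a 1024x1x1 tensor, and ignores the Linear layer, biases and ReLUs. The function names are made up for this example.

```python
def conv_out(size, kernel, stride):
    # Output size of an unpadded convolution.
    return (size - kernel) // stride + 1


def deconv_out(size, kernel, stride):
    # Output size of an unpadded transposed convolution.
    return (size - 1) * stride + kernel


# (in_channels, out_channels, kernel_size), taken from the printouts above.
ENCODER = [(3, 32, 4), (32, 64, 4), (64, 128, 4), (128, 256, 4)]
DECODER = [(1024, 128, 5), (128, 64, 5), (64, 32, 6), (32, 3, 6)]


def encoder_macs(size=64, stride=2):
    total = 0
    for c_in, c_out, k in ENCODER:
        size = conv_out(size, k, stride)
        # Each output pixel of each filter needs c_in * k * k MACs.
        total += size * size * c_out * c_in * k * k
    return total, size


def decoder_macs(size=1, stride=2):
    total = 0
    for c_in, c_out, k in DECODER:
        # Each input pixel is multiplied with every kernel weight.
        total += size * size * c_in * c_out * k * k
        size = deconv_out(size, k, stride)
    return total, size


enc_total, enc_size = encoder_macs()
dec_total, dec_size = decoder_macs()
print(f"encoder: {enc_total:,} MACs, final feature map {enc_size}x{enc_size}")
print(f"decoder: {dec_total:,} MACs, final image {dec_size}x{dec_size}")
print(f"ratio:   {dec_total / enc_total:.2f}")
```

Under these assumptions the decoder needs about 24.0M MACs per sample against about 14.7M for the encoder, a ratio of roughly 1.6. That is well short of the measured gap of about 8x (130ms versus 16ms), which suggests that the arithmetic in points 1 and 2 is only part of the story and that point 3 matters as well.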