ConvTranspose2d much slower than Conv2d

I am trying to figure out why my convolutional decoder is much slower than the encoder.
Here are the architectures:

ConvEncoder(
(model): Sequential(
(0): Conv2d(3, 32, kernel_size=(4, 4), stride=(2, 2))
(1): ReLU()
(2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
(3): ReLU()
(4): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2))
(5): ReLU()
(6): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2))
(7): ReLU()
)
)
ConvDecoder(
(model): Sequential(
(0): Linear(in_features=230, out_features=1024, bias=True)
(1): Reshape()
(2): ConvTranspose2d(1024, 128, kernel_size=(5, 5), stride=(2, 2))
(3): ReLU()
(4): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(2, 2))
(5): ReLU()
(6): ConvTranspose2d(64, 32, kernel_size=(6, 6), stride=(2, 2))
(7): ReLU()
(8): ConvTranspose2d(32, 3, kernel_size=(6, 6), stride=(2, 2))
)
)

The batch shape is (50, 50, 3, 64, 64).
The encoder takes about 16 ms and the decoder about 130 ms on a GTX 1080 Ti.

I am calling torch.cuda.synchronize() before taking the timings, so kernel launches should not skew the numbers.
What is the reason for these very different times?
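For reference, here is a minimal sketch of the timing pattern described above (the helper name `time_module` and the warm-up-free loop are my own; synchronizing before and after ensures the interval covers the actual GPU work, not just the asynchronous kernel launches):

```python
import time
import torch

def time_module(module, x, iters=10):
    # Hypothetical helper: average forward time over several iterations.
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending GPU work before starting
    start = time.perf_counter()
    for _ in range(iters):
        module(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the timed work to finish
    return (time.perf_counter() - start) / iters

# Toy example on a small input; on CPU-only machines this still runs.
conv = torch.nn.Conv2d(3, 32, kernel_size=4, stride=2)
x = torch.randn(8, 3, 64, 64)
print(f"avg forward time: {time_module(conv, x):.6f} s")
```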

There are several reasons:

  1. The decoder's layers have more channels: its first transposed convolution alone takes 1024 input channels, while the encoder starts from 3.
  2. The decoder's kernels are larger (5×5 and 6×6 versus 4×4 in the encoder).
  3. The transposed convolution itself is probably slower. Since it upsamples, each layer has to allocate and write a larger output tensor, and a transposed convolution is equivalent to a convolution over an input with zeros inserted between the elements, so the kernels are effectively working on sparse data.
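The upsampling cost is easy to see from the no-padding output-size formulas, `floor((n - k) / s) + 1` for Conv2d and `(n - 1) * s + k` for ConvTranspose2d. A quick sketch, with layer sizes read off the two architectures above:

```python
def conv_out(n, k, s):
    # Output spatial size of Conv2d with no padding.
    return (n - k) // s + 1

def convT_out(n, k, s):
    # Output spatial size of ConvTranspose2d with no padding/output_padding.
    return (n - 1) * s + k

# Encoder: the 64x64 input shrinks at every layer.
n, enc = 64, [64]
for k in (4, 4, 4, 4):
    n = conv_out(n, k, 2)
    enc.append(n)

# Decoder: starts from a 1x1 map (after Linear + Reshape) and grows back.
m, dec = 1, [1]
for k in (5, 5, 6, 6):
    m = convT_out(m, k, 2)
    dec.append(m)

print("encoder sizes:", enc)  # [64, 31, 14, 6, 2]
print("decoder sizes:", dec)  # [1, 5, 13, 30, 64]
```

So the encoder's later (channel-heavy) layers operate on tiny 6×6 and 2×2 maps, while the decoder's layers keep producing ever larger outputs, up to the full 64×64, which means far more memory traffic and arithmetic per layer.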