Understanding Transposed Convolution

One authoritative way to understand transposed convolutions is to look at how convolutions operate as banded matrices on flattened images. The transposed convolution is then just the transposed matrix applied to something of the output shape. Dumoulin and Visin, for example, take this view in their famous explanation, *A guide to convolution arithmetic for deep learning*.
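Here is a minimal sketch of that matrix view in PyTorch (the 4x4 input and 3x3 kernel are just illustrative choices): build the matrix column by column by pushing unit vectors through the forward convolution, then check that applying its transpose to something of the output shape matches `conv_transpose2d`.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
w = torch.randn(1, 1, 3, 3)

# Build the matrix M column by column: column j is the forward convolution
# applied to the j-th unit vector, so M @ vec(x) == vec(conv(x)).
cols = []
for j in range(16):
    e = torch.zeros(16)
    e[j] = 1.0
    cols.append(F.conv2d(e.view(1, 1, 4, 4), w).flatten())
M = torch.stack(cols, dim=1)              # shape (4, 16): 2x2 output, 4x4 input

print(torch.allclose(M @ x.flatten(), F.conv2d(x, w).flatten()))  # True

# Applying M^T to something of the *output* shape is the transposed convolution.
g = torch.randn(1, 1, 2, 2)
print(torch.allclose((M.t() @ g.flatten()).view(1, 1, 4, 4),
                     F.conv_transpose2d(g, w)))                   # True
```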

The other thing you can do is recall that transposed convolutions are there to provide the adjoint operation of convolution for computing the derivative.
The adjoint of summation is expansion: the input value of the transposed convolution that corresponds to a given output of the forward convolution is re-used for every output of the transposed convolution that corresponds to one of the forward input values summed to form that output.
The adjoint of expansion, i.e. using the same input value several times, is summation: if a given pixel is used in several forward windows, the adjoint sums over the corresponding forward output locations.
In between, you multiply each term with the corresponding weight in the summation stencil.
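You can check this adjoint relationship directly with autograd (a minimal sketch; the shapes, stride, and padding here are arbitrary, not taken from your setup): the gradient of a convolution with respect to its input is exactly a transposed convolution with the same weight, stride, and padding.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5, requires_grad=True)
w = torch.randn(1, 1, 3, 3)

y = F.conv2d(x, w, stride=2, padding=1)   # forward convolution, 1x1x3x3
g = torch.randn_like(y)                   # some upstream gradient

# Ask autograd for the derivative of the convolution w.r.t. its input...
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=g)

# ...and compare with the transposed convolution using the same parameters.
print(torch.allclose(grad_x, F.conv_transpose2d(g, w, stride=2, padding=1)))  # True
```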

In your case: because of the stride and the padding, the top-left input pixel of the forward convolution is only used in the top-left output, multiplied with the (1, 1) weight element. Similarly, the (0, 1) entry is 12 = 5 + 7, as that pixel shows up in two forward windows (the top-left one and the one to its right, because the stride makes it appear in both). The (1, 0) pixel, on the other hand, is 12 = 2 + 10 by the same logic.
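For a self-contained numerical illustration of this bookkeeping (made-up weights and values in an analogous 3x3-kernel, stride-2, padding-1 setup, not your actual tensors):

```python
import torch
import torch.nn.functional as F

w = torch.arange(9.0).view(1, 1, 3, 3)        # weights 0..8 for readability
g = torch.arange(1.0, 10.0).view(1, 1, 3, 3)  # input values 1..9

out = F.conv_transpose2d(g, w, stride=2, padding=1)   # 1x1x5x5 result

# (0, 0): the forward top-left input pixel falls into only one window,
# at stencil position (1, 1) because of the padding -> a single product.
print(out[0, 0, 0, 0].item(), (g[0, 0, 0, 0] * w[0, 0, 1, 1]).item())  # 4.0 4.0

# (0, 1): that forward input pixel lies in two horizontally adjacent windows,
# at stencil positions (1, 2) and (1, 0) -> the adjoint sums two products.
expected = g[0, 0, 0, 0] * w[0, 0, 1, 2] + g[0, 0, 0, 1] * w[0, 0, 1, 0]
print(out[0, 0, 0, 1].item(), expected.item())                         # 11.0 11.0
```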

Best regards

Thomas
