I’m currently reading the paper Visualizing and Understanding Convolutional Networks, which is known as ZFNet.
In this paper, deconvolution is done by simply transposing the filters used by the convolution layer, then convolving with those transposed filters.

Some people have implemented the above paper in PyTorch with the
nn.ConvTranspose2d
module, but is it the right function? The PyTorch documentation says:
This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
If it’s the right choice, then is the gradient of Conv2d the same as convolution with the transposed filter?
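For what it's worth, the two can be compared numerically. The sketch below (my own check, not from the paper) lets autograd compute the gradient of `conv2d` with respect to its input, and compares it against `conv_transpose2d` applied to the upstream gradient with the very same weight tensor:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(5, 3, 3, 3)  # (out_channels, in_channels, kH, kW)

y = F.conv2d(x, w)       # forward convolution
g = torch.randn_like(y)  # an arbitrary upstream gradient dL/dy
y.backward(g)            # autograd fills in x.grad = dL/dx

# conv_transpose2d with the SAME (shared) weights, applied to g
x_grad_manual = F.conv_transpose2d(g, w)

print(torch.allclose(x.grad, x_grad_manual, atol=1e-5))
```

If this prints True, it supports the documentation's claim that `ConvTranspose2d` is exactly the input-gradient of `Conv2d` under weight sharing.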

Is there any suggested paper for understanding the ConvTranspose2d module? I could understand what this function does from this animation, but I don’t fully understand why this module is used as a (pseudo-)inverse of convolution, especially the weight-sharing part. It does not seem to be the mathematical inverse of convolution, so there should be some reason for choosing this module. Was it chosen purely as a heuristic, or is there some other reason?

In another version of the same paper, the author states:
The convnet uses learned filters to convolve the feature maps from
the previous layer. To approximately invert this, the deconvnet uses transposed
versions of the same filters (as other autoencoder models, such as RBMs), but
applied to the rectified maps, not the output of the layer beneath. In practice
this means flipping each filter vertically and horizontally.
Does an RBM use this kind of technique (transposing) in its operation?
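On the "flipping each filter vertically and horizontally" remark: for stride 1 this can be checked numerically. A small sketch (my own, using the standard identity, not code from the paper): `conv_transpose2d` should match an ordinary `conv2d` whose kernel has been spatially flipped, had its in/out channel axes swapped, and is applied with full padding (`kernel_size - 1`):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
g = torch.randn(1, 5, 6, 6)     # a feature map with 5 channels
w = torch.randn(5, 3, 3, 3)     # (out_channels, in_channels, kH, kW)

a = F.conv_transpose2d(g, w)    # the "deconvnet" operation

# Flip each 3x3 filter vertically and horizontally,
# swap the in/out channel axes, and use full padding.
w_flipped = w.flip(dims=[2, 3]).transpose(0, 1)  # (3, 5, 3, 3)
b = F.conv2d(g, w_flipped, padding=2)

print(torch.allclose(a, b, atol=1e-5))
```

If this prints True, it shows that "convolving with the transposed filters" and "flipping each filter vertically and horizontally" describe the same operation, since PyTorch's conv2d is actually cross-correlation.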