Understanding “Visualizing and Understanding Convolutional Networks”

I’m trying to understand “Visualizing and Understanding Convolutional Networks” https://arxiv.org/pdf/1311.2901.pdf

The paper states: the deconvnet uses transposed versions of the same filters, applied to the rectified maps

Is it possible to implement this step in a short example in PyTorch? Given an unpooled, rectified map, how would the transposed filter be applied to it?

Thanks

I think this statement points towards reusing (trained) filters from convolutions in transposed convolutions at a later stage in the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 6, 3, 1, 1, bias=False)
x = torch.randn(1, 3, 24, 24)
out = conv(x)
# reuse the trained conv filters in a transposed convolution to map back to the input space
out = F.conv_transpose2d(out, conv.weight, stride=1, padding=1)
print(out.shape)
> torch.Size([1, 3, 24, 24])

Right; so it seems pretty dead simple :slight_smile: - but I might still be getting myself confused.

The paper says:
the deconvnet uses transposed versions of the same filters, but applied to the rectified maps, not the output of the layer beneath. In practice this means flipping each filter vertically and horizontally.

“The same filters” refers to the filters learnt during training.

The paper refers to the vertically and horizontally flipped filters as “transposed ... filters”.
Is it correct that the term “transposed filters” has no semantic relation to the term “transposed convolution”?

I appreciate this is not strictly PyTorch related, so I completely understand if it’s too far off topic for this forum, but any easy-to-share pointers would be fab!!

The paper says:
In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations,
So if the “layer above” had an output of

[
[1,2],
[3,4]
]

And if, on the forward pass, the layer below the pooling operation had a shape of 4x4.

Then, instead of using “stride” to introduce spaces between the activations, would the process in the paper just place the activations in the original locations recorded on the forward pass, so that we could, for example, end up with something like

[
[1,0,0,0],
[0,0,2,0],
[0,3,0,4],
[0,0,0,0]
]

And then the convolution operation would actually be a normal convolution, since the resolution of the features has already been increased, and it would convolve the transposed filters across the feature map above?
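To make sure I have the unpooling picture right, here is a minimal sketch of what I mean, using F.max_unpool2d to stand in for the switch-based unpooling from the paper (the recorded argmax indices play the role of the “switches”; the tensor shapes are just made up to match the 4x4 example above):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# forward pass: max pooling records the argmax locations ("switches")
pooled, switches = F.max_pool2d(x, kernel_size=2, return_indices=True)
# "deconvnet" pass: the 2x2 activations are placed back at the recorded
# 4x4 locations; every other position stays zero
unpooled = F.max_unpool2d(pooled, switches, kernel_size=2)
print(unpooled.squeeze())

The printed 4x4 map has the pooled values at the positions recorded on the forward pass and zeros everywhere else, i.e. the same pattern as in my example above.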

Oh, you might be right, and it seems the authors might indeed have meant to flip the kernel and apply a “standard” convolution?
The transposed convolution is nicely explained here; in particular, check chapter 4.2.
Would it fit the paper better if nn.Conv2d with a flipped kernel were used?
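For the stride-1, padding-1 setup from the earlier snippet, the two readings should actually coincide numerically: a transposed convolution with the trained weight equals a normal convolution with the same weight flipped vertically/horizontally and with the in/out channel axes swapped. A quick sanity check (y is just a random tensor standing in for a rectified, unpooled map):

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 6, 3, 1, 1, bias=False)
y = torch.randn(1, 6, 24, 24)  # stand-in for a rectified, unpooled map
# reading 1: transposed convolution with the trained weight
out_t = F.conv_transpose2d(y, conv.weight, stride=1, padding=1)
# reading 2: normal convolution with the spatially flipped, channel-swapped weight
w_flipped = conv.weight.transpose(0, 1).flip([2, 3])
out_c = F.conv2d(y, w_flipped, stride=1, padding=1)
print(torch.allclose(out_t, out_c, atol=1e-6))
> True

Note that this exact equivalence needs the padding of the normal convolution to be kernel_size - 1 - padding of the original conv (here 3 - 1 - 1 = 1), which is why the shapes and values line up in this case.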

Great; thanks for the referenced explanation. I’ll look into it and try to explore things further.

Really appreciate your input on something so non-PyTorch!!