Hi,
I’m implementing the Grad-CAM algorithm on several architectures, mainly ResNets. The main issue is that the feature maps in the last block are very small: exactly 1x1.
In particular, when I feed a CIFAR10 batch of shape [64, 3, 32, 32] to a ResNet18,
the feature maps after layer4.conv2 have shape [64, 512, 1, 1].
The CAMs therefore have shape [64, 1, 1, 1]: a single value per image, which is not informative at all!
Is there a simple way to improve this situation? I’m considering upsampling (interpolating) the CIFAR10 images to 128x128 before training, so that the CAMs are at least 4x4 and therefore more informative.
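For reference, here is a rough sketch of the size arithmetic behind the 1x1 and 4x4 figures. It assumes the standard torchvision-style ResNet18 layout (7x7 stride-2 stem conv, 3x3 stride-2 max-pool, then a stride-2 first conv in layer2, layer3 and layer4), which may not match every variant. The names `conv_out`, `RESNET18_STAGES` and `feature_map_sizes` are made up for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) of the layer that sets each stage's resolution.
# Assumption: torchvision-style ResNet18; a CIFAR-specific variant may differ.
RESNET18_STAGES = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def feature_map_sizes(input_size):
    """Return the (square) spatial size after each stage for a given input size."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in RESNET18_STAGES:
        size = conv_out(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


print(feature_map_sizes(32))   # layer4 ends up at 1 (1x1 feature maps)
print(feature_map_sizes(128))  # layer4 ends up at 4 (4x4 feature maps)
```

Under these assumptions the network downsamples by a total factor of 32, so the 32x32 input is already down to 2x2 at layer3 and 4x4 at layer2.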
Do you have any advice?
I really appreciate any help you can provide.
G