Matmul shape error with Flatten layer when using GradCAM

Hi, I am a newbie to Captum and I have a problem that I can’t solve by myself.

I trained an ACGAN model and tried to apply GradCAM to the Discriminator part only.

The input size is (1, 110, 408).

This is my Discriminator / Critic part.

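import torch.nn as nn

ndf = 64  # base channel count; matches the printed model (first conv has 64 output channels)

class Discriminator(nn.Module):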
    def __init__(self, num_class):
        super(Discriminator, self).__init__()


        self.main = nn.Sequential(
            # input is 1 x 110 x 408
            nn.Conv2d(1, ndf, (4,6), (2,3), bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 54 x 135

            nn.Conv2d(ndf, ndf * 2, (8,9), (2,3), bias=False),
            nn.InstanceNorm2d(ndf * 2, affine= True),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 24 x 43

            nn.Conv2d(ndf * 2, ndf * 4, (6,3), (2,2), bias=False),
            nn.InstanceNorm2d(ndf * 4, affine= True),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 10 x 21

            nn.Conv2d(ndf * 4, ndf * 8, (4,6), (2,3), bias=False),
            nn.InstanceNorm2d(ndf * 8, affine= True),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 6
        )

        self.critic_layer = nn.Sequential(nn.Conv2d(ndf * 8, 1, (4,6), 1, bias=False))
        
        self.aux_layer = nn.Sequential(nn.Flatten(), nn.Linear(ndf * 8 * 4 * 6, num_class))
        

    def forward(self, input):
        input1 = self.main(input)
        critic = self.critic_layer(input1)
        pred_label = self.aux_layer(input1)
        return critic, pred_label

Printing the model gives:

Discriminator(
  (main): Sequential(
    (0): Conv2d(1, 64, kernel_size=(4, 6), stride=(2, 3), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(8, 9), stride=(2, 3), bias=False)
    (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(6, 3), stride=(2, 2), bias=False)
    (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 6), stride=(2, 3), bias=False)
    (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (critic_layer): Sequential(
    (0): Conv2d(512, 1, kernel_size=(4, 6), stride=(1, 1), bias=False)
  )
  (aux_layer): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=12288, out_features=129, bias=True)
  )
)
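
As a sanity check on the "state size" comments in the model, here is a minimal sketch that recomputes the feature-map sizes by hand. It assumes no padding and dilation 1, which are the Conv2d defaults the model relies on. The helper name `conv_out` is made up for illustration and is not part of PyTorch.

```python
def conv_out(size, kernel, stride, padding=0):
    # Output length along one dimension for a convolution with dilation 1.
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride) pairs copied from the four Conv2d layers in `main`.
layers = [((4, 6), (2, 3)), ((8, 9), (2, 3)), ((6, 3), (2, 2)), ((4, 6), (2, 3))]

h, w = 110, 408  # input height and width
sizes = []
for (kh, kw), (sh, sw) in layers:
    h, w = conv_out(h, kh, sh), conv_out(w, kw, sw)
    sizes.append((h, w))

print(sizes)  # [(54, 135), (24, 43), (10, 21), (4, 6)]

# 512 channels (ndf * 8 with ndf = 64) times the final 4 x 6 map.
flat_features = 512 * h * w
print(flat_features)  # 12288
```

So the Linear layer's in_features=12288 is consistent with the convolutions, which suggests the layer definitions themselves are not the problem.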

Then I followed the Guided GradCAM example in this doc.

from captum.attr import GuidedGradCam

guided_gc = GuidedGradCam(netD_i, netD_i.main[8])
input3 = input_tensor[0]
input3.requires_grad = True
print(input3)
tensor([[[-0.9815, -0.9815, -0.9898,  ..., -0.9708, -0.9451, -0.9735],
         [-0.9648, -0.9803, -0.9869,  ..., -0.9776, -0.9829, -0.9773],
         [-0.9951, -0.9984, -0.9986,  ..., -0.9944, -0.9847, -0.9956],
         ...,
         [-0.9856, -0.9924, -0.9898,  ..., -0.9958, -0.9944, -0.9928],
         [-0.9368, -0.9677, -0.9770,  ..., -0.9674, -0.9666, -0.9691],
         [-0.9358, -0.9468, -0.9367,  ..., -0.9777, -0.9739, -0.9577]]],
       device='cuda:0', requires_grad=True)

The target looks like this:

label_y[0]
tensor(29., device='cuda:0')

When I run this code, I get the following error:

guided_gc.attribute(input3, label_y[0])
RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x24 and 12288x129)

FYI, 512 x 24 = 12288.

I assume that this error is caused by the Flatten layer in aux_layer.
But it didn’t cause any issue when I trained the model.

How can I fix this problem?
Thx! :+1:

I didn’t read the doc carefully :sweat_smile:

In the API doc,
inputs (Tensor or tuple[Tensor, ...]) – Input for which saliency is computed. If forward_func takes a single tensor as input, a single input tensor should be provided. If forward_func takes multiple tensors as input, a tuple of the input tensors should be provided. It is assumed that for all given input tensors, dimension 0 corresponds to the number of examples (aka batch size), and if multiple input tensors are provided, the examples must be aligned appropriately.

So I should make the first dimension of my input the batch size.
Just call unsqueeze(0) on the input if you are testing only a single example. :crazy_face:
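
For anyone hitting the same error, here is a small sketch of what goes wrong and why unsqueeze(0) helps. It does not use Captum or the trained model. The random tensor is only a stand-in for the output of `main` (shape 512 x 4 x 6), and `aux_layer` is rebuilt with the sizes from the printed model, so treat it as an illustration rather than the exact code path inside GuidedGradCam.

```python
import torch
import torch.nn as nn

# Stand-in for aux_layer from the post: Flatten followed by Linear(12288, 129).
aux_layer = nn.Sequential(nn.Flatten(), nn.Linear(512 * 4 * 6, 129))

# Stand-in for the output of `main` when the input has NO batch dimension.
# The values are random; only the shape (512, 4, 6) matters here.
features_unbatched = torch.randn(512, 4, 6)

# Flatten(start_dim=1) keeps dim 0 and merges the rest, so the 512 channels
# are treated as the batch and each "sample" has only 4 * 6 = 24 features.
flat_shape = tuple(nn.Flatten()(features_unbatched).shape)
print(flat_shape)  # (512, 24)

# Feeding that into Linear(12288, 129) raises the shape-mismatch RuntimeError.
try:
    aux_layer(features_unbatched)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Adding the batch dimension gives (1, 512, 4, 6), which flattens to (1, 12288).
features_batched = features_unbatched.unsqueeze(0)
out_shape = tuple(aux_layer(features_batched).shape)
print(out_shape)  # (1, 129)
```

In the code from the question, the equivalent change would be `input3 = input_tensor[0].unsqueeze(0)`, so that the attribution input has shape (1, 1, 110, 408) instead of (1, 110, 408).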