MmBackward returned an invalid gradient at index 0?

I’m collecting two different measures of loss and propagating both of them back through a single network. When I call loss.backward() for the generator, I get this error about an invalid gradient at index 0.
Here is the code block:

for epoch in range(num_epochs):
    for batch_idx, (real, labels) in enumerate(loader):
        #get a fixed input batch to display gen output
        if batch_idx == 0:
            if epoch == 0:
                fixed_input = real.view(-1,784).to(device)
        
        adv_ex = real.clone().reshape(-1,784).to(device) # [32, 784] advex copy of first batch flattened
        real = real.view(-1, 784).to(device) # [32, 784] # real batch flattened
        labels = labels.to(device) # size() [32] 32 labels in batch
        
        
        #perturb each image in adv_ex
        tmp_adv_ex = []
        for idx, item in enumerate(adv_ex):
            perturbation = gen(adv_ex[idx])
            tmp_adv_ex.append(adv_ex[idx] + perturbation)
        adv_ex = torch.cat(tmp_adv_ex, dim=0)
        
        
         
        ### Train Generator: min log(1 - D(G(z))) <-> max log(D(G(z)))
        output = disc(adv_ex).view(-1)
        lossG = torch.mean(torch.log(1. - output)) #loss for the gen's desired disc prediction

        adv_ex = adv_ex.reshape(-1,1,28,28)
        f_pred = target(adv_ex)
        f_loss = CE_loss(f_pred, labels) #add loss for the gen's desired target-model prediction
        loss_G_Final = f_loss + lossG

        opt_gen.zero_grad()
        loss_G_Final.backward() #THIS IS THE ERROR SOURCE
        opt_gen.step()
        
        ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
        
        adv_ex = adv_ex.reshape(32, 784)
        disc_real = disc(real).view(-1)
        disc_fake = disc(adv_ex).view(-1)
        lossD = -torch.mean(torch.log(disc_real) + torch.log(1. - disc_fake))
        
        opt_disc.zero_grad()
        lossD.backward(retain_graph=True)
        opt_disc.step()

and this is the exact error traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_31570/4032279268.py in <module>
     30 
     31         opt_gen.zero_grad()
---> 32         loss_G_Final.backward()
     33         opt_gen.step()
     34 

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145         retain_graph = create_graph
    146 
--> 147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: Function MmBackward returned an invalid gradient at index 0 - got [1, 784] but expected shape compatible with [1, 25088]

How can I interpret and solve this error?
Thanks!


As described in the related topic: if you are not using the latest 1.9.1 release, please update to it, as a broken shape check in matmul was recently fixed.

I updated to the latest version, and this is the new error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_16674/3188434085.py in <module>
     21 
     22         # Train Generator: min log(1 - D(G(z))) <-> max log(D(G(z))
---> 23         output = disc(adv_ex).view(-1) #discriminator decides if advex is real or fake
     24         lossG = torch.mean(torch.log(1. - output)) #get loss for gen's desired desc pred
     25 

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/AdvGan9-4/models.py in forward(self, x)
     28 
     29     def forward(self, x):
---> 30         x = self.fc1(x)
     31         x = nn.ReLU()(x)
     32         x = self.fc2(x)

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/linear.py in forward(self, input)
     94 
     95     def forward(self, input: Tensor) -> Tensor:
---> 96         return F.linear(input, self.weight, self.bias)
     97 
     98     def extra_repr(self) -> str:

~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1845     if has_torch_function_variadic(input, weight):
   1846         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1847     return torch._C._nn.linear(input, weight, bias)
   1848 
   1849 

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x25088 and 784x128)

Does this mean I need to change the input dimensions?

OK, great! That’s how the error should have been reported in the first place.
As the error says, you are running into a shape mismatch in a linear layer, so print the shapes of the intermediate activations in the forward method and make sure the in_features of the crashing linear layer match the activation features.
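
If it helps, here is a minimal debugging sketch of that approach, using a stand-in fully connected discriminator whose layer sizes (784 -> 128 -> 1) are only inferred from the traceback; adjust it to match your actual models.py:

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # in_features must match the flattened input
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        print("input:", x.shape)         # expect [batch_size, 784]
        x = nn.ReLU()(self.fc1(x))
        print("after fc1:", x.shape)     # expect [batch_size, 128]
        x = torch.sigmoid(self.fc2(x))
        print("after fc2:", x.shape)     # expect [batch_size, 1]
        return x

disc = Discriminator()
disc(torch.rand(32, 784))                # prints the shapes; a [1, 25088] input would fail at fc1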

It looks like the shape mismatch is on fc2, where it’s trying to put a 32x784 tensor into a 28x1 layer.
How do I make this discriminator take a batch of 32 28x28 images and output 32 scores of size 1, one for each input image?

Edit: figured it out. Before, I wasn’t passing a reshaped version of adv_ex ([32, 28*28]) through the network. Now it outputs 32x1.
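
For anyone hitting the same thing, here is a minimal sketch of the shape problem, using a hypothetical stand-in generator (a lambda that returns noise) since the real gen isn’t shown:

import torch

gen = lambda x: 0.01 * torch.randn_like(x)    # hypothetical stand-in for the real generator
real = torch.rand(32, 784)                    # one flattened batch of 28x28 images

perturbed = [img + gen(img) for img in real]  # 32 tensors, each of shape [784]

# torch.cat on 1-D tensors flattens everything into one long vector,
# which is where the 25088 (= 32 * 784) in the original error came from:
print(torch.cat(perturbed, dim=0).shape)      # torch.Size([25088])

# torch.stack (or reshaping back to [32, 784] afterwards) keeps the batch
# dimension, so each row matches fc1's 784 in_features:
print(torch.stack(perturbed, dim=0).shape)    # torch.Size([32, 784])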