Model Backward(): RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Hello,

I’ve been struggling with this runtime error for a while. I have visited most, if not all, of the posts on this forum that mention this error and tried to replicate the solutions, but I still cannot fix it. Any suggestions would be appreciated! :slight_smile:

Essentially, I have a custom PyTorch dataset class, where I set requires_grad=False for both the input x and the ground truth y. I read that this is required in order to iterate through the DataLoader, and this seems to work.

But I also read on this forum that I have to set requires_grad=True on the input x before passing it into the network, since I had previously set it to False.

But as for the ground truth, even though we use it in the loss function, do we still need to set requires_grad=True for it? I’ve tried that and still get the same runtime error.

Essentially, my code runs past the validation sanity check. It is able to generate the losses twice at Epoch 0 before running into the error.

For all the losses generated at Epoch 0, I can see a grad_fn being recorded. So I really don’t understand where the error is.

Epoch 0:   0%|                                                                                                     | 0/16 [00:00<00:00, 5011.12it/s]

{'loss1': tensor(1.0887e+15), 'loss2': tensor(11.4520, grad_fn=<DivBackward0>), 'loss3': tensor(2.0911, grad_fn=<LogBackward0>)}
tensor(1.0887e+15, grad_fn=<AddBackward0>)

{'loss1': tensor(1.0894e+15), 'loss2': tensor(11.4538, grad_fn=<DivBackward0>), 'loss3': tensor(2.0907, grad_fn=<LogBackward0>)}
tensor(1.0894e+15, grad_fn=<AddBackward0>)

Traceback (most recent call last):
....

Below is the training step:

    def training_step(self, train_batch, batch_idx):
        
        # x = (2,8,img_yaxis,img_xaxis) images, y = groundtruth
        x, y = train_batch

        # Set requires_grad = True
        x = torch.tensor(x, requires_grad=True)
        
        out = self.network(x)

        loss_func = ls.Loss(y[0])
    
        # Out = Tensor([probability_map, gene_map])
        # probability map: 3D (3 batches, 572, 572)
        # barcode gene map: 4D (3 batches, 16, 572, 572)
        probability_map = out[:, 0, :,:]
        gene_map = out[:, 1:17, :,:]
        
        # Now probability_map is 3D (3 batches, 572, 572).
        # So we cast it back to 4D (3 batches, 1, 572, 572).
        probability_map = probability_map[:, None, :, :]
        
        num_batches = probability_map.shape[0]
        
        # FAST COMPUTATION OF LOSS WITHOUT FOR-LOOP using MAP()
        loss = list(map(loss_func, probability_map, gene_map))

        for idx in range(num_batches):
            
            self.log('train_loss', loss[idx])
        
        return torch.mean(torch.FloatTensor(loss))

No, this would only be needed if you really want to calculate gradients with respect to the input (e.g. for adversarial attacks).

This is also not needed, and I don’t currently know of a valid use case where the targets would need gradients.
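A quick illustration of this point (a minimal sketch with a stand-in linear model, not the original poster’s network): the target can stay at requires_grad=False and the loss still carries a grad_fn, because the graph is tracked through the model output:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x = torch.randn(2, 4)   # input, requires_grad=False
y = torch.randn(2, 1)   # target, requires_grad=False

loss = nn.functional.mse_loss(model(x), y)
print(loss.grad_fn is not None)   # True: graph tracked via the model
loss.backward()                   # works without touching y.requires_grad
print(model.weight.grad.shape)    # torch.Size([1, 4])
```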

Could you explain how this return value is treated?

return torch.mean(torch.FloatTensor(loss))

Are you calling backward() on this averaged loss? If so, this won’t work since you are recreating a new tensor and will thus break the computation graph. Try to use torch.stack or torch.cat instead.
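To see why the graph breaks (a minimal sketch, not the original code): rebuilding a list of losses with torch.FloatTensor copies the raw values and drops their grad history, while torch.stack keeps them attached to the computation graph:

```python
import torch

w = torch.randn(3, requires_grad=True)
losses = [wi ** 2 for wi in w]      # list of scalar tensors, each with a grad_fn

broken = torch.FloatTensor(losses)  # copies values, drops grad history
print(broken.requires_grad)         # False -> backward() would raise the RuntimeError

kept = torch.stack(losses)          # stays in the computation graph
print(kept.requires_grad)           # True
kept.mean().backward()              # gradients flow back to w
```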

Hi Piotr,

Thank you very much for your quick reply! I had been stuck on this runtime error for two days! Glad to have some pointers.

No, this would only be needed if you really want to calculate gradients with respect to the input (e.g. for adversarial attacks).

But in my custom dataset class, I specifically set the input’s requires_grad=False. If I don’t reset it to True before passing it into the network, doesn’t that mean no grad_fn will be tracked?

Are you calling backward() on this averaged loss? If so, this won’t work since you are recreating a new tensor and will thus break the computation graph. Try to use torch.stack or torch.cat instead.

Yes, I believe backward() is called. The idea of using an average loss is that when batch_size >= 2, the network processes multiple samples at once, hence produces multiple losses.

I assumed there is no way to call backward() on these losses one by one, hence I took the average loss. What are your suggestions? If we “return” a vector of three losses (say we have batch_size = 3), how is backward() called on this vector of losses?

return torch.mean(torch.FloatTensor(loss))

To be more specific about this line: “loss” is a list of scalar tensors resulting from the map() call. I then convert it to a torch.FloatTensor vector and take the average.

Looking forward to your reply! Many thanks! :slight_smile:

I assume you are concerned about the grad_fn of the intermediate activations and thus the computation graph: no, this won’t be a problem, since you are only changing whether the input should get gradients and are not changing anything else in the model. Here is a small example:

import torch
import torch.nn as nn

# standard use case
x = torch.randn(1, 1)
print(x.requires_grad)
# > False

lin = nn.Linear(1, 1)
out = lin(x)
print(out.grad_fn)
# > <AddmmBackward0 object at 0x7fcea08c5610>
out.backward()
print(lin.weight.grad)
# > tensor([[-0.9785]])
print(x.grad)
# > None

# input requires grad
x = torch.randn(1, 1, requires_grad=True)
print(x.requires_grad)
# > True

lin = nn.Linear(1, 1)
out = lin(x)
print(out.grad_fn)
# > <AddmmBackward0 object at 0x7fcea08d4640>
out.backward()
print(lin.weight.grad)
# > tensor([[1.6739]])
print(x.grad)
# > tensor([[0.0300]])

Both approaches work with the difference that the second one will calculate gradients for the input, which is usually not needed.

Use torch.stack.
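Concretely, the return line in the training step above can then look like this (sketched with a hypothetical stand-in for the list produced by map(), not the original loss function):

```python
import torch

# Stand-in for the per-sample losses the map() call produces:
out = torch.randn(3, requires_grad=True)
loss = [o.abs() for o in out]         # list of scalar tensors, as in the thread

mean_loss = torch.stack(loss).mean()  # instead of torch.FloatTensor(loss)
print(mean_loss.grad_fn is not None)  # True: graph intact
mean_loss.backward()                  # no RuntimeError
print(out.grad is not None)           # True
```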


Thank you so much, Piotr! My code is finally running like a blast! Thank you so much!
