RuntimeError: Module backward hook for grad_input is called before the grad_output one

Hi,

I was following PyTorch's Finetuning Torchvision Models tutorial (Finetuning Torchvision Models — PyTorch Tutorials 1.2.0 documentation) on a custom dataset.

It was working fine until today. The line below in the train method now throws an error:

if phase == 'train':
  loss.backward()
  optimizer.step()
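
For context, the surrounding loop (roughly following the tutorial; my variable names may differ slightly) looks like this:

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    # reset gradients before the forward pass
    optimizer.zero_grad()

    # track gradients only in the training phase
    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # backward and optimize only for training phase
        if phase == 'train':
            loss.backward()
            optimizer.step()

The loss is computed from the model's output, so the error message about the gradient bypassing the module's output was confusing to me.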

loss.backward() throws the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-30-78fbf03a599a> in <module>()
      1 # Build, train and analyze the model with the pipeline
----> 2 model = model_pipeline(config)

4 frames
<ipython-input-22-fed3f87e5556> in model_pipeline(hyperparameters)
     14 
     15     # and use them to train model
---> 16     train(model, dataloaders, criterion, optimizer, config)
     17 
     18     # and test it's final performance

<ipython-input-28-8159f04f5f98> in train(model, dataloaders, criterion, optimizer, config)
     44           # backward and optimize only for training phase
     45           if phase == "train":
---> 46             loss.backward()
     47             optimizer.step()
     48 

/usr/local/lib/python3.7/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    243                 create_graph=create_graph,
    244                 inputs=inputs)
--> 245         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    246 
    247     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145     Variable._execution_engine.run_backward(
    146         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 147         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    148 
    149 

/usr/local/lib/python3.7/dist-packages/torch/utils/hooks.py in hook(grad_input, _)
    101         def hook(grad_input, _):
    102             if self.grad_outputs is None:
--> 103                 raise RuntimeError("Module backward hook for grad_input is called before "
    104                                    "the grad_output one. This happens because the gradient "
    105                                    "in your nn.Module flows to the Module's input without "

RuntimeError: Module backward hook for grad_input is called before the grad_output one. This happens because the gradient in your nn.Module flows to the Module's input without passing through the Module's output. Make sure that the output depends on the input and that the loss is computed based on the output.

Below are the versions used:
PyTorch version: 1.8.1+cu101
Torchvision version: 0.9.1+cu101

The code is run on GPU from a Google Colab notebook.

Any pointers would be of great help.

Are you getting this error in the original notebook, or have you applied any changes?

Hi @ptrblck ,

Not in the original notebook. I’d incorporated wandb for experiment tracking in mine.

A similar question was raised here in the forum (RuntimeError: Module backward hook for grad_input is called before the grad_output one. This happens because the gradient in your nn.Module flows to the Module's input without passing through the Module's output - #3 by morg) a few days back.

I had replied to that post asking whether the issue was resolved. @GeoffNN replied that wandb.watch was the cause and that training went through after removing it. I tried the same in my notebook and it worked, though I'm not sure what the connection between the two is.

I'd be curious to understand that. Thanks for checking.
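
In case it helps anyone else hitting this, here is roughly what I changed in my pipeline. The wandb project name and the make/test helpers are just placeholders for my own setup, and my explanation of why this helps is only a guess:

import wandb

def model_pipeline(hyperparameters):
    # placeholder project name; make() is my own helper that builds the
    # model, dataloaders, criterion and optimizer from the config
    with wandb.init(project="finetune-tutorial", config=hyperparameters):
        config = wandb.config
        model, dataloaders, criterion, optimizer = make(config)

        # Removing this line made loss.backward() work again.
        # My (unverified) understanding is that wandb.watch registers
        # hooks on the model to log gradients, and one of those ends up
        # as a module backward hook that trips the grad_input/grad_output
        # ordering check in torch 1.8.
        # wandb.watch(model, criterion, log="all", log_freq=10)

        # and use them to train the model
        train(model, dataloaders, criterion, optimizer, config)

        # and test its final performance
        test(model, dataloaders, criterion, config)

    return model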