Hi,
I was following the Finetuning Torchvision Models tutorial from PyTorch (Finetuning Torchvision Models — PyTorch Tutorials 1.2.0 documentation) on a custom dataset.
It was working fine until today. The lines below in the train method now throw an error:
```python
if phase == 'train':
    loss.backward()
    optimizer.step()
```
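For context, here is a condensed version of the surrounding training loop (simplified from the tutorial's train_model function; device is the usual CUDA device):

```python
for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    optimizer.zero_grad()  # reset gradients accumulated from the previous step

    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs)            # forward pass
        loss = criterion(outputs, labels)  # loss is computed from the model's output

        # backward and optimize only for the training phase
        if phase == 'train':
            loss.backward()   # <-- this is the line that raises the RuntimeError
            optimizer.step()
```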
loss.backward() throws the following error:
RuntimeError Traceback (most recent call last)
<ipython-input-30-78fbf03a599a> in <module>()
1 # Build, train and analyze the model with the pipeline
----> 2 model = model_pipeline(config)
<ipython-input-22-fed3f87e5556> in model_pipeline(hyperparameters)
14
15 # and use them to train model
---> 16 train(model, dataloaders, criterion, optimizer, config)
17
18 # and test its final performance
<ipython-input-28-8159f04f5f98> in train(model, dataloaders, criterion, optimizer, config)
44 # backward and optimize only for training phase
45 if phase == "train":
---> 46 loss.backward()
47 optimizer.step()
48
/usr/local/lib/python3.7/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
243 create_graph=create_graph,
244 inputs=inputs)
--> 245 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
246
247 def register_hook(self, hook):
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
145 Variable._execution_engine.run_backward(
146 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 147 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
148
149
/usr/local/lib/python3.7/dist-packages/torch/utils/hooks.py in hook(grad_input, _)
101 def hook(grad_input, _):
102 if self.grad_outputs is None:
--> 103 raise RuntimeError("Module backward hook for grad_input is called before "
104 "the grad_output one. This happens because the gradient "
105 "in your nn.Module flows to the Module's input without "
RuntimeError: Module backward hook for grad_input is called before the grad_output one. This happens because the gradient in your nn.Module flows to the Module's input without passing through the Module's output. Make sure that the output depends on the input and that the loss is computed based on the output.
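I don't register any backward hooks myself, so my guess is that something inside model_pipeline (for example, an experiment-tracking integration) registers module backward hooks, which is why the traceback ends up in torch/utils/hooks.py. Here is a small sketch I used to check which submodules have backward hooks attached (assumption: model is the model built by the pipeline; _backward_hooks is an internal nn.Module attribute, so this is for inspection only, not a supported API):

```python
# Debugging sketch: list every submodule that has a backward hook registered.
# _backward_hooks is an internal nn.Module attribute (an OrderedDict).
for name, module in model.named_modules():
    if module._backward_hooks:
        print(name or '<root>', list(module._backward_hooks.values()))
```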
Below are the versions used:
PyTorch version: 1.8.1+cu101
Torchvision version: 0.9.1+cu101
The code runs on a GPU in a Google Colab notebook.
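For completeness, this is how I checked the environment (standard version attributes, nothing custom):

```python
import torch, torchvision

print(torch.__version__)          # 1.8.1+cu101
print(torchvision.__version__)    # 0.9.1+cu101
print(torch.cuda.is_available())  # True on the Colab GPU runtime
```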
Any pointers would be of great help.