I am new to PyTorch and I am having a hard time getting my head around this runtime error.
My model's forward function is as follows:
def forward(self, input, hidden=None):
    if hidden is None:
        hidden = self.init_hidden(input.size(0))
    out, hidden = self.lstm(input, hidden)
    out = self.linear(out)
    return out, hidden
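For reference, init_hidden just builds zero-initialised states. It looks roughly like this (a sketch, assuming batch_first=True since input.size(0) is used as the batch size; the num_layers / hidden_size attribute names are illustrative):

def init_hidden(self, batch_size):
    # Zero-initialised (h_0, c_0) for the LSTM,
    # each of shape (num_layers, batch_size, hidden_size).
    weight = next(self.parameters())
    h_0 = weight.new_zeros(self.num_layers, batch_size, self.hidden_size)
    c_0 = weight.new_zeros(self.num_layers, batch_size, self.hidden_size)
    return (h_0, c_0)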
and here is my training loop:
def training(dataloader, iterations, device):
    torch.autograd.set_detect_anomaly(True)
    model = NModel(662, 322, 2, 1)
    hidden = None
    model.train()
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    running_loss = []
    last_loss = 0
    for i, (feature, label) in tqdm(enumerate(dataloader)):
        optimizer.zero_grad()
        # the hidden state from the previous batch is fed back into the model
        outputs, hidden = model(feature, hidden)
        loss = loss_fn(outputs, label)
        print("loss item", loss.item())
        running_loss.append(loss.item())
        loss.backward(retain_graph=True)
        optimizer.step()
        if i % 1000 == 0:
            last_loss = len(running_loss) / 1000
    return last_loss
and the error's stack trace is as follows:
/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/autograd/__init__.py:156: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
File "main.py", line 18, in <module>
main()
File "main.py", line 14, in main
training(dataloader=training_loader, iterations=3, device=0)
File "/home/elie/gitclones/neuro-demo1-feature-extraction/training.py", line 27, in training
outputs, hidden = model(feature, hidden)
File "/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/elie/gitclones/neuro-demo1-feature-extraction/models.py", line 46, in forward
out, hidden = self.lstm(input, hidden)
File "/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 692, in forward
self.dropout, self.training, self.bidirectional, self.batch_first)
(Triggered internally at /opt/conda/conda-bld/pytorch_1634272092750/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
1it [00:00, 10.28it/s]
Traceback (most recent call last):
File "main.py", line 18, in <module>
main()
File "main.py", line 14, in main
training(dataloader=training_loader, iterations=3, device=0)
File "/home/elie/gitclones/neuro-demo1-feature-extraction/training.py", line 31, in training
loss.backward(retain_graph=True)
File "/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/elie/miniconda3/envs/pytorch-openpose/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [322, 1288]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
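(For what it's worth, 4 × 322 = 1288, so the [322, 1288] tensor in the error looks like a transposed view of one of the LSTM's weight matrices, which have shape (4*hidden_size, hidden_size). That makes me think the weights are being modified in place by optimizer.step() while an older graph still references them, though I may be reading it wrong.)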
The part that confuses me the most is that when I change the forward function so that it no longer passes hidden to the LSTM:
def forward(self, input, hidden=None):
    out, hidden = self.lstm(input)
    out = self.linear(out)
    return out, hidden
and remove the retain_graph=True flag from my training loop, everything works fine.
I am not sure what is causing this, but I suspect it has to do with the way I am passing hidden back into my LSTM across batches; see the sketch below for what I am considering.
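Would detaching the hidden state between batches be the right direction? A minimal sketch of what I have in mind (the detach_hidden helper is hypothetical, just to illustrate cutting the graph between iterations):

def detach_hidden(hidden):
    # Detach (h, c) from the previous batch's graph so backward()
    # does not reach back through earlier iterations.
    h, c = hidden
    return (h.detach(), c.detach())

for i, (feature, label) in tqdm(enumerate(dataloader)):
    optimizer.zero_grad()
    if hidden is not None:
        hidden = detach_hidden(hidden)  # cut the graph between batches
    outputs, hidden = model(feature, hidden)
    loss = loss_fn(outputs, label)
    loss.backward()  # no retain_graph needed once the graph is cut
    optimizer.step()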
Any help is appreciated.