Hello,
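While training my network I get the error below. It is raised from loss.backward(retain_graph=True) in train_kd.py, and the anomaly-detection traceback points to the multiplication input * self.scale in models/model.py (line 24), so it looks like a one-element scale parameter is being modified in place before the backward pass uses it.
I'm happy to share more code and reproduction steps if needed.
Thanks!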
Warning: Error detected in MulBackward0. Traceback of forward call that caused the error:
  File "/home/kd-6d-pose-adlp/train_kd.py", line 135, in <module>
    _, loss_dict = model(images, targets=targets, pred_t=pred_t, cfg_kd=cfg_kd)
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kd-6d-pose-adlp/models/model_kd.py", line 84, in forward
    pred_cls, pred_reg, pred_evi = self.head(features)
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kd-6d-pose-adlp/models/model.py", line 465, in forward
    evidential_pred = self.scales[l](self.evidential_pred(pose_tower))
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kd-6d-pose-adlp/models/model.py", line 24, in forward
    return input * self.scale
 (function _print_stack)
steps: 1/20000, lr:0.000040, cls:13.9574, reg:41.6412, kd:0.0000, evi:41.6412: 8%|▊ | 1/12 [00:31<05:50, 31.82s/it]
Traceback (most recent call last):
  File "/home/kd-6d-pose-adlp/train_kd.py", line 166, in <module>
    loss.backward(retain_graph=True)
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/comet_ml/monkey_patching.py", line 317, in wrapper
    raise exception_raised
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/comet_ml/monkey_patching.py", line 288, in wrapper
    return_value = original(*args, **kwargs)
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/myenv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1]] is at version 2; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
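
For reference, here is a minimal sketch of one common way this message arises: calling backward on a retained graph after an optimizer step has already updated a parameter in place. The Scale class below is a hypothetical stand-in guessed from the "return input * self.scale" line in the traceback, and stale_graph_backward and fresh_graph_backward are illustrative names. It may not match what train_kd.py actually does, so treat it as a possible cause to check rather than a diagnosis.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical stand-in for the scale layer seen in the traceback."""

    def __init__(self, init_value: float = 1.0):
        super().__init__()
        # One-element learnable parameter, like the FloatTensor [1] in the error.
        self.scale = nn.Parameter(torch.tensor([init_value]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return input * self.scale


def stale_graph_backward() -> str:
    """Backpropagate through a graph after the optimizer changed the parameter."""
    layer = Scale()
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    x = torch.ones(4, requires_grad=True)  # stands in for upstream activations

    loss = layer(x).sum()
    loss.backward(retain_graph=True)
    optimizer.step()  # in-place update: bumps the version counter of layer.scale

    try:
        loss.backward()  # the retained graph still expects the old layer.scale
    except RuntimeError as err:
        return str(err)
    return ""


def fresh_graph_backward() -> float:
    """Rebuild the graph with a new forward pass before every backward call."""
    layer = Scale()
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    x = torch.ones(4, requires_grad=True)

    for _ in range(2):
        optimizer.zero_grad()
        loss = layer(x).sum()  # new forward pass, so saved tensors are current
        loss.backward()
        optimizer.step()
    return layer.scale.item()


if __name__ == "__main__":
    print(stale_graph_backward())
    print(fresh_graph_backward())
```

If the training loop reuses any tensor computed before optimizer.step() (for example a cached prediction fed back into the model on the next iteration), detaching that tensor or recomputing it after the step is the usual thing to try. Whether retain_graph=True is needed at all is also worth checking.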