Hello, we have a framework that is responsible for configuration, starting/stopping training, and so on. It has already been used to train another model without any problem, so the framework itself seems to work fine. When it is used to train my current model, the first epoch also runs fine, so the model seems to work as well. But just as the second epoch begins, an exception is thrown:
File "/usr/local/lib/python3.8/dist-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 125, in backward
Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
By debugging, I found that the output tensor of the network has grad_fn = None, and this is reproducible: it always happens on the FIRST backward call of the SECOND epoch.
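For reference, a minimal sketch that produces the same symptom and the same error, under my assumption that grad mode somehow ends up disabled during the forward pass (the model and shapes here are placeholders, not my real network):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(2, 4)

# If the forward pass runs with grad mode off (e.g. a no_grad context
# that leaked in from a validation step), no autograd graph is recorded.
with torch.no_grad():
    out = model(x)

print(out.grad_fn)  # None, just like the output of my network

err = None
try:
    out.sum().backward()
except RuntimeError as e:
    err = e
print(err)  # element 0 of tensors does not require grad and does not have a grad_fn
```

This matches what I observe: grad_fn is None on the output, and backward() raises the RuntimeError above.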
Since the framework can train other models without any problem, and the first epoch of the current model also runs fine, I want to dig deeper and find out why grad_fn doesn't get set. After searching the PyTorch source code, it seems to be set in nn/modules/module.py, line 736:
grad_fn = var.grad_fn
if grad_fn is not None:
However, debugging doesn't confirm that. Could you please tell me where grad_fn actually gets set? Thanks!
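In case it helps, this is the kind of check I am running at the top of each epoch while debugging (check_autograd_state is my own hypothetical helper, not part of the framework), to see whether grad mode or the parameters' requires_grad flags change between epoch 1 and epoch 2:

```python
import torch

def check_autograd_state(model, epoch):
    # Hypothetical debugging helper: report whether autograd's grad mode
    # is enabled and which parameters (if any) are frozen, so a state
    # change between epochs becomes visible in the logs.
    grad_on = torch.is_grad_enabled()
    frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
    print(f"epoch {epoch}: grad enabled={grad_on}, frozen params={frozen}")
    return grad_on, frozen

# Placeholder model just to show the call.
model = torch.nn.Linear(4, 1)
grad_on, frozen = check_autograd_state(model, epoch=2)
```

In my runs, both checks look normal at the start of the second epoch, which is why I suspect grad_fn is being dropped somewhere else.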