RuntimeError
Sorry to bother you, but could anyone give me some hints for solving this problem?
I ran the MNIST example provided by PyTorch (main.py) with autograd anomaly detection set to True, and got this error.
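For context, anomaly detection can be switched on as in the sketch below. It is a minimal, self-contained illustration and not the code from main.py: the tensor `x` and the expression `sqrt(x) * 0.0` are assumptions, chosen only because they give a finite forward value and a NaN gradient on any device.

```python
import torch

# Hypothetical input: sqrt is finite at 0, but its derivative is not.
x = torch.tensor([0.0], requires_grad=True)

caught = None
try:
    # Inside this context autograd records the forward traceback of each op
    # and raises as soon as a backward function returns NaN.
    with torch.autograd.detect_anomaly():
        y = torch.sqrt(x) * 0.0   # forward value is 0.0
        y.sum().backward()        # sqrt backward computes 0 / 0 = nan
except RuntimeError as err:
    caught = str(err)

print(caught)
```

The same check can be enabled for a whole script with `torch.autograd.set_detect_anomaly(True)`; it slows training down, so it is usually left on only while debugging.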
I actually observed this error in some other code of mine, and thought it was triggered by taking the sqrt of a very small number. However, I tested my code (and this example) on the CPU and on another machine, and both run just fine.
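To illustrate the sqrt suspicion, here is a small sketch of how the gradient of sqrt behaves near zero and how a small offset keeps it finite. The input values and the `eps` constant are hypothetical and not taken from my code; this only shows the general mechanism, not the cause of the error above.

```python
import torch

# Hypothetical inputs: exactly zero, very small, and an ordinary value.
x = torch.tensor([0.0, 1e-12, 1.0], requires_grad=True)
torch.sqrt(x).sum().backward()
print(x.grad)   # d/dx sqrt(x) = 1 / (2 * sqrt(x)); unbounded at x = 0

# Common guard (an assumption, not part of the MNIST example):
# add a small constant before taking the square root.
eps = 1e-8
x_safe = torch.tensor([0.0, 1e-12, 1.0], requires_grad=True)
torch.sqrt(x_safe + eps).sum().backward()
print(x_safe.grad)
```

An infinite gradient like the first one turns into NaN as soon as it is multiplied by zero further up the graph, which is the kind of failure anomaly detection reports.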
I suspect something is wrong with my CUDA installation, but I cannot find a solution, so I am looking for any hints that might help. Do I need to install CUDA and cuDNN versions that match the cudatoolkit shipped with the PyTorch conda install?
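One way to narrow this down is sketched below: print the CUDA and cuDNN versions the installed PyTorch binary reports, then run the same `addmm` call that appears in the traceback on a chosen device and check that the output and the weight gradient are finite. The helper name `addmm_roundtrip_is_finite` is hypothetical, and the shapes (batch 64, `fc1` as 9216 to 128) are assumed to match the MNIST example with random data standing in for real inputs.

```python
import torch

# Versions reported by the installed binary (None on CPU-only builds).
print("torch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())


def addmm_roundtrip_is_finite(device):
    """Run addmm forward and backward on `device`; True if nothing is NaN/inf."""
    torch.manual_seed(0)
    bias = torch.randn(128, device=device)
    inp = torch.randn(64, 9216, device=device)
    weight = torch.randn(128, 9216, device=device, requires_grad=True)
    out = torch.addmm(bias, inp, weight.t())  # same call as in F.linear above
    out.sum().backward()                      # exercises the transpose backward
    return bool(torch.isfinite(out).all() and torch.isfinite(weight.grad).all())


device = "cuda" if torch.cuda.is_available() else "cpu"
print(device, "finite:", addmm_roundtrip_is_finite(device))
```

If this returns True on the CPU but False on the GPU with identical random inputs, the model code is unlikely to be the culprit and the GPU software stack or hardware becomes the more plausible suspect.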
PyTorch version:
pytorch 1.5.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
Here is the error report:
Warning: Error detected in TBackward. Traceback of forward call that caused the error:
  File "main.py", line 139, in <module>
    main()
  File "main.py", line 130, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "main.py", line 42, in train
    output = model(data)
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 29, in forward
    x = self.fc1(x)
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
 (print_stack at /opt/conda/conda-bld/pytorch_1587428094786/work/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
  File "main.py", line 139, in <module>
    main()
  File "main.py", line 130, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "main.py", line 44, in train
    loss.backward()
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/yanglei/anaconda3/envs/pt150cu101/lib/python3.8/site-packages/torch/autograd/__init__.py", line 98, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'TBackward' returned nan values in its 0th output.