The loss went to NaN, so I enabled torch.autograd.set_detect_anomaly(True) to debug the backward pass
and got the error below.
....
  File "main.py", line 374, in compute_plane
    ca_cb, LU = torch.solve(B, A + eps)
Traceback (most recent call last):
  File "main.py", line 879, in <module>
    main()
  File "main.py", line 420, in main
    train(fk, train_loader, val_loader, model, optimizer, lr_scheduler, last_iter+1, tb_logger, criterion=loss_fun)
  File "main.py", line 544, in train
    loss.backward()
  File "/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function 'SolveBackward' returned nan values in its 0th output.
Does anyone have a solution to this problem? Any help would be appreciated, thanks.
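For context, here is a minimal sketch of the pattern I suspect is involved. The `A` and `B` below are toy stand-ins, not the real tensors from compute_plane. Note that `A + eps` with a scalar adds eps to every entry of `A`, which does not fix a singular or near-singular matrix; adding eps on the diagonal (Tikhonov-style) does keep the solve well-posed:

```python
import torch

# Hypothetical stand-ins for the A and B used in compute_plane.
A = torch.zeros(3, 3, requires_grad=True)  # singular: an unregularized solve would fail
B = torch.ones(3, 1)

# `A + eps` shifts every entry and leaves A singular;
# `A + eps * I` shifts only the diagonal, making the system solvable.
eps = 1e-6
A_reg = A + eps * torch.eye(3)

# torch.solve is deprecated (and removed in newer PyTorch);
# torch.linalg.solve(A, B) is the current equivalent.
x = torch.linalg.solve(A_reg, B)

loss = x.sum()
loss.backward()
print(torch.isfinite(A.grad).all())  # gradients stay finite
```

I am not certain this matches my actual failure mode, but it illustrates why the scalar `eps` in `torch.solve(B, A + eps)` may not be enough regularization.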