The loss became NaN, so I debugged the autograd with
torch.autograd.set_detect_anomaly(True) and got the error below.
...
  File "main.py", line 374, in compute_plane
    ca_cb, LU = torch.solve(B, A + eps)

Traceback (most recent call last):
  File "main.py", line 879, in <module>
    main()
  File "main.py", line 420, in main
    train(fk, train_loader, val_loader, model, optimizer, lr_scheduler, last_iter+1, tb_logger, criterion=loss_fun)
  File "main.py", line 544, in train
    loss.backward()
  File "/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function 'SolveBackward' returned nan values in its 0th output.
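For reference, this is roughly how I enabled anomaly detection. This is a minimal sketch, not my actual main.py: the `x * torch.sqrt(x)` expression is just a stand-in whose backward pass produces NaN at x = 0 (its gradient evaluates 0 * inf), so anomaly mode raises a RuntimeError pointing at the offending forward op, the same way it flagged `torch.solve` in my code.

```python
import torch

# Raise an error (with a forward-call trace) as soon as any
# backward function returns NaN, instead of silently propagating it.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor(0.0, requires_grad=True)

# Backward of x * sqrt(x) computes sqrt(x) + x * 0.5 / sqrt(x);
# at x == 0 the second term is 0 * inf == nan.
loss = x * torch.sqrt(x)

try:
    loss.backward()
except RuntimeError as e:
    # e.g. "Function 'SqrtBackward0' returned nan values in its 0th output."
    print(e)
```

Note that anomaly detection adds significant overhead, so it should only be enabled while debugging.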
Does anyone have a solution for this problem? Thanks.