I’m using CrossEntropyLoss with a batch size of 4. Here are the logits and targets I’m feeding it, along with the resulting loss value:
preds:
tensor([[-0.0052, 0.2059, -0.1473],
[-0.0250, 0.0953, 0.0047],
[ 0.0684, 0.1638, -0.0705],
[-0.0195, 0.0100, -0.0874]], device='cuda:0', grad_fn=<AddmmBackward>)
target:
tensor([2, 2, 2, 2], device='cuda:0')
loss: tensor(1.1942, device='cuda:0', grad_fn=<NllLossBackward>)
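For what it’s worth, the forward pass looks healthy: re-computing the cross-entropy by hand from the pasted logits and targets (log-sum-exp of each row minus the logit at the target index, averaged over the batch) reproduces the reported loss, so the NaN only appears during backward:

```python
# Re-compute CrossEntropyLoss from the pasted values using only the stdlib:
# per-sample CE = log(sum(exp(logits))) - logits[target], then mean over batch.
import math

preds = [
    [-0.0052, 0.2059, -0.1473],
    [-0.0250, 0.0953, 0.0047],
    [ 0.0684, 0.1638, -0.0705],
    [-0.0195, 0.0100, -0.0874],
]
target = [2, 2, 2, 2]

per_sample = [
    math.log(sum(math.exp(z) for z in row)) - row[t]
    for row, t in zip(preds, target)
]
loss = sum(per_sample) / len(per_sample)
print(loss)  # ≈ 1.1942, matching the value PyTorch reports
```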
Here is the error message I’m getting after setting torch.autograd.set_detect_anomaly(True):
Traceback (most recent call last):
File "run_liunet.py", line 247, in <module>
main()
File "run_liunet.py", line 200, in main
train(model, train_loader, optimizer, device, epoch, 'train', debug_mode)
File "run_liunet.py", line 80, in train
loss.backward()
File "/home/jlko/miniconda3/envs/liuNetEnv/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/jlko/miniconda3/envs/liuNetEnv/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function 'NativeBatchNormBackward' returned nan values in its 0th output.
Here is the architecture of my neural network: https://raw.githubusercontent.com/Information-Fusion-Lab-Umass/mri-features/9ccf23cd0ecb4f163c76483fed52e4709b02ea4f/liunet.py. I am only using the self.conv and self.fc modules, so you can ignore everything related to self.age_encoder.
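To narrow down where the non-finite values first appear in the forward pass, I’ve been experimenting with forward hooks; here is a minimal sketch (the toy model below is illustrative only, standing in for the real liunet network):

```python
import torch
import torch.nn as nn

def register_nonfinite_checks(model):
    """Attach forward hooks that raise on the first module emitting NaN/Inf."""
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                raise RuntimeError(
                    f"non-finite output from {name} ({module.__class__.__name__})"
                )
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))

# Illustrative stand-in for the real model:
toy = nn.Sequential(nn.Conv1d(1, 4, 3), nn.BatchNorm1d(4), nn.ReLU())
register_nonfinite_checks(toy)
out = toy(torch.randn(4, 1, 16))  # raises if any layer produces NaN/Inf
```

The idea is that anomaly detection points at the backward of BatchNorm, but the root cause is often a non-finite activation or input earlier in the forward pass, which a hook like this can pinpoint by module name.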