Backpropagation failed because of 'UnsqueezeBackward0'

Dear all,
I am currently training a model that requires a custom loss function. After some running time, the training is interrupted by an error during backpropagation, because the function 'UnsqueezeBackward0' returned nan values in its 0th output.

I have tried to figure out what could generate this kind of error. As far as I know, unsqueeze is just a reshaping operation, so it should not affect the backward propagation. I am stuck debugging this, as I cannot figure out what could cause such an error.
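To double-check my understanding, I verified in a toy snippet that the backward of unsqueeze merely passes gradients through (it just removes the added dimension), so a nan reported by 'UnsqueezeBackward0' must already be present in the gradient arriving from further downstream:

```python
import torch

# unsqueeze only adds a size-1 dimension; its backward removes it
# again, so a nan in the incoming gradient passes through unchanged.
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.unsqueeze(0)                          # shape (1, 2)
grad = torch.tensor([[float('nan'), 1.0]])  # simulated downstream gradient
y.backward(grad)
print(x.grad)                               # tensor([nan, 1.])
```

So the unsqueeze itself does not create the nan; it only relays it.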
Any hint would be appreciated :slight_smile:

Below I attached the errors that I got:


  File ".\testnet_gpu_withsupervised_restructured.py", line 363, in <module>
    check_path = checkpoint_path

  File ".\testnet_gpu_withsupervised_restructured.py", line 131, in training_loop
    loss_all.backward()

  File ".\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)

  File ".\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Function 'UnsqueezeBackward0' returned nan values in its 0th output.
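For reference, this RuntimeError is the one produced by autograd's anomaly mode (torch.autograd.set_detect_anomaly). A minimal sketch that triggers the same class of error, with an illustrative computation rather than my actual model:

```python
import torch

# Anomaly mode traces every backward function and raises as soon as
# one of them returns nan, producing a "Function '...Backward' returned
# nan values in its 0th output." RuntimeError like the one above.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor(0.0, requires_grad=True)
y = x * torch.log(x)   # 0 * -inf = nan already in the forward pass

try:
    y.backward()
except RuntimeError as e:
    # The op named in the error is where the nan first surfaces in
    # backward, which may be downstream of the real numerical problem.
    print(e)
```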

Best regards,