Is there a loss that is too small?

I have a loss function that starts at around 5.0 at the beginning of training and goes down to 0.0003 during training. Then I get this error:

RuntimeError                              Traceback (most recent call last)
\lib\site-packages\torch\autograd\, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    170 # The reason we repeat same the comment below is that
    171 # some Python versions print out the first line of a multi-line function
    172 # calls in the traceback and some print out the last line
--> 173 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    175     allow_unreachable=True, accumulate_grad=True)

RuntimeError: Function 'MseLossBackward0' returned nan values in its 0th output.


Neither the loss nor the predictions nor the labels contain NaN values. That's why I assume the loss has become too small and creates NaN values in the backward pass.

Is that assumption correct, and at which point does a loss become too small?
Should I multiply the loss by some factor if I want to continue training?
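A small experiment (not from the original post, values chosen for illustration) suggests the assumption is unlikely: driving the MSE loss down to ~1e-7 still yields finite gradients, since the gradient of MSE is 2 * (pred - target) / N, which is tiny but well defined.

```python
import torch
import torch.nn.functional as F

# Predictions that are almost equal to the targets, so the loss is tiny.
pred = torch.tensor([1.0003, 2.0002], requires_grad=True)
target = torch.tensor([1.0, 2.0])

loss = F.mse_loss(pred, target)  # mean of squared diffs, roughly 6.5e-8
loss.backward()

print(f"loss = {loss.item():.2e}")
print("grad:", pred.grad)                                  # small but finite
print("grad finite:", torch.isfinite(pred.grad).all().item())
```

So a small loss value alone does not explain the NaN in `MseLossBackward0`.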

There shouldn’t be a lower limit on the loss value for nn.MSELoss: a zero or even negative loss value (which would not make sense for MSE, but is possible for losses in general) will not error out and should not create NaN values by itself.
Could you check the model output as well as the target in the iteration that raises this error?
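One way to do that check is sketched below; `model`, `data`, and `target` are placeholders standing in for the actual training loop. `torch.autograd.set_detect_anomaly(True)` additionally makes autograd point at the first operation that produces a NaN in the backward pass, at the cost of slower execution.

```python
import torch
import torch.nn as nn

# Pinpoint the op that first produces NaN during backward (debugging only).
torch.autograd.set_detect_anomaly(True)

model = nn.Linear(4, 1)        # stand-in for the actual model
criterion = nn.MSELoss()
data = torch.randn(8, 4)       # stand-in for one batch
target = torch.randn(8, 1)

output = model(data)
# Validate the tensors entering the loss in the failing iteration.
assert torch.isfinite(output).all(), "non-finite model output"
assert torch.isfinite(target).all(), "non-finite target"

loss = criterion(output, target)
loss.backward()
print("loss:", loss.item())
```

If both assertions pass but the backward still raises, the NaN is created inside the backward computation itself (e.g. by non-finite parameters or activations upstream), which anomaly detection will localize.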