The loss function is L1 loss.

Let me add some detail so my question is easier to understand.

My model has the form f(x), where x is the input. Training f(x) on its own works fine, but when I multiply the input by the output inside the model, i.e. x * f(x), NaN values appear after a single iteration.
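To make this concrete, here is a minimal sketch of the structure I mean. The `ProductModel` class, its layer sizes, and the zero target are toy placeholders I made up for illustration, not my real network; the relevant part is the `x * self.f(x)` multiplication in `forward`.

```python
import torch
import torch.nn as nn

class ProductModel(nn.Module):
    """Toy stand-in for my model: output = input * f(input)."""

    def __init__(self, dim=8):
        super().__init__()
        # Placeholder f(x); my real f is more complex
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # The multiplication that seems to trigger the NaN
        return x * self.f(x)

model = ProductModel()
x = torch.randn(4, 8)
# L1 loss against a dummy target, as in my setup
loss = nn.L1Loss()(model(x), torch.zeros(4, 8))
loss.backward()
```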

I also tried torch.autograd.set_detect_anomaly(True), and with it enabled I found that all the data inside my model becomes NaN.
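For reference, this is roughly how I enabled anomaly detection (the tiny computation inside the context is just a placeholder; in my real code it wraps the training step):

```python
import torch

# Anomaly mode makes backward raise an error at the op that produced NaN
with torch.autograd.detect_anomaly():
    x = torch.tensor([2.0], requires_grad=True)
    loss = (x * x).sum()  # placeholder for my forward pass + loss
    loss.backward()
```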

Thanks in advance.