Hello everyone,
This post is an extension to
Suppose: `g1 = torch.autograd.grad(loss, params)` and `g2 = torch.autograd.grad(g1, params, grad_outputs)`, where `params` is a list of a model's parameters, so `g1` is also a list.
My new finding is that:
- With `torch.autograd.set_detect_anomaly(True)`, the calculation of `g2` is error free only when `grad_outputs = params`. Any other value of `grad_outputs` leads to the error below, even when `grad_outputs` is set to a slight perturbation of `params`:

  `Function '*Backward0' returned nan values in its 1th output.`

  Here `*` stands for `Mul`, `Sigmoid`, or `Mm`, meaning the error persists even when I modify the model a little.
- The situation in item 1 seemed to only affect the model I am currently dealing with, a relatively large contrastive Text-Image model.
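For reference, here is a minimal sketch of the double-`grad` computation described above. The tiny linear model and random data are my own stand-ins, not the contrastive Text-Image model from the post; the point is only the call pattern, where `grad_outputs = params` makes `g2` a Hessian-vector product with the parameters as the vector:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# Stand-in model and data (assumed for illustration; not the original model).
model = torch.nn.Linear(4, 1)
params = list(model.parameters())
x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

# First-order gradients; create_graph=True so g1 can be differentiated again.
g1 = torch.autograd.grad(loss, params, create_graph=True)

# Second-order call. grad_outputs must match the shapes of g1's tensors,
# which the parameters do, since each g1[i] has the shape of params[i].
# This is the grad_outputs = params case reported as error free above.
g2 = torch.autograd.grad(g1, params, grad_outputs=params)

print([tuple(g.shape) for g in g2])
```

Replacing `grad_outputs=params` with any other list of same-shaped tensors (e.g. `[p + 0.01 * torch.randn_like(p) for p in params]`) reproduces the perturbed setup that triggers the NaN anomaly on the larger model.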