This post is an extension to
Second Order Derivative with Nan Value - RuntimeError: Function 'SigmoidBackwardBackward0' returned nan values in its 0th output.
In that post, I computed second-order derivatives in two steps:
`g1 = torch.autograd.grad(loss, params, create_graph=True)`
`g2 = torch.autograd.grad(g1, params, grad_outputs)`
where `params` denotes a list of parameters of a model, and hence `g1` would also be a list of per-parameter gradients. (`create_graph=True` is required in the first call so that `g1` remains differentiable for the second call.)
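For concreteness, here is a minimal runnable sketch of that two-step computation. The toy linear model, input shapes, and sigmoid loss are all stand-ins I made up for illustration; my actual model is much larger:

```python
import torch

# Hypothetical toy stand-in for my actual (much larger) model.
model = torch.nn.Linear(4, 1)
params = list(model.parameters())

x = torch.randn(8, 4)
loss = model(x).sigmoid().mean()

# First-order gradients; create_graph=True keeps g1 differentiable
# so the second torch.autograd.grad call is possible.
g1 = torch.autograd.grad(loss, params, create_graph=True)

# Second-order step: effectively a Hessian-vector product with the
# vector supplied via grad_outputs (here, the parameters themselves).
g2 = torch.autograd.grad(g1, params, grad_outputs=params)
```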
My new finding is that, with `torch.autograd.set_detect_anomaly(True)` enabled, the calculation of `g2` is error-free only when setting `grad_outputs = params`. Any other value of `grad_outputs` leads to the error shown below, even when `grad_outputs` is set to a value that is only a slight perturbation of `params`, meaning that the error still persists when I modify the model only a little bit.
`Function '*Backward0' returned nan values in its 1th output.`
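Below is a hedged, minimal sketch of the comparison I ran. The toy model, the data, and the perturbation scale `1e-3` are all assumptions for illustration; on a toy this small the second case may well run cleanly, and the NaN only shows up in my real model:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# Hypothetical toy stand-in; in my real contrastive Text-Image model,
# only Case 1 below runs without the anomaly error.
model = torch.nn.Linear(4, 1)
params = list(model.parameters())
loss = model(torch.randn(8, 4)).sigmoid().mean()
g1 = torch.autograd.grad(loss, params, create_graph=True)

# Case 1: grad_outputs = params -- error-free in my setup.
# retain_graph=True keeps the graph alive for the second call below.
g2 = torch.autograd.grad(g1, params, grad_outputs=params, retain_graph=True)

# Case 2: a slight perturbation of params (1e-3 is an assumed scale).
# For my model this raises:
# "Function '*Backward0' returned nan values in its 1th output."
perturbed = [(p + 1e-3 * torch.randn_like(p)).detach() for p in params]
g2 = torch.autograd.grad(g1, params, grad_outputs=perturbed)
```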
- The situation in 1 seemed to affect only the model I am currently dealing with, a relatively large contrastive Text-Image model.