This post is an extension to
Second Order Derivative with Nan Value - RuntimeError: Function 'SigmoidBackwardBackward0' returned nan values in its 0th output.
In that post, I computed second-order derivatives in two steps:
`g1 = torch.autograd.grad(loss, params, create_graph=True)`
`g2 = torch.autograd.grad(g1, params, grad_outputs)`
where `params` denotes a list of parameters of a model, and hence `g1` would also be a list of per-parameter gradients. (`create_graph=True` is required in the first call so that `g1` remains differentiable for the second call.)
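For concreteness, here is a minimal runnable sketch of that two-step computation. The toy linear model, input shapes, and sigmoid loss are all stand-ins I made up for illustration; my actual model is much larger:

```python
import torch

# Hypothetical toy stand-in for my actual (much larger) model.
model = torch.nn.Linear(4, 1)
params = list(model.parameters())

x = torch.randn(8, 4)
loss = model(x).sigmoid().mean()

# First-order gradients; create_graph=True keeps g1 differentiable
# so the second torch.autograd.grad call is possible.
g1 = torch.autograd.grad(loss, params, create_graph=True)

# Second-order step: effectively a Hessian-vector product with the
# vector supplied via grad_outputs (here, the parameters themselves).
g2 = torch.autograd.grad(g1, params, grad_outputs=params)
```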
My new finding is that, with `torch.autograd.set_detect_anomaly(True)` enabled, the calculation of `g2` is error-free only when setting `grad_outputs = params`. Any other value of `grad_outputs` leads to the error shown below, even when `grad_outputs` is set to a value that is only a slight perturbation of `params`, meaning that the error still persists when I modify the model only a little bit.
`Function '*Backward0' returned nan values in its 1th output.`
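Below is a hedged, minimal sketch of the comparison I ran. The toy model, the data, and the perturbation scale `1e-3` are all assumptions for illustration; on a toy this small the second case may well run cleanly, and the NaN only shows up in my real model:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# Hypothetical toy stand-in; in my real contrastive Text-Image model,
# only Case 1 below runs without the anomaly error.
model = torch.nn.Linear(4, 1)
params = list(model.parameters())
loss = model(torch.randn(8, 4)).sigmoid().mean()
g1 = torch.autograd.grad(loss, params, create_graph=True)

# Case 1: grad_outputs = params -- error-free in my setup.
# retain_graph=True keeps the graph alive for the second call below.
g2 = torch.autograd.grad(g1, params, grad_outputs=params, retain_graph=True)

# Case 2: a slight perturbation of params (1e-3 is an assumed scale).
# For my model this raises:
# "Function '*Backward0' returned nan values in its 1th output."
perturbed = [(p + 1e-3 * torch.randn_like(p)).detach() for p in params]
g2 = torch.autograd.grad(g1, params, grad_outputs=perturbed)
```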
- The situation in 1 seemed to affect only the model I am currently dealing with, a relatively large contrastive Text-Image model.