AUTOGRAD.GRAD Related Error: The Occurrence of the Error Seems To Be Affected by grad_outputs Values

Hello everyone,

This post is an extension of

Second Order Derivative with Nan Value - RuntimeError: Function 'SigmoidBackwardBackward0' returned nan values in its 0th output.

Suppose g1 = torch.autograd.grad(loss, params) and g2 = torch.autograd.grad(g1, params, grad_outputs), where params denotes a list of a model's parameters, so g1 is also a list (one gradient tensor per parameter).
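
For reference, here is a minimal sketch of that setup (the tiny Sequential model, input sizes, and loss below are placeholders, not my actual model); note that the first grad() call needs create_graph=True so that g1 is itself differentiable:

```python
import torch

# Placeholder model and loss, just to illustrate the call pattern.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 4), torch.nn.Sigmoid(), torch.nn.Linear(4, 1)
)
params = list(model.parameters())
loss = model(torch.randn(8, 4)).pow(2).mean()

# create_graph=True keeps the graph of the gradient computation,
# which is required for the second grad() call below.
g1 = torch.autograd.grad(loss, params, create_graph=True)

# grad_outputs must match g1 element-wise in shape; since each gradient
# has the same shape as its parameter, params themselves are a valid choice.
g2 = torch.autograd.grad(g1, params, grad_outputs=[p.detach() for p in params])
```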

My new finding is that:

  1. With torch.autograd.set_detect_anomaly(True), the calculation of g2 is error free only when grad_outputs = params; any other value of grad_outputs leads to the error shown below, even if grad_outputs is set to a slight perturbation of params. Here * stands for Mul, Sigmoid, or Mm, meaning the error still persisted when I modified the model a little bit (see the sketch after this list).
Function '*Backward0' returned nan values in its 1th output.
  2. The situation in 1 seemed to affect only the model I am currently dealing with, a relatively large contrastive Text-Image model.
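
For anyone who wants to poke at this, here is a sketch of the comparison in point 1. The toy model below does not reproduce the NaN (as noted in 2, only my large contrastive Text-Image model triggers it), but it shows exactly which grad_outputs values I am swapping in:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# Stand-in model; the real one is a large contrastive Text-Image model.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 4), torch.nn.Sigmoid(), torch.nn.Linear(4, 1)
)
params = list(model.parameters())
loss = model(torch.randn(8, 4)).pow(2).mean()

g1 = torch.autograd.grad(loss, params, create_graph=True)

# Case 1: grad_outputs = params -- the only choice that runs cleanly on my model.
# retain_graph=True keeps the graph alive for the second call below.
g2_ok = torch.autograd.grad(
    g1, params, grad_outputs=[p.detach() for p in params], retain_graph=True
)

# Case 2: grad_outputs = params + small noise -- on my model this raises
#   RuntimeError: Function '*Backward0' returned nan values in its 1th output.
perturbed = [p.detach() + 1e-3 * torch.randn_like(p) for p in params]
g2_bad = torch.autograd.grad(g1, params, grad_outputs=perturbed)
```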