Trouble with passing manually calculated gradient as argument of backward()

Hi all,

I am having trouble passing a manually computed gradient as the argument to backward(). I am doing this because my loss is computed by a graphical model implemented in C++. I then manually calculate the gradient of that C++ loss w.r.t. h.

h is an N x G matrix that is multiplied by a G x 2 matrix (I have 2 classes) to compute my logits. I calculate this gradient manually and then pass it to backward(), which I call on the original h in my PyTorch code. However, this causes my performance to drop drastically (0.5 AUC), and I have noticed that after calling h.backward(h_grad), the gradients of all my parameters are far larger in magnitude than before (most of them around ±80, ±90, ±100, etc., when originally they were all between -1 and 1).

Am I doing something wrong here? In theory, I believe that passing the gradient of the loss w.r.t. h to h.backward() should be essentially the same as calling loss.backward(). If this is wrong, please let me know!
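The equivalence described above can be checked on a toy problem. The following is only a sketch under stated assumptions: the Linear layer, the tensor sizes, and the cross-entropy loss are hypothetical stand-ins (the real loss lives in the C++ graphical model), and the gradient w.r.t. h is obtained with torch.autograd.grad in place of the external code. The last part illustrates one possible cause of inflated gradients, namely a gradient taken from a summed rather than averaged loss; that is a guess, not something confirmed in this thread.

```python
import torch

torch.manual_seed(0)
N, G = 8, 4                        # hypothetical sizes
x = torch.randn(N, 3)
enc = torch.nn.Linear(3, G)        # stand-in for the network that produces h
W = torch.randn(G, 2)              # G x 2 projection to the logits
y = torch.randint(0, 2, (N,))      # hypothetical binary labels

def loss_from_h(h):
    # Stand-in loss: cross-entropy averaged over the N rows
    return torch.nn.functional.cross_entropy(h @ W, y)

# Path A: ordinary loss.backward()
enc.zero_grad()
loss_from_h(enc(x)).backward()
ref_grad = enc.weight.grad.clone()

# Path B: obtain dL/dh separately, then call h.backward(h_grad)
enc.zero_grad()
h = enc(x)
h_leaf = h.detach().requires_grad_(True)
(h_grad,) = torch.autograd.grad(loss_from_h(h_leaf), h_leaf)
h.backward(h_grad)
manual_grad = enc.weight.grad.clone()

same = torch.allclose(ref_grad, manual_grad, atol=1e-6)
print("paths agree:", same)

# If the external gradient belongs to a summed loss, it is N times larger,
# and so is every parameter gradient.
enc.zero_grad()
enc(x).backward(N * h_grad)
scaled = torch.allclose(enc.weight.grad, N * ref_grad, atol=1e-5)
print("sum-reduction gradient is N times larger:", scaled)
```

If the two paths agree on a toy case like this, the backward(h_grad) call itself is sound, and the scale, sign, or shape of the externally computed gradient becomes the more likely suspect.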

Thank you!

Are you setting the gradient directly as the attribute of the parameter or are you passing it to the loss function?
Do you have a small code snippet to show your use case?

My code looks something like this:

# hgrad is the gradient manually calculated by the graphical model C++ code

And then, in the train_one_step() method, I call


I guess my main question is: is this the correct way to handle a scenario like this, where we do not call backward() directly on the loss? If it is, then that narrows my problem down to a data error or a gradient calculation error.
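For reference, the general pattern for a training step driven by an externally supplied gradient might look like the sketch below. This is not the code from the post above: the body of train_one_step, the model, the optimizer settings, and external_h_grad (a stand-in for the C++ gradient routine, here the gradient of 0.5 * mean(h**2)) are all hypothetical. The two points it tries to show are clearing old gradients before each step, since backward() accumulates into .grad, and checking that the supplied gradient has the same shape as h.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 4)                      # hypothetical network producing h
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 3)                              # hypothetical input batch

def external_h_grad(h_values):
    # Hypothetical stand-in for the C++ code:
    # gradient of 0.5 * mean(h**2) with respect to h
    return h_values / h_values.numel()

def surrogate_loss(h_values):
    # The loss that external_h_grad differentiates, used only for monitoring
    return 0.5 * h_values.pow(2).mean().item()

def train_one_step(x):
    optimizer.zero_grad()              # backward() accumulates, so clear old grads
    h = model(x)
    h_grad = external_h_grad(h.detach())
    assert h_grad.shape == h.shape     # the gradient must match h element for element
    h.backward(h_grad)                 # backpropagate dL/dh into the parameters
    optimizer.step()
    return surrogate_loss(h.detach())

losses = [train_one_step(x) for _ in range(20)]
print(losses[0], losses[-1])
```

Tracking the externally computed loss across steps, as done here with surrogate_loss, is a cheap way to see whether the supplied gradient actually points downhill.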