Hi all,

I am currently having trouble with passing a gradient manually as the argument to the `backward()` function. I am doing this because my loss is computed by a graphical model in C++. I then manually calculate the gradient of that C++ loss w.r.t. `h`.

`h` is an N x G matrix that is multiplied by a G x 2 matrix (I have 2 classes) to produce my logits. I manually calculate the gradient of the loss w.r.t. `h` and pass it to `backward()`, which I call on the original `h` in my PyTorch code. However, this makes my performance drop drastically (to 0.5 AUC), and I have noticed that, after calling `h.backward(h_grad)`, the gradients of my parameters are much larger in magnitude than before (e.g., most of them in the ±80 to ±100 range, when originally they were all between -1 and 1).

Am I doing something wrong here? In theory, I believe that if I pass the gradient of the loss w.r.t. `h` into `h.backward()`, it should be equivalent to calling `loss.backward()`. If this is wrong, or if I am wrong, please let me know!
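
For reference, here is a minimal toy check of what I believe should hold. The shapes, the parameter names (`W_in`, `W_out`), and the `sum()` loss are all made up for illustration; the `sum()` stands in for my external C++ loss, since its gradient w.r.t. `h` is easy to write by hand:

```python
import torch

# Shapes follow my setup: h is N x G, and a G x 2 matrix produces the logits.
N, G = 4, 3
torch.manual_seed(0)

x = torch.randn(N, G)
W_in = torch.randn(G, G, requires_grad=True)   # hypothetical upstream parameter producing h
W_out = torch.randn(G, 2, requires_grad=True)  # the G x 2 classifier

# Path 1: end-to-end autograd with a toy loss.
h = x @ W_in
logits = h @ W_out
loss = logits.sum()          # stand-in for the external C++ loss
loss.backward()
ref_grad = W_in.grad.clone()

# Path 2: compute dloss/dh by hand and feed it to h.backward().
W_in.grad = None             # clear the accumulated gradient first
h = x @ W_in
# For loss = (h @ W_out).sum(), dloss/dh = ones(N, 2) @ W_out.T (same shape as h).
h_grad = torch.ones(N, 2) @ W_out.t()
h.backward(h_grad)

# The two paths should produce identical gradients in the upstream parameter.
assert torch.allclose(W_in.grad, ref_grad)
```

In my real code the only differences should be that `h_grad` comes from my C++ gradient computation instead of a closed-form expression, so I expected the same equivalence to hold.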

Thank you!