I am having trouble passing a gradient manually as the argument to the
backward() function. I am doing this because my loss is computed by a graphical model implemented in C++; I then manually calculate the gradient of that C++ loss w.r.t. h.
h is an N x G matrix that is multiplied by a G x 2 matrix (I have 2 classes) to produce my logits. I pass this manually computed gradient to
backward(), which I call on the original h in my PyTorch code. However, this makes my performance drop drastically (AUC falls to 0.5), and I have noticed that, after calling
h.backward(h_grad), the gradients of all my parameters are far larger in magnitude than before (e.g., most of them in the ±80 to ±100 range, when originally they were all between -1 and 1).
Am I doing something wrong here? In theory, I believe that passing the gradient of the loss w.r.t. h into h.backward() should be equivalent to calling
loss.backward(). If that reasoning is wrong, or my implementation is, please let me know!
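For what it's worth, here is a minimal sketch of the sanity check I would run. All names and sizes here are hypothetical (W1 stands in for whatever upstream parameters produce h, and an autograd-computed dloss/dh stands in for the C++ gradient); it verifies that h.backward(h_grad) and loss.backward() give identical parameter gradients, and shows how a summed (un-averaged) external gradient would inflate every gradient by a factor of N:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, G = 8, 5                                   # hypothetical sizes
x = torch.randn(N, G)
W1 = torch.randn(G, G, requires_grad=True)    # stand-in upstream parameter producing h
W2 = torch.randn(G, 2, requires_grad=True)    # the G x 2 logit matrix
y = torch.randint(0, 2, (N,))

# Path 1: ordinary end-to-end loss.backward()
h = x @ W1
loss = F.cross_entropy(h @ W2, y)             # averages over the N samples
loss.backward()
g_full = W1.grad.clone()

# Path 2: compute dloss/dh separately, then feed it to h.backward()
W1.grad = None
h = x @ W1
h_leaf = h.detach().requires_grad_(True)      # detached copy standing in for the C++ side
ext_loss = F.cross_entropy(h_leaf @ W2, y)
h_grad = torch.autograd.grad(ext_loss, h_leaf)[0]   # the "manual" dloss/dh
h.backward(h_grad)
g_manual = W1.grad.clone()

# The two parameter gradients should match exactly (up to float tolerance)
assert torch.allclose(g_full, g_manual, atol=1e-6)

# If the external loss is summed over samples rather than averaged,
# every downstream gradient is scaled up by roughly N
W1.grad = None
h = x @ W1
h.backward(h_grad * N)
assert torch.allclose(W1.grad, g_manual * N, atol=1e-5)
```

One guess based on the second check: if the C++ side sums the loss over the N samples (or returns per-sample gradients without averaging) while the original PyTorch loss took a mean, every parameter gradient gets scaled by roughly N, which could explain magnitudes jumping from below 1 into the ±80 to ±100 range. It is also worth confirming that gradients are zeroed before h.backward(h_grad) is called, since backward() accumulates into .grad rather than overwriting it.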