I am having trouble passing a gradient manually as the argument to the
backward() function. I am doing this because my loss is computed by a graphical model implemented in C++; I then manually calculate the gradient of that C++ loss w.r.t. h.
h is an N x G matrix that is multiplied by a G x 2 matrix (I have 2 classes) to produce my logits. I pass this manually computed gradient to
backward(), which I call on the original h in my PyTorch code. However, this makes my performance drop drastically (AUC falls to 0.5), and I have noticed that, after calling
h.backward(h_grad), the gradients of all my parameters are far larger in magnitude than before (e.g., most of them in the ±80 to ±100 range, when originally they were all between -1 and 1).
Am I doing something wrong here? In theory, I believe that passing the gradient of the loss w.r.t. h into h.backward() should be equivalent to calling
loss.backward(). If that reasoning is wrong, or my implementation is, please let me know!
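For what it's worth, here is a minimal sketch of the sanity check I would run. All names and sizes here are hypothetical (W1 stands in for whatever upstream parameters produce h, and an autograd-computed dloss/dh stands in for the C++ gradient); it verifies that h.backward(h_grad) and loss.backward() give identical parameter gradients, and shows how a summed (un-averaged) external gradient would inflate every gradient by a factor of N:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, G = 8, 5                                   # hypothetical sizes
x = torch.randn(N, G)
W1 = torch.randn(G, G, requires_grad=True)    # stand-in upstream parameter producing h
W2 = torch.randn(G, 2, requires_grad=True)    # the G x 2 logit matrix
y = torch.randint(0, 2, (N,))

# Path 1: ordinary end-to-end loss.backward()
h = x @ W1
loss = F.cross_entropy(h @ W2, y)             # averages over the N samples
loss.backward()
g_full = W1.grad.clone()

# Path 2: compute dloss/dh separately, then feed it to h.backward()
W1.grad = None
h = x @ W1
h_leaf = h.detach().requires_grad_(True)      # detached copy standing in for the C++ side
ext_loss = F.cross_entropy(h_leaf @ W2, y)
h_grad = torch.autograd.grad(ext_loss, h_leaf)[0]   # the "manual" dloss/dh
h.backward(h_grad)
g_manual = W1.grad.clone()

# The two parameter gradients should match exactly (up to float tolerance)
assert torch.allclose(g_full, g_manual, atol=1e-6)

# If the external loss is summed over samples rather than averaged,
# every downstream gradient is scaled up by roughly N
W1.grad = None
h = x @ W1
h.backward(h_grad * N)
assert torch.allclose(W1.grad, g_manual * N, atol=1e-5)
```

One guess based on the second check: if the C++ side sums the loss over the N samples (or returns per-sample gradients without averaging) while the original PyTorch loss took a mean, every parameter gradient gets scaled by roughly N, which could explain magnitudes jumping from below 1 into the ±80 to ±100 range. It is also worth confirming that gradients are zeroed before h.backward(h_grad) is called, since backward() accumulates into .grad rather than overwriting it.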