Modifying and adding gradients in backward()

My model has two sets of parameters, a and b, both of which require gradients. I have two different loss functions:

loss1 = function(a, b)
loss2 = function(b)

and the total loss is

total_loss = loss1 + loss2

Assuming:

  • loss1 calculates b1.grad
  • loss2 calculates b2.grad

then the total b.grad calculated by total_loss.backward() is b.grad = b1.grad + b2.grad.

My goal is to modify b1.grad coming from loss1 and b2.grad coming from loss2, and then add them together as the backward gradient for b. Currently, when I call total_loss.backward(), it already gives me the accumulated gradient b.grad = b1.grad + b2.grad. How can I access and modify each individual b1.grad and b2.grad so that total_loss.backward() returns the sum of the modified b1.grad and b2.grad?
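For concreteness, here is a tiny runnable sketch of the setup; the quadratic losses are only placeholders for my real function:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

loss1 = (a * b).sum()    # placeholder for function(a, b); d(loss1)/db = a
loss2 = (b ** 2).sum()   # placeholder for function(b);    d(loss2)/db = 2 * b

total_loss = loss1 + loss2
total_loss.backward()

# autograd accumulates both contributions into a single b.grad:
print(torch.allclose(b.grad, a + 2 * b))  # True, i.e. b.grad = b1.grad + b2.grad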

Hi,

You will have to create new intermediate Tensors, as in the snippet below.
Hooks registered on those Tensors are a good way to modify the gradients.

import torch

b = torch.randn(3, requires_grad=True)  # some tensor that requires grad

b1 = b.view_as(b)  # returns a new Tensor that is a view of b (same values, its own node in the graph)
def modif_for_grad1(grad):
    return 3 * grad + 42  # arbitrary example modification of the gradient coming from loss1
b1.register_hook(modif_for_grad1)

b2 = b.view_as(b)  # another view of b, used by the second loss
def modif_for_grad2(grad):
    return 0.5 * grad  # arbitrary example modification of the gradient coming from loss2
b2.register_hook(modif_for_grad2)

loss1 = function(a, b1)
loss2 = function(b2)
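To see the effect end to end, here is a self-contained check with placeholder quadratic losses standing in for function (the shapes and the specific gradient modifications are arbitrary):

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

b1 = b.view_as(b)
b1.register_hook(lambda grad: 3 * grad + 42)   # modify the gradient coming from loss1
b2 = b.view_as(b)
b2.register_hook(lambda grad: 0.5 * grad)      # modify the gradient coming from loss2

loss1 = (a * b1).sum()    # placeholder for function(a, b1); d(loss1)/d(b1) = a
loss2 = (b2 ** 2).sum()   # placeholder for function(b2);    d(loss2)/d(b2) = 2 * b

(loss1 + loss2).backward()

# Each contribution is rewritten by its hook before being accumulated into b.grad:
print(torch.allclose(b.grad, (3 * a + 42) + 0.5 * (2 * b)))  # True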

Thank you for this nice workaround. However, to get b.grad = b1.grad + b2.grad I ended up doing it in a hacky way, as below:

# sum the gradients of all the copies of b ...
b_grad = 0
for param in model.parameters():  # "model" is the module holding b and its intermediate copies
    if param.grad is not None:
        b_grad = b_grad + param.grad

# ... and write the summed gradient back into every copy
for param in model.parameters():
    if param.grad is not None:
        param.grad = b_grad.clone()

Is there a better way to do this? When you make copies of the original parameter (the intermediate ones), I have to add their gradients together and then write the sum back.
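For reference, this is the kind of thing I mean, with hypothetical names for the copies and placeholder losses (I am assuming the copies are separate leaf tensors rather than views):

import torch

b = torch.randn(3, requires_grad=True)

# hypothetical separate copies of b, one per loss
b1_copy = b.clone().detach().requires_grad_(True)
b2_copy = b.clone().detach().requires_grad_(True)

loss1 = (b1_copy ** 2).sum()   # placeholder loss using the first copy
loss2 = (3 * b2_copy).sum()    # placeholder loss using the second copy
(loss1 + loss2).backward()

# add the copies' gradients together and write the sum back everywhere
summed = b1_copy.grad + b2_copy.grad
b.grad = summed.clone()
b1_copy.grad = summed.clone()
b2_copy.grad = summed.clone()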