Manually Modifying Gradients Calculated from loss.backward()

Hi all,

I am trying to calculate and manually modify gradients for a resnet50 model that outputs predictions for two classification tasks (a primary task that I care about and an auxiliary task). The model is modified so that the forward pass returns predictions for both tasks, each trained with a cross-entropy loss.

With PyTorch's default backprop, my current training loop works and looks like this:

outputs = net(inputs) #will have 2 outputs

loss_primary = train_gt_criterion(outputs[0], gt)
loss_aux = train_bin_criterion(outputs[1], bin)

loss = loss_primary + loss_aux #combined objective
loss.backward()
optimizer.step()


However, I would like to change the gradients of the auxiliary task based on a function (e.g. a weighted cosine) and was wondering how I could do this in PyTorch.

I know in TensorFlow you can do something like this to pass a modified gradient:

primary_loss = primary_function(x)
auxiliary_loss = auxiliary_function(x)
primary_grad = tape.gradient(primary_loss, x)
auxiliary_grad = tape.gradient(auxiliary_loss, x)
new_grad = modify_gradient(auxiliary_grad, primary_grad) #Dummy function that incorporates gradients from my primary and auxiliary tasks
optimizer.apply_gradients([(primary_grad + lam*new_grad, x)])

I was wondering if it is possible to do something along these lines in PyTorch: specifically, take the gradients calculated from both the primary and auxiliary losses and use them to modify the auxiliary gradient. Right now, I'm not sure how this is possible with the way I've written the code using torch's backward function.

Thank you for your time and help!

You can iterate through the model parameters and modify their gradients (after calling backward) as you want:

for n, p in model.named_parameters():
    if n == desired_layer_name:
        p.grad = modify_gradient(p.grad)  # e.g. scale or replace the gradient
If you want to modify all the gradients based on a loss's gradient, I think (though I cannot promise) you can do it with backward hooks. A hook runs some code each time backward is called.
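For example (a minimal sketch; the toy parameter and the 0.5 scaling factor are just placeholders), a tensor hook registered on a parameter fires whenever its gradient is computed and may return a modified gradient:

```python
import torch

# Toy parameter; a hook on any nn.Module parameter works the same way.
w = torch.ones(3, requires_grad=True)

# The hook receives the freshly computed gradient and may return a new one.
w.register_hook(lambda grad: 0.5 * grad)  # e.g. halve the gradient

loss = (w * torch.tensor([1.0, 2.0, 3.0])).sum()
loss.backward()
print(w.grad)  # the raw gradient [1, 2, 3] arrives scaled to [0.5, 1.0, 1.5]
```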

Thanks, that's helpful. Do you know how I could extract the gradients calculated from another loss (i.e. my primary task)?

Well, given that gradients are additive, I'd say if you keep track of the first loss's gradients, then backward the second loss and subtract, you should get the second loss's gradients.

To give more context: gradients in PyTorch are accumulated, so each time you call backward the gradients are retained and summed.
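A minimal sketch of that bookkeeping (the losses here are toy examples for illustration):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

loss1 = (x ** 2).sum()   # gradient: 2x = [2, 4]
loss2 = (3 * x).sum()    # gradient: [3, 3]

loss1.backward()
grad1 = x.grad.clone()   # snapshot the first loss's gradient

loss2.backward()         # grads accumulate: x.grad is now grad1 + grad2
grad2 = x.grad - grad1   # recover the second loss's gradient by subtraction
print(grad2)             # tensor([3., 3.])
```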

Hmmm… I'm not sure I understand, especially since I'm trying to use gradients calculated from the losses of two different tasks. Another way to ask what I'm trying to do: what is the PyTorch equivalent of this code?

#Where x represents an input to a model

with tf.GradientTape(persistent=True) as tape:
    primary_loss = compute_primary_task_loss(x)
    auxiliary_loss = compute_auxiliary_task_loss(x)
primary_grad = tape.gradient(primary_loss, x)
auxiliary_grad = tape.gradient(auxiliary_loss, x)
new_grad = censored_vector(auxiliary_grad, primary_grad)
optimizer.apply_gradients([(primary_grad + lam*new_grad, x)])
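One way to mirror that snippet in PyTorch (a sketch; the toy parameter, losses, and censored_vector rule below are placeholders for the real model and gradient-modification function) is torch.autograd.grad, which returns gradients directly instead of accumulating them into .grad, so the two losses stay separate:

```python
import torch

# Toy stand-ins; in the real code these would come from the resnet50
# forward pass and the two criteria.
params = [torch.randn(4, requires_grad=True)]

def compute_primary_task_loss(params):
    return (params[0] ** 2).sum()

def compute_auxiliary_task_loss(params):
    return params[0].sum()

def censored_vector(aux_grad, prim_grad):
    return aux_grad * 0.5  # placeholder for the real modification rule

lam = 0.1
optimizer = torch.optim.SGD(params, lr=0.01)

primary_loss = compute_primary_task_loss(params)
auxiliary_loss = compute_auxiliary_task_loss(params)

# torch.autograd.grad returns gradients without touching .grad;
# retain_graph=True keeps the graph alive for the second call.
primary_grads = torch.autograd.grad(primary_loss, params, retain_graph=True)
auxiliary_grads = torch.autograd.grad(auxiliary_loss, params)

optimizer.zero_grad()
for p, pg, ag in zip(params, primary_grads, auxiliary_grads):
    p.grad = pg + lam * censored_vector(ag, pg)  # manually set the gradient
optimizer.step()  # applies the modified gradients
```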