Multiple gradient updates with two separate losses and two classifiers sharing the same encoder

Hi,
All of the parameters (of the encoder and of any other layers) are leaf tensors in the computation graphs of the losses.
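
You can verify this quickly on any nn.Module; the tiny encoder below is just a stand-in for illustration, your own encoder behaves the same way:

import torch.nn as nn

# stand-in encoder, purely for illustration
encoder = nn.Sequential(nn.Linear(16, 8), nn.ReLU())

for name, p in encoder.named_parameters():
    # module parameters are leaf tensors with requires_grad=True,
    # so loss.backward() will populate p.grad for them
    print(name, p.is_leaf, p.requires_grad)   # prints True True for every parameter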

Backpropagation: you simply need to call loss.backward() to compute the gradient of the loss w.r.t. the model parameters (more precisely, w.r.t. every leaf tensor with requires_grad=True in the graph of loss).

Parameter updates: after the loss has been backpropagated, call optimizer.step() to update the model parameters from the accumulated .grad values.
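
Put together, the basic backward → step → zero_grad cycle looks like this (toy model and dummy data, just to show the calls, not your actual setup):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                        # toy stand-in for encoder + classifier
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x, y = torch.randn(8, 4), torch.randn(8, 1)    # dummy batch
loss = nn.functional.mse_loss(model(x), y)     # any scalar loss

loss.backward()          # fills p.grad for every leaf parameter in the graph
optimizer.step()         # updates the parameters using p.grad
optimizer.zero_grad()    # clears p.grad before the next backward pass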

For your use case, the following pseudocode should work (the actual loss computations are left as placeholders):

import itertools
import torch

# one optimizer over the shared encoder and both classifier heads
params = [encoder.parameters(), fc1.parameters(), fc2.parameters()]
optimizer = torch.optim.Adam(itertools.chain(*params), lr=0.01)


for batch_idx, batch in enumerate(dataloader_instance):
    # calculate lcce here (forward pass through the shared encoder)
    lcce.backward()
    optimizer.step()
    optimizer.zero_grad()

    # calculate lwd here from a fresh forward pass: the first optimizer.step()
    # has already modified the shared parameters, so the first graph is stale
    lwd = -1 * lwd
    lwd.backward()

    # flip and rescale only the encoder's gradients before the second update;
    # the classifier parameters keep the gradients of -lwd
    for param in encoder.parameters():
        param.grad = -beta * param.grad
    optimizer.step()
    optimizer.zero_grad()