Multiple gradient updates with two separate losses and two classifiers sharing the same encoder

Hi,
All of the parameters (of the encoder and of any other layers) are leaf tensors in the computation graphs of the losses.
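
You can verify this quickly on any nn.Module; the tiny encoder below is just a stand-in for illustration, your own encoder behaves the same way:

import torch.nn as nn

# stand-in encoder, purely for illustration
encoder = nn.Sequential(nn.Linear(16, 8), nn.ReLU())

for name, p in encoder.named_parameters():
    # module parameters are leaf tensors with requires_grad=True,
    # so loss.backward() will populate p.grad for them
    print(name, p.is_leaf, p.requires_grad)   # prints True True for every parameter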

Backpropagation: you simply need to call loss.backward() to compute the gradient of the loss w.r.t. the model parameters (more precisely, w.r.t. every leaf tensor with requires_grad=True in the graph of loss).

Parameter updates: after the loss has been backpropagated, call optimizer.step() to update the model parameters from the accumulated .grad values.
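
Put together, the basic backward → step → zero_grad cycle looks like this (toy model and dummy data, just to show the calls, not your actual setup):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                        # toy stand-in for encoder + classifier
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x, y = torch.randn(8, 4), torch.randn(8, 1)    # dummy batch
loss = nn.functional.mse_loss(model(x), y)     # any scalar loss

loss.backward()          # fills p.grad for every leaf parameter in the graph
optimizer.step()         # updates the parameters using p.grad
optimizer.zero_grad()    # clears p.grad before the next backward pass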

For your use case, the following pseudocode should work (the actual loss computations are left as placeholders):

import itertools
import torch

# one optimizer over the shared encoder and both classifier heads
params = [encoder.parameters(), fc1.parameters(), fc2.parameters()]
optimizer = torch.optim.Adam(itertools.chain(*params), lr=0.01)


for batch_idx, batch in enumerate(dataloader_instance):
    # calculate lcce here (forward pass through the shared encoder)
    lcce.backward()
    optimizer.step()
    optimizer.zero_grad()

    # calculate lwd here from a fresh forward pass: the first optimizer.step()
    # has already modified the shared parameters, so the first graph is stale
    lwd = -1 * lwd
    lwd.backward()

    # flip and rescale only the encoder's gradients before the second update;
    # the classifier parameters keep the gradients of -lwd
    for param in encoder.parameters():
        param.grad = -beta * param.grad
    optimizer.step()
    optimizer.zero_grad()