Update overlapping parameters using different losses

Hi,

I have a question on how to update overlapping parameters using different losses. For example,

hidden = encoder(imgs)
reconstructed = decoder(hidden)
prediction = classifier(hidden)

optimizer1 = Adam(encoder.parameters())
optimizer2 = Adam(decoder.parameters())
optimizer3 = Adam(classifier.parameters())

loss1 = Loss1(imgs, reconstructed)
loss2 = Loss2(prediction, labels)

In this case, I’d like to

  1. minimise loss1 and only update the parameters of the encoder and decoder.
  2. minimise loss2 and only update the parameters of the encoder and classifier.
  3. maximise loss2 and only update the parameters of the encoder.

I tried to call backward() for each loss separately, roughly like this simplified sketch:
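# Simplified sketch of my training step (names as in the snippet above):
optimizer1.zero_grad()
optimizer2.zero_grad()
optimizer3.zero_grad()

loss1 = Loss1(imgs, reconstructed)
loss1.backward(retain_graph=True)  # keep the graph for the second backward
optimizer1.step()                  # update the encoder
optimizer2.step()                  # update the decoder

loss2 = Loss2(prediction, labels)
loss2.backward()                   # <-- raised here
optimizer1.step()
optimizer3.step()

But I’ve got the following error: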

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 32, 7, 7]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I’m not sure if this is the correct way. Or do I need to sum the losses and call backward() only once?

Thanks in advance for any help!

I think they shouldn’t share parameters (at least in my experience, we separate the parameter heads according to our loss functions).

And usually we just do (loss_1 + loss_2 + ... + loss_n).backward() (or a mean/weighted sum, etc.).
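With the three optimizers from your snippet, that would look roughly like this (a sketch, assuming the variables from your post):

# Sum the losses, call backward() once, then step every optimizer.
optimizer1.zero_grad()
optimizer2.zero_grad()
optimizer3.zero_grad()

total_loss = loss1 + loss2
total_loss.backward()   # one backward pass fills .grad for all parameters

optimizer1.step()
optimizer2.step()
optimizer3.step()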

Thanks! If calling backward() only once, is there a way to update the parameters according to each sub-loss?

Ahh, I am not sure, but I think my initial comment might be a bit misleading. Let’s see via an example.

Let’s say we are working on image classification / bounding box detection (locating where the object of interest lies within an image) plus generating a caption, all at the same time. These two tasks will pretty much share a common backbone and then diverge into two separate “heads”. Naturally, we will also have two separate losses for the respective heads.

So, once we have loss_from_classification_head and loss_from_captioning_head computed, we can get the total loss directly as loss_from_classification_head + loss_from_captioning_head and call backward() only once. PyTorch will pretty much take care of the rest for you.
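In code, such a shared-backbone model looks roughly like this (toy layer sizes and hypothetical names, just to show the structure):

import torch.nn as nn

class SharedBackboneModel(nn.Module):
    # Toy example: one shared backbone, two task-specific heads.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
        self.classification_head = nn.Linear(128, 10)   # class logits
        self.captioning_head = nn.Linear(128, 256)      # stand-in for a real caption decoder

    def forward(self, x):
        features = self.backbone(x)                     # shared computation
        return self.classification_head(features), self.captioning_head(features)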

In your initial example, you are creating multiple optimizers, so do you plan to use a different learning rate for each of them? If not, you can pretty much just pass main_model.parameters() to a single optimizer and have that model output whatever the heads need for computing the losses.

loss1 = Loss1(imgs, reconstructed)
loss2 = Loss2(prediction, labels)

total_loss = loss1 + loss2
total_loss.backward() # that's it!
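If you drop the separate optimizers, the whole step with a single optimizer would look roughly like this (main_model here is a hypothetical module wrapping your encoder, decoder and classifier):

import torch.optim as optim

opt = optim.Adam(main_model.parameters(), lr=1e-3)  # one optimizer over all parameters

opt.zero_grad()
reconstructed, prediction = main_model(imgs)        # assumes the model returns both heads' outputs
total_loss = Loss1(imgs, reconstructed) + Loss2(prediction, labels)
total_loss.backward()
opt.step()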

Thanks! If calling backward() only once, is there a way to update the parameters according to each sub-loss?

By this, do you mean that you want to use a separate learning rate for each of the sub-model components you defined above (e.g. the encoder’s LR is 1e-3, the decoder’s LR is 1e-4, and the classifier’s LR is 1e-2)?

If that’s what you want to do, then just pass the LRs to a single optimizer by splitting the model parameters into parameter groups.

e.g.


import torch.optim as optim

# lrs is a list of two learning rates, one per parameter group.
opt = optim.Adam([{'params': model.body.parameters(), 'lr': lrs[0]},
                  {'params': model.head.parameters(), 'lr': lrs[1]}])
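Mapped to your encoder / decoder / classifier example, that could look like this (the learning rates are just the hypothetical values from above):

opt = optim.Adam([{'params': encoder.parameters(),    'lr': 1e-3},
                  {'params': decoder.parameters(),    'lr': 1e-4},
                  {'params': classifier.parameters(), 'lr': 1e-2}])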

Please let me know if this helps!

Thanks! It helps a lot!