Multiple optimizer issues

Hello, I’m trying to figure out how to optimize different parts of a model with different data.

The setup consists of two datasets, data1 and data2, and a model split into partA, partB, and partC.

The original code was like this:

optimizer = optim.Adam([partA, partB, partC], lr=0.03)

for input, target in data1:
    optimizer.zero_grad()
    # loss1 calculation with partA, partB, partC
    loss1.backward()

    for input, target in data2:
        partA.requires_grad = False
        partC.requires_grad = False
        # loss2 calculation with only partB
        loss2.backward()
        partA.requires_grad = True
        partC.requires_grad = True

    optimizer.step()

What I want to change is to also train partC during the data2 pass, with a learning rate of 0.0001.

I planned to implement the code below, but I’m not sure whether it is right.

optimizer1 = optim.Adam([partA, partB, partC], lr=0.03)
optimizer2 = optim.Adam([
    {'params': [partB], 'lr': 0.03},
    {'params': [partC], 'lr': 0.0001},
])

for input, target in data1:
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    # loss1 calculation with partA, partB, partC
    loss1.backward()

    for input, target in data2:
        # loss2 calculation with only partB
        loss2.backward()

    optimizer1.step()
    optimizer2.step()

In particular, I’m not sure about the multiple-optimizer part.

Thank you all!!

Hi,
Multiple optimizers shouldn’t be an issue.
Importantly, if you do not clear the gradients explicitly with zero_grad(), the gradients from multiple backward() calls are accumulated.

Hence, in your pseudo-code, after loss2.backward() the gradient of partB will be the sum of the gradients of loss1 and loss2 with respect to partB.
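
Here is a minimal illustration of that accumulation, using a made-up scalar parameter rather than your model:

import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor(1.0))
optimizer = optim.Adam([w], lr=0.03)

optimizer.zero_grad()
loss1 = 2 * w          # d(loss1)/dw = 2
loss1.backward()
loss2 = 3 * w          # d(loss2)/dw = 3
loss2.backward()
print(w.grad)          # tensor(5.) -> the two gradients add up
optimizer.step()       # the update uses the accumulated gradient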

Also, you mention that you want the gradient of loss2 with respect to partC as well, but your pseudo-code says that loss2 is calculated with only partB, so partC would not receive any gradient from it.
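
For the second optimizer, the per-parameter-group syntax would look roughly like the sketch below (assuming partB and partC are plain nn.Parameters here for illustration), and loss2 has to be computed through partC for partC to receive a gradient at all:

import torch
import torch.optim as optim

# stand-in parameters for illustration only
partB = torch.nn.Parameter(torch.randn(10))
partC = torch.nn.Parameter(torch.randn(10))

# one optimizer, two parameter groups with different learning rates
optimizer2 = optim.Adam([
    {'params': [partB], 'lr': 0.03},
    {'params': [partC], 'lr': 0.0001},
])

optimizer2.zero_grad()
# loss2 must depend on partC, otherwise partC.grad stays None
# and optimizer2.step() will not move partC
loss2 = (partB ** 2).sum() + (partC ** 2).sum()
loss2.backward()
optimizer2.step()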