How to call step() for separate learning rate schedules?

I have two networks: net1 and net2. They have different learning rate schedules but are trained jointly.

net1_optimizer = torch.optim.Adam(net1.parameters(), lr=args.lr, betas=(0.5, 0.999))
net2_optimizer = torch.optim.Adam(net2.parameters(), lr=args.lr)
net1_lr_scheduler = torch.optim.lr_scheduler.StepLR(net1_optimizer, step_size=100, gamma=0.1)
net2_lr_scheduler = torch.optim.lr_scheduler.StepLR(net2_optimizer, step_size=100, gamma=0.1)

net1 produces loss1 and net2 produces loss2. The total loss is

loss_total = loss1+loss2
loss_total.backward()

For the optimizer step, should I call the two optimizers separately for loss1 and loss2?

net1_optimizer.step()
net2_optimizer.step()

Yes, you should.
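For example, a minimal sketch of one joint training iteration with the two optimizers defined above (the inputs, targets, and loss functions are placeholders for whatever your actual task uses):

import torch.nn.functional as F

def joint_step(x1, target1, x2, target2):
    net1_optimizer.zero_grad()
    net2_optimizer.zero_grad()

    loss1 = F.mse_loss(net1(x1), target1)   # hypothetical loss for net1
    loss2 = F.mse_loss(net2(x2), target2)   # hypothetical loss for net2
    loss_total = loss1 + loss2

    # A single backward pass on the joint loss populates .grad for both nets
    loss_total.backward()

    # Each optimizer only updates the parameters it was constructed with
    net1_optimizer.step()
    net2_optimizer.step()
    return loss_total.item()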

Another option is to put all the parameters in the same optimizer.
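As a sketch, that could use per-parameter-group options; the betas for net1 follow your posted code, and the variable name joint_optimizer is just illustrative:

joint_optimizer = torch.optim.Adam(
    [
        {"params": net1.parameters(), "betas": (0.5, 0.999)},  # per-group override
        {"params": net2.parameters()},                          # uses the defaults below
    ],
    lr=args.lr,
)
# With identical StepLR settings, one scheduler then decays the lr of both groups
joint_lr_scheduler = torch.optim.lr_scheduler.StepLR(joint_optimizer, step_size=100, gamma=0.1)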

So is it correct that I used

net1_optimizer.step()
net2_optimizer.step()

It’s correct and necessary.

If the question is about the step method of the learning rate scheduler (in this case StepLR), then you should also call the schedulers’ step method, which is different from that of the optimizers.

Since you say they are trained jointly, it might be preferable to use one optimizer with the parameters from both nets, but keep separate lr schedulers as you did, and then call

net1_lr_scheduler.step()
net2_lr_scheduler.step()

as necessary (i.e. at each epoch or iteration, depending on how you want to count them). See the example here, which does not include the optimizer step call!
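For completeness, here is roughly how the scheduler steps fit into an epoch-based loop, reusing the joint_step sketch from earlier in the thread (num_epochs and loader are placeholders):

for epoch in range(num_epochs):
    for x1, target1, x2, target2 in loader:       # placeholder DataLoader
        joint_step(x1, target1, x2, target2)      # optimizer steps happen every iteration
    # Scheduler steps count epochs here; StepLR multiplies the lr by gamma
    # every step_size calls, so with step_size=100 the lr drops by 10x after 100 epochs.
    net1_lr_scheduler.step()
    net2_lr_scheduler.step()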

However, reading your code, I can’t see how the schedules differ between the two nets, since they have the same step size, gamma, and initial learning rate. Did I misunderstand something?

Thanks. I made a mistake when copy-pasting. Actually, the first net uses betas=(0.5, 0.999) while the second net uses betas=(0.9, 0.999). That is the reason why I used two separate schedulers.