Joined Optimizers

Hi,
I’m wondering if there’s a simple way to combine several optimizers (over disjoint parameters) into a single optimizer product, such that the result is itself a well-formed instance of torch.optim.Optimizer.

The straightforward answer works well in simple cases, but fails when the product needs to be a well-formed instance of torch.optim.Optimizer (e.g. for checkpointing, or when using frameworks such as PyTorch Lightning).
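To make the question concrete, here is roughly the "straightforward answer" I mean, as a minimal sketch (the class name and details are mine): it delegates step(), zero_grad() and the state-dict methods to the wrapped optimizers, but it is not an instance of torch.optim.Optimizer, which is exactly what breaks checkpointing and framework integration.

class NaiveJoinedOptimizer:
    # Sketch only: delegates to several optimizers over disjoint parameters,
    # but is NOT a torch.optim.Optimizer subclass, so code that type-checks
    # the optimizer or inspects its param_groups may reject it.
    def __init__(self, *optimizers):
        self.optimizers = optimizers

    def zero_grad(self, set_to_none=True):
        for opt in self.optimizers:
            opt.zero_grad(set_to_none=set_to_none)

    def step(self):
        for opt in self.optimizers:
            opt.step()

    def state_dict(self):
        return [opt.state_dict() for opt in self.optimizers]

    def load_state_dict(self, state_dicts):
        for opt, sd in zip(self.optimizers, state_dicts):
            opt.load_state_dict(sd)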

PyTorch Lightning natively supports multiple optimizers. You just need to return a list of optimizers…

Yes, but not really. According to the docs, returning a list of optimizers is semantically different: it assumes the optimizers optimize different objectives and should be run independently.

Lightning will call each optimizer sequentially:
for epoch in epochs:
   for batch in data:
      for opt in optimizers:
         train_step(opt)
         opt.step()

   for scheduler in schedulers:
      scheduler.step()

Here is the relevant code (I think).

That is, the forward pass runs once per optimizer.
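If the real requirement is only disjoint parameter sets under one objective, and a single optimizer class covers all of them, the simplest well-formed alternative I know of is one optimizer with several param groups. The module names below are placeholders:

import torch

# model_trans / model_rotat are placeholder modules with disjoint parameters;
# each group can carry its own hyperparameters (here: learning rate).
optimizer = torch.optim.Adam([
    {"params": model_trans.parameters(), "lr": 1e-3},
    {"params": model_rotat.parameters(), "lr": 1e-4},
])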

Hi guys,

Regarding two optimizers and two losses:
I need to train two networks to learn the coordinates (translation and rotation). Would the following work?

from torch import optim

optimizer_Trans = optim.Adam(parameters_Trans_to_train, lr)
model_lr_scheduler_Trans = optim.lr_scheduler.StepLR(
    optimizer_Trans, scheduler_step_size, 0.1)

optimizer_Rotat = optim.Adam(parameters_Rotat_to_train, lr)
model_lr_scheduler_Rotat = optim.lr_scheduler.StepLR(
    optimizer_Rotat, scheduler_step_size, 0.1)

for batch_idx, inputs in enumerate(train_loader):
    loss_Trans, loss_Rotat = process_batch(inputs)

    # Translation branch: keep the graph alive for the rotation backward below.
    optimizer_Trans.zero_grad()
    loss_Trans.backward(retain_graph=True)
    optimizer_Trans.step()

    # Rotation branch: last backward through this graph, so no retain_graph needed.
    optimizer_Rotat.zero_grad()
    loss_Rotat.backward()
    optimizer_Rotat.step()

# Step the LR schedulers after the optimizers have stepped (typically once per epoch).
model_lr_scheduler_Trans.step()
model_lr_scheduler_Rotat.step()
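For comparison, assuming loss_Trans only depends on the translation parameters and loss_Rotat only on the rotation parameters, I could also backpropagate the summed loss once and then step both optimizers, which avoids retain_graph entirely (sketch under that assumption):

# Assumes disjoint parameter sets and losses that only touch their own network.
optimizer_Trans.zero_grad()
optimizer_Rotat.zero_grad()
(loss_Trans + loss_Rotat).backward()
optimizer_Trans.step()
optimizer_Rotat.step()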