Hi,
I’m wondering if there’s a simple way to make a single optimizer product out of several optimizers (over disjoint parameters), such that the resulting product is a well-formed instance of torch.optim.Optimizer.
The straightforward answer works well in simple cases, but it fails when the product needs to be a well-formed instance of torch.optim.Optimizer (e.g. when checkpointing, or when using frameworks such as PyTorch Lightning).
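For concreteness, the straightforward answer I have in mind is a thin wrapper that just loops over the underlying optimizers. A minimal sketch (the class name `NaiveOptimizerProduct` is mine, not from any library):

```python
import torch
import torch.nn as nn

class NaiveOptimizerProduct:
    """Hypothetical wrapper: forwards zero_grad()/step() to each child optimizer."""

    def __init__(self, *optimizers):
        self.optimizers = optimizers

    def zero_grad(self):
        for opt in self.optimizers:
            opt.zero_grad()

    def step(self):
        for opt in self.optimizers:
            opt.step()

# Two modules with disjoint parameters, one optimizer each:
a, b = nn.Linear(4, 4), nn.Linear(4, 4)
product = NaiveOptimizerProduct(
    torch.optim.SGD(a.parameters(), lr=0.1),
    torch.optim.Adam(b.parameters(), lr=1e-3),
)
print(isinstance(product, torch.optim.Optimizer))  # False
```

This is fine in a hand-written training loop, but the product is not an Optimizer instance and has no `state_dict()` or `param_groups`, which is exactly where checkpointing and Lightning fall over.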
Yes, but not really. According to the docs, returning a list of optimizers is semantically different: it assumes that the optimizers are optimizing different objectives and should be run independently (see the sketch after the pseudocode below).
Lightning will call each optimizer sequentially:
```python
for epoch in epochs:
    for batch in data:
        for opt in optimizers:
            train_step(opt)  # forward/backward for this optimizer's objective
            opt.step()
    for scheduler in schedulers:
        scheduler.step()
```
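For reference, returning a list from `configure_optimizers` looks roughly like this (a sketch; the module and its submodules are made up):

```python
import torch
import pytorch_lightning as pl

class TwoObjectiveModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Two submodules with disjoint parameters (illustrative names):
        self.net_a = torch.nn.Linear(4, 4)
        self.net_b = torch.nn.Linear(4, 4)

    def configure_optimizers(self):
        # Returning a list tells Lightning these optimizers pursue
        # independent objectives (e.g. a GAN's generator and discriminator),
        # not that they form a single combined optimizer over
        # disjoint parameter groups.
        opt_a = torch.optim.Adam(self.net_a.parameters(), lr=1e-3)
        opt_b = torch.optim.SGD(self.net_b.parameters(), lr=1e-2)
        return [opt_a, opt_b]
```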