Multi-optimizers performance

Could having multiple optimizers, each assigned to a different module in the main model, result in a different learning path than having one optimizer for the full model? For example, with Adam.

It’s an interesting problem, and it needs some experiments.
I think the learning path may be different, because the optimizers compute different update formulas when updating the parameters.
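For example, a minimal sketch (the learning rate and gradient values here are just made up for illustration) showing that the same gradient already produces different updates under SGD and Adam:

```python
import torch

# Toy illustration: the same gradient gives different updates
# under SGD and Adam, so the learning paths can diverge.
p_sgd = torch.nn.Parameter(torch.tensor([1.0]))
p_adam = torch.nn.Parameter(torch.tensor([1.0]))

opt_sgd = torch.optim.SGD([p_sgd], lr=0.1)
opt_adam = torch.optim.Adam([p_adam], lr=0.1)

for p, opt in [(p_sgd, opt_sgd), (p_adam, opt_adam)]:
    p.grad = torch.tensor([0.5])  # identical gradient for both
    opt.step()

print(p_sgd.item(), p_adam.item())  # ~0.95 (SGD) vs ~0.90 (Adam)
```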

@falmasri @DoubtWang
If you use Adam optimizers with identical hyperparameters and call the optimization step at the same time, you will not see any numerical difference.

Adam’s optimization step is done per parameter (the moving averages are per-parameter and independent).

For each parameter we have one state.
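To make this concrete, here is a minimal sketch (the toy model and data are just placeholders): because the state is kept per parameter, splitting a model’s parameters across two Adam optimizers with identical hyperparameters gives exactly the same updates as a single Adam optimizer over the whole model.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model_b = copy.deepcopy(model_a)  # identical initial weights

x, y = torch.randn(16, 4), torch.randn(16, 2)
loss_fn = nn.MSELoss()

# One optimizer for the full model
opt_full = torch.optim.Adam(model_a.parameters(), lr=1e-3)
# Two optimizers, one per sub-module, with identical hyperparameters
opt_first = torch.optim.Adam(model_b[0].parameters(), lr=1e-3)
opt_last = torch.optim.Adam(model_b[2].parameters(), lr=1e-3)

for _ in range(5):
    for model, opts in [(model_a, [opt_full]), (model_b, [opt_first, opt_last])]:
        for opt in opts:
            opt.zero_grad()
        loss_fn(model(x), y).backward()
        for opt in opts:
            opt.step()

# The parameters stay numerically identical
for pa, pb in zip(model_a.parameters(), model_b.parameters()):
    assert torch.allclose(pa, pb)
```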

@spanev sorry, I misread it. My understanding is that “multiple optimizers” in the text refers to different optimizers, e.g., SGD and Adam.

Sure, the question is a little bit ambiguous, but this:

led me to think that @falmasri was referring to instances of the same optimizer.

Having mathematically different optimizers remains an interesting problem! And it should be addressed with empirical observations, as you said. :slightly_smiling_face:

That’s a good point about the Adam parameters; I hadn’t paid attention to it.

I initialized 3 Adam optimizers and assigned each of them to a different module, and training was sequential: the first module was run first and updated using its assigned optimizer, then the second, then the third. It achieved a slight improvement on an overfitting model.
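Roughly, the setup looked like this (a simplified sketch; the module names, sizes, and loss below are placeholders, not the actual model):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 32)
        self.block2 = nn.Linear(32, 32)
        self.block3 = nn.Linear(32, 1)

    def forward(self, x):
        return self.block3(torch.relu(self.block2(torch.relu(self.block1(x)))))

model = Net()

# One Adam optimizer per module
opts = [
    torch.optim.Adam(model.block1.parameters(), lr=1e-3),
    torch.optim.Adam(model.block2.parameters(), lr=1e-3),
    torch.optim.Adam(model.block3.parameters(), lr=1e-3),
]

x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Step the optimizers sequentially, one module at a time
for opt in opts:
    opt.step()
    opt.zero_grad()
```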