Is there a reason to use two optimizers?

Is there a reason to use two optimizers instead of a single one for both the encoder and the decoder in the following tutorial?

http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#training-the-model


In this case, using one optimizer is equivalent. But multiple optimizers are useful when you want different optimization algorithms for different parts of the model, or when you want to optimize different sets of parameters at different times.
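
To illustrate, here is a minimal sketch of both variants. The `encoder` and `decoder` modules are hypothetical stand-ins for the tutorial's EncoderRNN and AttnDecoderRNN:

```python
import itertools
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for the tutorial's encoder/decoder modules.
encoder = nn.GRU(input_size=256, hidden_size=256)
decoder = nn.GRU(input_size=256, hidden_size=256)

# Option A: one optimizer over both parameter sets.
# Equivalent to the tutorial's setup, since both use the same
# algorithm and hyperparameters.
optimizer = optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=0.01,
)

# Option B: two optimizers, which additionally lets you pick a
# different algorithm or hyperparameters per part.
encoder_optimizer = optim.SGD(encoder.parameters(), lr=0.01)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=1e-3)

# In the training loop, option B just means calling zero_grad() and
# step() on both optimizers:
#   loss.backward()
#   encoder_optimizer.step()
#   decoder_optimizer.step()
```

Note that if all you want is different hyperparameters (e.g. learning rates) per part, a single optimizer with parameter groups also works; separate optimizers mainly buy you different algorithms and independent `step()`/scheduler control.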


Separate optimizers also allow you to attach separate schedulers. For example, if you are fine-tuning a pretrained model, you may want to slowly ramp up the learning rate for the low-level (first) part of the model, while the final part uses a higher learning rate from the beginning.
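
A minimal sketch of that pattern, assuming a hypothetical pretrained `backbone` and a freshly initialized `head` (the module names, learning rates, and warm-up length are illustrative, not from the tutorial):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import LambdaLR

# Hypothetical stand-ins: a pretrained low-level part whose learning
# rate we ramp up slowly, and a new final part that starts at full LR.
backbone = nn.Linear(512, 512)
head = nn.Linear(512, 10)

backbone_optimizer = optim.Adam(backbone.parameters(), lr=1e-4)
head_optimizer = optim.Adam(head.parameters(), lr=1e-3)

# Linear warm-up over the first 1000 steps, applied only to the
# backbone's optimizer; the head has no scheduler, so its learning
# rate stays constant from the start.
warmup_steps = 1000
backbone_scheduler = LambdaLR(
    backbone_optimizer,
    lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps),
)

# In the training loop:
#   loss.backward()
#   backbone_optimizer.step()
#   head_optimizer.step()
#   backbone_scheduler.step()
```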