I’m looking at a seq2seq model as described in this sample project: https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb
To train this network, two SGD optimizers are initialized (one for the encoder, one for the decoder). In the train function (cf. prompt 16), the loss is computed at the bottom of the full stack (i.e. after encode + decode), and then both optimizers are stepped.
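For reference, my reading of that setup is roughly the following (paraphrased, with toy `nn.Linear` modules standing in for the tutorial's `EncoderRNN` / `AttnDecoderRNN`, and made-up shapes):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the tutorial's encoder / decoder modules
encoder = nn.Linear(10, 10)
decoder = nn.Linear(10, 10)

# Two optimizers, one per module, as in the notebook
encoder_optimizer = optim.SGD(encoder.parameters(), lr=0.01)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=0.01)

# One training step: the loss is computed after the full encode + decode pass
x = torch.randn(4, 10)
target = torch.randn(4, 10)

encoder_optimizer.zero_grad()
decoder_optimizer.zero_grad()

loss = nn.functional.mse_loss(decoder(encoder(x)), target)
loss.backward()           # one backward pass populates grads in both modules

encoder_optimizer.step()  # each optimizer updates only its own module's parameters
decoder_optimizer.step()
```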
Is this simply because there’s no way to initialize an optimizer with the union of parameters from different modules, or is there something more going on here that I’m not aware of?
Namely: if the encoder and decoder networks were stored in an nn.ModuleList attribute (i.e. both lived inside a single model object), would a single optimizer be sufficient/equivalent?
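Concretely, what I have in mind is one of the two options sketched below (the `Seq2Seq` wrapper is hypothetical, not something from the notebook):

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim

# Option A: a wrapper module holding both networks in an nn.ModuleList,
# so model.parameters() yields the union of both parameter sets.
class Seq2Seq(nn.Module):  # hypothetical wrapper, not from the notebook
    def __init__(self, encoder, decoder):
        super().__init__()
        self.nets = nn.ModuleList([encoder, decoder])  # registers both sub-modules

    def forward(self, x):
        encoder, decoder = self.nets
        return decoder(encoder(x))

model = Seq2Seq(nn.Linear(10, 10), nn.Linear(10, 10))
optimizer = optim.SGD(model.parameters(), lr=0.01)  # one optimizer over everything

# Option B: no wrapper, just pass the chained parameter iterators to one optimizer
encoder, decoder = nn.Linear(10, 10), nn.Linear(10, 10)
optimizer = optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=0.01
)
```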