Training Seq2Seq models

Is there a reason why the encoder and decoder parts of the seq2seq model are defined in separate classes, with a separate optimizer for each? I built and trained both the encoder and decoder inside a single class, and the loss does not decrease; but when I switched to two separate optimizers, the loss decreases well. Is there any difference in gradient computation in the latter setup?
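To make the comparison concrete, here is a minimal toy sketch of the two setups I mean (the `nn.Linear` stand-ins and the MSE loss are just placeholders for the real encoder/decoder and seq2seq loss). For plain SGD, at least, both setups apply the same update in this toy check:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-ins for the encoder and decoder (placeholders for illustration).
enc_a, dec_a = nn.Linear(4, 8), nn.Linear(8, 4)
enc_b, dec_b = copy.deepcopy(enc_a), copy.deepcopy(dec_a)

x = torch.randn(16, 4)
target = torch.randn(16, 4)

# Setup A: one optimizer over both modules' parameters.
opt_joint = torch.optim.SGD(
    list(enc_a.parameters()) + list(dec_a.parameters()), lr=0.1
)
loss_a = nn.functional.mse_loss(dec_a(enc_a(x)), target)
opt_joint.zero_grad()
loss_a.backward()
opt_joint.step()

# Setup B: a separate optimizer per module, as in many seq2seq tutorials.
opt_enc = torch.optim.SGD(enc_b.parameters(), lr=0.1)
opt_dec = torch.optim.SGD(dec_b.parameters(), lr=0.1)
loss_b = nn.functional.mse_loss(dec_b(enc_b(x)), target)
opt_enc.zero_grad()
opt_dec.zero_grad()
loss_b.backward()
opt_enc.step()
opt_dec.step()

# Compare the resulting parameters of the two setups after one step.
same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(
        list(enc_a.parameters()) + list(dec_a.parameters()),
        list(enc_b.parameters()) + list(dec_b.parameters()),
    )
)
print(same)
```

Since autograd computes gradients from the loss regardless of how optimizers are grouped, I would expect the two to behave identically, which is why the difference I observed in my real model confuses me.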