Best strategy for training when certain weights cannot be updated for hours

This isn’t really a PyTorch-specific question, but I am using PyTorch and thought I would ask here.

I should also point out that it is more of a worry I have at this point than something I have empirically validated to be a serious problem.

I have a model that is trying to produce speech for many different speakers. I train an embedding for each speaker. However, because there are so many speakers, any given speaker embedding can go hours without receiving an update. I am worried that it could get badly out of sync with the rest of the model's weights in the meantime. Is there standard advice for dealing with this situation?
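For context, this is roughly the setup I mean: a shared model with an `nn.Embedding` table holding one vector per speaker, where a given row only receives gradient when that speaker appears in a batch. A minimal sketch (class name, layer sizes, and the toy trunk are all hypothetical, just for illustration):

```python
import torch
import torch.nn as nn

class SpeakerConditionedModel(nn.Module):
    """Toy speaker-conditioned model: shared trunk + per-speaker embedding."""

    def __init__(self, num_speakers=1000, emb_dim=64, feat_dim=80):
        super().__init__()
        # One learned vector per speaker. A row of this table is only
        # updated when that speaker id shows up in a batch, so rarely
        # seen speakers can go a long time between updates.
        self.speaker_emb = nn.Embedding(num_speakers, emb_dim)
        self.trunk = nn.Linear(feat_dim + emb_dim, feat_dim)

    def forward(self, features, speaker_ids):
        # features: (batch, time, feat_dim); speaker_ids: (batch,)
        emb = self.speaker_emb(speaker_ids)                      # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, features.size(1), -1)  # broadcast over time
        return self.trunk(torch.cat([features, emb], dim=-1))

model = SpeakerConditionedModel()
x = torch.randn(4, 10, 80)          # (batch, time, feat)
ids = torch.tensor([3, 3, 7, 7])    # only speakers 3 and 7 in this batch
out = model(x, ids)
```

After a backward pass on this batch, only rows 3 and 7 of `speaker_emb.weight` have nonzero gradient; every other speaker's row sits untouched, which is exactly the staleness described above.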

A few mitigations I have been thinking about:

1. When I start training on a new speaker, impose an initial lockout period during which only the speaker embedding is trained, not the rest of the model. But this wastes time, it could be tricky to set the learning rate correctly, and adaptive optimizers like Adam might not perform as expected when there are long temporal discontinuities between updates to a parameter.
2. Stagger the speaker changes so that not everything goes stale at once. But that would make the loading code more complex.
3. Early in training, when the weights are moving most quickly, change speakers very rapidly, even if that is suboptimal for other reasons (more overhead, and it makes it impossible to learn longer-term dependencies).
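For what it's worth, the lockout idea in (1) is mechanically simple in PyTorch: put the trunk and the embedding table in separate optimizer param groups (so the embedding can also get its own learning rate), and toggle `requires_grad` on the trunk during the warm-up. A hedged sketch with made-up module names and sizes:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real model's pieces.
trunk = nn.Linear(64, 64)
speaker_emb = nn.Embedding(100, 64)

# Separate param groups: the embedding can use a larger learning rate
# to help it catch up after a long gap without updates.
opt = torch.optim.Adam([
    {"params": trunk.parameters(), "lr": 1e-4},
    {"params": speaker_emb.parameters(), "lr": 1e-3},
])

def set_trunk_frozen(frozen: bool):
    # During the lockout period for a newly resumed speaker, freeze the
    # trunk so only the embedding is updated. Params with no gradient
    # are simply skipped by opt.step().
    for p in trunk.parameters():
        p.requires_grad_(not frozen)

# Snapshots so we can check what actually moved.
trunk_before = trunk.weight.detach().clone()
emb_before = speaker_emb.weight.detach().clone()

# One embedding-only warm-up step (toy loss for illustration).
set_trunk_frozen(True)
x = torch.randn(8, 64)
ids = torch.randint(0, 100, (8,))
loss = (trunk(x) + speaker_emb(ids)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()
set_trunk_frozen(False)  # resume normal joint training afterwards
```

After the warm-up step, the trunk weights are unchanged while the embedding rows for the speakers in the batch have moved. Note this doesn't resolve the Adam-state concern: the optimizer's moment estimates for a long-idle embedding row are themselves stale, which is part of why I'm unsure this is the right approach.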

This is a tricky situation. However, if you have a large set of speakers, then at any given time the average embedding is "expired" to some degree, so the network will likely regularize for this effect implicitly. I'm not convinced you'd need to do anything explicit.