I’m currently facing some weird behaviour that I cannot explain. I’m training a vgg16 on SVHN, and training on 1 GPU with SGD and fixed hyperparams I get the following nice results:
Now, training the same model with the same optimizer and hyperparams as before but using DataParallel, it exhibits the following behaviour, where the learning process stagnates and it doesn’t learn anything.
Even weirder: if I swap vgg16 for resnet50, it starts learning again.
Does anyone have any insight into this, or what might be going on?
I would expect a model M that shows good learning behaviour when trained on one device with a fixed optimizer and hyperparams to behave the same when trained with DataParallel using the same optimizer and hyperparams.
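For context, my setup is essentially the following. This is a minimal sketch, not my exact code: I’ve swapped in a tiny stand-in model so it runs anywhere, and the hyperparameter values here are placeholders; the point is only that the sole difference between the two runs is the `nn.DataParallel` wrapping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for vgg16, operating on SVHN-shaped inputs (3x32x32, 10 classes).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# The only change between the two experiments: wrap the model so each
# batch is split across the available GPUs (falls back to CPU if none).
model = nn.DataParallel(model)

# Same optimizer and hyperparams in both runs (values here are placeholders).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One training step on a fake SVHN-shaped batch.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```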