I have a question regarding DataParallel. When I use it to train a model on 2 GPUs and then try to load the model onto gpu:0 alone for prediction, I keep getting a lot of mismatched keys from the state dictionary. When I inspect the state dictionary I see that all of the model's params are on gpu:0, so I don't understand why I keep getting these mismatched keys.
If, instead, I load the model for prediction with DataParallel, everything works fine. My initial thought when getting mismatched keys was that maybe some parts of the model were on one GPU and others on the other, but that's not the case: after loading the checkpoint and inspecting all params, everything is on gpu:0.
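For what it's worth, the mismatch is usually not about devices at all: DataParallel wraps the model under a `.module` attribute, so a state dict saved from the wrapped model has every key prefixed with `module.`, which a plain model won't recognize. A minimal sketch of that situation and the usual fix (the `Linear` model here is just a stand-in, not my actual model):

```python
import torch

# Stand-in for the real model.
model = torch.nn.Linear(4, 2)

# Simulate a checkpoint saved from a DataParallel-wrapped model:
# DataParallel keeps the real model under .module, so every key in
# its state_dict gains a "module." prefix.
dp_state = {"module." + k: v for k, v in model.state_dict().items()}

# Loading that dict into a plain (unwrapped) model fails with
# missing/unexpected keys, because the names no longer line up.
# Fix: strip the prefix before loading.
clean_state = {k.removeprefix("module."): v for k, v in dp_state.items()}
model.load_state_dict(clean_state)  # keys now match
```

Alternatively, saving `dp_model.module.state_dict()` in the first place avoids the prefix entirely.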
Thank you, Juan
That's refreshing to know, especially since these things are not reflected anywhere in the documentation.
If you don't mind me asking, do you have any idea what the proper usage of distributed.DataParallel is?
For instance, here I'm explaining how I've used it and still end up getting errors during execution.
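For reference, the setup I understand to be the standard one is roughly the following: one process per GPU, each initializing a process group and wrapping its own replica. This is only a sketch, assuming the `nccl` backend and a `spawn`-style launch; the model and port are placeholders, not my actual code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank: int, world_size: int) -> None:
    # One process per GPU; `rank` identifies this process.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")  # placeholder port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = torch.nn.Linear(4, 2).to(rank)  # stand-in for the real model
    ddp_model = DDP(model, device_ids=[rank])
    # ... training loop goes here: gradients are averaged across
    # processes automatically during backward().

    dist.destroy_process_group()

# Typically launched with:
# torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)
```

If your usage differs from this pattern (e.g. one process driving all GPUs, as with plain DataParallel), that mismatch could be the source of the execution errors.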