I have a question regarding DataParallel. When I use it to train a model on two GPUs and then try to load the model on gpu:0 alone for prediction, I get a lot of mismatched keys from the state dictionary. When I inspect the state dictionary, all of the model's params are on gpu:0, so I don't understand why I keep getting these mismatched keys.
If instead I load the model for prediction wrapped in DataParallel, everything works fine. My initial thought when getting the mismatched keys was that maybe some parts of the model were on one GPU and others on the other, but that's not the case: after loading the checkpoint and inspecting all params, everything is on gpu:0.
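In case it helps, here is a pure-Python stand-in for what I suspect is happening with the keys (no torch needed to run it). My guess is that `nn.DataParallel` wraps the model under a `.module` attribute, so every key in the saved state_dict carries a `module.` prefix that the bare model does not expect; the layer names below are just illustrative:

```python
# Keys as they would appear in a checkpoint saved from a DataParallel model
# (assumption: DataParallel prefixes every parameter name with "module.").
saved_state = {
    "module.fc.weight": "...",
    "module.fc.bias": "...",
}

# Keys the bare (unwrapped) model expects.
expected_keys = {"fc.weight", "fc.bias"}

# Loading directly: every key mismatches, even though all tensors sit on gpu:0.
missing = expected_keys - set(saved_state)
unexpected = set(saved_state) - expected_keys
print("missing:", missing)        # → missing: {'fc.weight', 'fc.bias'}
print("unexpected:", unexpected)  # → unexpected: {'module.fc.weight', 'module.fc.bias'}

# Stripping the prefix makes the keys line up again (Python 3.9+ for removeprefix).
stripped = {k.removeprefix("module."): v for k, v in saved_state.items()}
assert set(stripped) == expected_keys
```

If this guess is right, it would also explain why loading under DataParallel "just works": the wrapper re-adds the `module.` prefix, so the keys match again.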