@SteveXWu Thanks for posting.
If you save your DataParallel model and want to load it in a different env, you need to properly define the mapping of devices using the map_location
option in torch.load. You can check the following pointers and see if they resolve your issue:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#save-and-load-checkpoints
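A minimal sketch of the idea: map_location remaps storages at load time (e.g. GPU-saved tensors onto CPU), and since nn.DataParallel prefixes state_dict keys with "module.", you may also need to strip that prefix before loading into a plain model. The file name and tiny model here are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a real model; DataParallel adds a
# "module." prefix to all state_dict keys.
model = nn.Linear(4, 2)
wrapped = nn.DataParallel(model)
torch.save(wrapped.state_dict(), "checkpoint.pt")

# map_location remaps storages at load time, e.g. checkpoints
# saved on GPU onto CPU (or {"cuda:1": "cuda:0"} to remap devices):
state_dict = torch.load("checkpoint.pt", map_location=torch.device("cpu"))

# Strip the "module." prefix so the weights load into a plain model:
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
fresh = nn.Linear(4, 2)
fresh.load_state_dict(state_dict)
```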
Btw, we recommend using DistributedDataParallel (DDP) instead of DataParallel: Distributed Data Parallel — PyTorch 1.11.0 documentation
You can also check this similar issue: Load DDP model trained with 8 gpus on only 2 gpus? - #12 by kazem