@SteveXWu Thanks for posting.
If you save your DataParallel model and want to load it in a different env, you need to properly define the mapping of devices using the map_location
option in torch.load. You can check the following pointers and see if they resolve your issue:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#save-and-load-checkpoints
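A minimal sketch of the idea: map_location remaps storages at load time (e.g. GPU-saved tensors onto CPU), and since nn.DataParallel prefixes state_dict keys with "module.", you may also need to strip that prefix before loading into a plain model. The file name and tiny model here are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a real model; DataParallel adds a
# "module." prefix to all state_dict keys.
model = nn.Linear(4, 2)
wrapped = nn.DataParallel(model)
torch.save(wrapped.state_dict(), "checkpoint.pt")

# map_location remaps storages at load time, e.g. checkpoints
# saved on GPU onto CPU (or {"cuda:1": "cuda:0"} to remap devices):
state_dict = torch.load("checkpoint.pt", map_location=torch.device("cpu"))

# Strip the "module." prefix so the weights load into a plain model:
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
fresh = nn.Linear(4, 2)
fresh.load_state_dict(state_dict)
```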
Btw, we recommend using DistributedDataParallel (DDP) instead of DataParallel: Distributed Data Parallel — PyTorch 1.11.0 documentation
You can also check this similar issue: Load DDP model trained with 8 gpus on only 2 gpus? - #12 by kazem