No, you don’t have to, but when you re-instantiate the model you need to pass the correct GPU indices to the DataParallel module. If you serialized the whole DataParallel wrapper, you need to take its .module attribute out and re-wrap it in DataParallel again.
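A minimal sketch of that unwrap/re-wrap step, assuming the whole DataParallel wrapper was serialized with torch.save (the nn.Linear model and the in-memory buffer are just placeholders for illustration):

```python
import io

import torch
import torch.nn as nn

# Stand-in for a model that was trained wrapped in DataParallel
model = nn.Linear(4, 2)
dp_model = nn.DataParallel(model)

# Suppose the *whole* DataParallel object was serialized
buf = io.BytesIO()
torch.save(dp_model, buf)
buf.seek(0)

# On reload, pull the plain model out of the .module attribute ...
try:
    loaded = torch.load(buf, weights_only=False)
except TypeError:  # older torch versions have no weights_only argument
    buf.seek(0)
    loaded = torch.load(buf)
plain_model = loaded.module

# ... and re-wrap it with GPU indices that are valid on the *current* machine
if torch.cuda.is_available():
    dp_model = nn.DataParallel(plain_model, device_ids=[0])
```

Loading a fully pickled module like this also requires the model class to be importable at load time, which is one more reason the state_dict approach below is usually preferable.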
However, I was not using the DataParallel module, as I was doing model parallelism instead. I distributed different parts of my model across GPUs using torch.nn.Module.cuda and copied tensors between GPUs using torch.Tensor.cuda. I got that runtime error when I loaded the model on a single GPU.
@heilaw can you give a small script to reproduce this?
Also, as we mention in the link you gave, we recommend using state_dict and load_state_dict to keep things simple.
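A short sketch of the recommended state_dict workflow (the nn.Linear model and in-memory buffer are placeholders; map_location="cpu" is what lets a checkpoint written on any GPU be restored on a single device or on CPU):

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Save only the parameters, not the whole pickled module
buf = io.BytesIO()
torch.save(model.state_dict(), buf)

# Reload into a freshly constructed model; map_location remaps tensors
# saved on other devices (e.g. cuda:3) onto the device you have now
buf.seek(0)
new_model = nn.Linear(4, 2)
state = torch.load(buf, map_location="cpu")
new_model.load_state_dict(state)
```

Because only tensors are stored, this avoids both the pickled-class import problem and the device-mismatch errors described above.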
Yes, that’s possible.
If you are training a multi-GPU model, you should store model.module.state_dict(), as explained here, to avoid the "module." prefix that DataParallel adds to every parameter name; those prefixed keys cause key-mismatch errors when you try to load the checkpoint back into a plain, unwrapped model.
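A minimal sketch of that, again with a placeholder nn.Linear model: saving model.module.state_dict() yields keys without the "module." prefix, so the checkpoint loads directly into an unwrapped model.

```python
import io

import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(4, 2))

# model.state_dict() has keys like "module.weight";
# model.module.state_dict() has plain keys like "weight"
buf = io.BytesIO()
torch.save(model.module.state_dict(), buf)

# Loads cleanly into a standard (non-DataParallel) model
buf.seek(0)
plain = nn.Linear(4, 2)
plain.load_state_dict(torch.load(buf, map_location="cpu"))
```

If you already have a checkpoint saved with the prefixed keys, you can also strip the "module." prefix from each key of the loaded dict before calling load_state_dict.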