Load a model and use multiple GPUs for inference

I have a model that I train on multiple GPUs and then use for inference. If I train and run inference in the same run, it works just fine, but if I save the model and try to use it later for inference on multiple GPUs, it fails with this error:

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

So, this works fine (a rough sketch follows the list):

  • Create model
  • Train model on multiple GPUs
  • Use model for inference using multiple GPUs
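
Something like this toy outline captures the working path; the model and the random tensors are stand-ins, not my actual code:

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Toy stand-in for my real architecture.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    model = model.to(device)

    # "Training": one dummy step on random data.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(8, 3, 32, 32, device=device)
    y = torch.randint(0, 10, (8,), device=device)
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

    # Inference in the same run, still wrapped in DataParallel: this works.
    model.eval()
    with torch.no_grad():
        preds = model(torch.randn(8, 3, 32, 32, device=device)).argmax(dim=1)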

This fails at inference (again, a rough sketch follows the list):

  • Create model
  • Train model on multiple GPUs
  • Save model using torch.save(model, path)
  • Load model using torch.load(path, map_location=torch.device("cpu")) (it also fails if I don’t use map_location)
  • Use model for inference using multiple GPUs (I don’t forget to call .to(device))
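
Roughly, the second (failing) run looks like this, again with the toy model standing in for the real one and "model.pt" as a placeholder path:

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # The whole model was saved earlier with torch.save(model, "model.pt").
    model = torch.load("model.pt", map_location=torch.device("cpu"))
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    model = model.to(device)

    model.eval()
    with torch.no_grad():
        # The RuntimeError quoted above is raised at this forward call.
        out = model(torch.randn(8, 3, 32, 32, device=device))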

What am I doing wrong?

Could you explain how multiple GPUs are used in the working cases?
Are you using DistributedDataParallel or any manual model sharding?

Thank you for your answer.
No, I use if torch.cuda.device_count() > 1: model = nn.DataParallel(model) for the multi-GPU part in both scenarios. In fact the code is exactly the same; the only difference is that if the model is not already trained, I train and save it first.
I should mention that I work on a cluster made of nodes with 4 GPUs each. I ran the test requesting and using two RTX 2080 cards.

OK, strange. Could you post a minimal, executable code snippet showing the failure, please?
The model definition as well as some random input tensors might be sufficient to reproduce the issue.

So, I fixed the issue, or at least I did what was “recommended”.

When saving the model, I was saving the entire model with torch.save(model, path) rather than the state dict with torch.save(model.state_dict(), path). Now I can train with or without multiple GPUs, save the model, load it, and run inference with or without multiple GPUs.
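
A sketch of that state-dict pattern, where build_model() is a placeholder for whatever constructs the architecture and "model.pt" is a placeholder path; unwrapping the DataParallel module before saving is the usual way to keep the "module." prefix out of the state-dict keys:

    import torch
    import torch.nn as nn

    def build_model():
        # Placeholder for whatever constructs the real architecture.
        return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                             nn.Flatten(), nn.Linear(8 * 32 * 32, 10))

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # --- saving side (after training) ---
    model = build_model().to(device)
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    # Unwrap DataParallel so the state-dict keys carry no "module." prefix.
    net = model.module if isinstance(model, nn.DataParallel) else model
    torch.save(net.state_dict(), "model.pt")

    # --- loading side (a later run) ---
    model = build_model()
    model.load_state_dict(torch.load("model.pt", map_location="cpu"))
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    model = model.to(device)
    model.eval()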

However, it seems that saving the entire model is bugged.

I had to train another model with a completely different architecture, and the exact same thing happened, with the same solution: if I save the entire model, I cannot use multiple GPUs afterwards, but if I save the state dict, everything is fine.
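
In case it helps anyone who already has a checkpoint saved as the entire model, a small conversion sketch ("old.pt" and "old_state.pt" are placeholder paths) that unwraps any DataParallel layer and re-saves just the weights:

    import torch
    import torch.nn as nn

    # Load the old "entire model" checkpoint on the CPU.
    loaded = torch.load("old.pt", map_location="cpu")
    # If it was saved while still wrapped in DataParallel, unwrap it.
    net = loaded.module if isinstance(loaded, nn.DataParallel) else loaded
    # Re-save only the weights, to be reloaded later with load_state_dict().
    torch.save(net.state_dict(), "old_state.pt")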