Loading model on GPU

It currently takes approximately 8-9 seconds to execute the following code:

        # move the model to the first GPU and freeze its parameters for inference
        device = torch.device("cuda:0")
        self.model = self.model.to(device)
        for parameter in self.model.parameters():
            parameter.requires_grad = False
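For reference, part of that 8-9 seconds may be one-time CUDA context initialization rather than the copy itself, since the first CUDA operation in a process creates the context. A minimal sketch of timing the transfer in isolation (the model architecture and sizes here are made-up stand-ins, not the actual segmentation network):

```python
import time
import torch
import torch.nn as nn

# Toy stand-in for the real segmentation model.
model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Conv2d(8, 1, 3))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

start = time.perf_counter()
model = model.to(device)
if device.type == "cuda":
    torch.cuda.synchronize()  # make sure the copy has actually finished
elapsed = time.perf_counter() - start
print(f"moved model to {device} in {elapsed:.3f}s")
```

Running a small dummy CUDA op before this measurement separates the context-creation cost from the copy cost.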

The actual inference time is only 3 seconds, during which the model segments a structure on 74 2D images of size 512x512.

Is there any way to speed up the model loading on the GPU?

If your state_dict contains CPU tensors, you’ll trigger the host-to-device copy twice: first for the model parameters, then again for the state_dict.
If that’s the case, you could restore the model on the CPU first and push it to the GPU only once.
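A minimal sketch of that suggestion, using a toy model and an in-memory buffer in place of the real checkpoint file: load the state_dict while everything is still on the CPU, then move the model to the GPU in a single step.

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for the real model and checkpoint.
model = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Restore on the CPU first: no device copy happens here.
state_dict = torch.load(buffer, map_location="cpu")
model.load_state_dict(state_dict)

# Single host-to-device copy for the whole model.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```

With `map_location="cpu"` the checkpoint tensors never touch the GPU on their own, so the parameters cross the PCIe bus exactly once.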

How often are you resetting the model that this overhead matters?
Usually you’ll set up the model once, and the training will take much longer than the initialization.

Thanks for your response, ptrblck!
This is actually part of the inference, and not training.
Originally the entire model was in a .tar format and was being loaded using torch.nn.DataParallel. But since we only have 1 GPU and are trying to cut down as much time as possible, I re-saved the checkpoint using only the following:

# load the original DataParallel checkpoint
checkpoint = torch.load(self.validate_model)
# save only the underlying module's weights, without the "module." key prefix
torch.save(self.model.module.state_dict(), path)

Then I load this newly saved state_dict for inference without using torch.nn.DataParallel.
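A self-contained sketch of this re-saving step, with a toy model and an in-memory buffer standing in for the actual checkpoint file. Saving `module.state_dict()` drops the "module." prefix from the keys, so a plain (non-DataParallel) model can load the result directly:

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for the real segmentation model.
model = nn.Linear(4, 2)
dp_model = nn.DataParallel(model)

# Re-save only the wrapped module's state_dict so the keys carry no
# "module." prefix (a buffer stands in for the checkpoint file here).
buffer = io.BytesIO()
torch.save(dp_model.module.state_dict(), buffer)
buffer.seek(0)

# A plain model can now load the checkpoint without DataParallel.
plain_model = nn.Linear(4, 2)
plain_model.load_state_dict(torch.load(buffer, map_location="cpu"))
plain_model.eval()

# Alternative: keep the original DataParallel checkpoint and instead
# strip the "module." prefix from its keys on the fly:
# state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
```

Either way avoids constructing the DataParallel wrapper at inference time.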

Is this not the right way to approach this? I am new to PyTorch.


Your approach sounds right. However, you should only need to load the model once, during the startup of your inference application. Or are you recreating the model for each prediction?
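To make that point concrete, here is a hypothetical sketch of an inference wrapper that pays the load-and-transfer cost once at startup, so each prediction only pays the forward-pass cost. The class name, architecture, and sizes are illustrative stand-ins:

```python
import io
import torch
import torch.nn as nn

class Segmenter:
    """Hypothetical wrapper: checkpoint is loaded and the model is
    pushed to the GPU once, at startup, not once per prediction."""

    def __init__(self, checkpoint):
        self.device = torch.device(
            "cuda:0" if torch.cuda.is_available() else "cpu")
        self.model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in architecture
        self.model.load_state_dict(torch.load(checkpoint, map_location="cpu"))
        self.model.to(self.device).eval()
        for p in self.model.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def predict(self, images):
        # Only the per-batch transfer and forward pass happen per call.
        return self.model(images.to(self.device))

# Usage: construct once, predict many times.
buf = io.BytesIO()
torch.save(nn.Conv2d(1, 1, 3, padding=1).state_dict(), buf)
buf.seek(0)
seg = Segmenter(buf)
out = seg.predict(torch.randn(2, 1, 16, 16))
```

If the application recreates such an object per request, the 8-9 second startup cost is paid every time; kept alive between requests, it is paid once.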