Efficient way to train on GPU and inference on CPU

Hi everyone, I am wondering what the best way is to load a model and run inference on the CPU after training it on the GPU.
What I am doing, which works fine but seems inefficient, is as follows:

1- Load the data
2- Define the data loader
3- Define network architecture
4- Train the model
5- Save the model using

torch.save(self.net.state_dict(), save_path)
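
For reference, a minimal sketch of the save step (the small network and file names are just placeholders for my real setup). I believe copying the tensors to CPU before saving also works and makes the checkpoint device-agnostic, but that is just my assumption:

import torch
import torch.nn as nn

net = nn.Linear(10, 2).to('cuda')  # placeholder for my real network, trained on the GPU
# ... training loop ...

# save the GPU state_dict as-is and remap devices at load time
torch.save(net.state_dict(), 'checkpoint.pt')

# my assumption: copying the tensors to CPU first should make the file load anywhere
cpu_state = {k: v.cpu() for k, v in net.state_dict().items()}
torch.save(cpu_state, 'checkpoint_cpu.pt')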

Then, when I go to run inference and make predictions on the CPU, I do the following:

1- Load data
2- Define dataloader
3- Define network
4- Load the model checkpoint
5- Predict on new data

model.load_state_dict(torch.load(path, map_location='cpu'))
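
Put together, my CPU-side code looks roughly like the sketch below (MyNet, the file path, and the tensor shapes are placeholders for my actual setup):

import torch
import torch.nn as nn

class MyNet(nn.Module):  # must match the architecture used during training
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = MyNet()
state_dict = torch.load('checkpoint.pt', map_location='cpu')  # remap GPU tensors to CPU
model.load_state_dict(state_dict)
model.eval()  # disable dropout / use batchnorm running stats

new_data = torch.randn(4, 10)  # placeholder batch of new samples
with torch.no_grad():
    preds = model(new_data).argmax(dim=1)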

I tried to do the prediction without steps 1-3, since I thought the saved checkpoint had all the necessary information, but then I got an error.
I am wondering if there is any way to make this procedure more efficient and faster.

This is expected since you are only storing the state_dict and thus need to create a new model object. You could script the model via torch.jit.script and load it directly, but note that TorchScript is in maintenance mode. I don’t think the torch.compile export mechanism is done yet, but @marksaroufim could correct me.
Generally, you could also check e.g. TorchServe or other serving solutions for inference.
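
A rough sketch of the scripting approach, assuming your model can be scripted cleanly (the small module here is just a placeholder):

import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = MyNet()
# ... train on the GPU ...

# script the trained model and save the graph together with the weights
scripted = torch.jit.script(model.eval())
scripted.save('model_scripted.pt')

# on the CPU machine there is no need to redefine MyNet
loaded = torch.jit.load('model_scripted.pt', map_location='cpu')
with torch.no_grad():
    out = loaded(torch.randn(1, 10))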

Thanks, I thought about this for a while. For the prediction, do I have to run the entire training data through my model? What would happen if I only loaded 10% of the data and then loaded the checkpoint and started predicting on new data?

For inference you don’t need to load the training data at all and should just load the samples you want to classify.
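
I.e. something like this (the tiny model stands in for your restored model and the shapes are made up), with no training dataset or DataLoader involved:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the model restored from the checkpoint
model.eval()
with torch.no_grad():
    sample = torch.randn(1, 10)               # a single new sample on the CPU
    prediction = model(sample).argmax(dim=1)  # predicted class for that sample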