I have trained a U-Net model for image segmentation in my win64 environment with:
NVIDIA RTX 2060
Python 3.8.13
CUDA Version: 11.3
PyTorch Version: 1.12.1
PyTorch Lightning Version: 1.5.10
I saved the checkpoint files and the training fully utilised my GPU.
It took about 2hrs per epoch to train the model.
I then loaded my model from the final checkpoint, passing map_location=torch.device('cuda').
I then ran the test using the PyTorch Lightning Trainer.test().
The estimated time for one epoch is 58 hrs.
GPU CUDA utilisation sits at 0%, but the copy engine on the card is at 80% constantly.
I'm not sure what is happening.
Probably a newbie issue, but any insight is most appreciated!
Hi, map_location only specifies where the weights are loaded. When you instantiate the model, it uses those weights or a copy of them, depending on where the model is allocated.
If you are using plain PyTorch, you should allocate the model on the GPU:
model = model.cuda()
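To make the point concrete, here is a minimal sketch in plain PyTorch. The UNet class, the checkpoint filename, and the input shape are placeholders for your own objects; the key lines are the device selection and the explicit model.to(device), since map_location by itself only affects where the checkpoint tensors are deserialized.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for your U-Net.
class UNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = UNet()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location only controls where the saved tensors land when loaded;
# the module itself must still be moved to the GPU explicitly.
# state = torch.load("final.ckpt", map_location=device)
# model.load_state_dict(state["state_dict"])
model = model.to(device)

# Inputs must live on the same device as the model, or inference falls
# back to constant host<->device copies (the 80% copy engine symptom).
x = torch.randn(1, 3, 64, 64, device=device)
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([1, 1, 64, 64])
```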
If you use PyTorch Lightning, you have to pass the same kwargs as for training when you build the trainer.
In this case that is gpus=1
(or more)
Also note that if you use PyTorch Lightning, it handles the model loading for you when you pass
ckpt_path=...
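Putting both points together, a sketch for the Lightning 1.5.x API used in this thread. Keeping one kwargs dict for both the training and the test Trainer guarantees the test run gets the same accelerator settings; gpus=1 is what puts Trainer.test() on the GPU. model, dm, and "final.ckpt" are placeholders for your own LightningModule, datamodule, and checkpoint.

```python
# import pytorch_lightning as pl   # Lightning 1.5.10, as in this thread

# One place for the accelerator settings so training and testing agree;
# forgetting gpus=1 at test time makes Trainer.test() run on the CPU.
trainer_kwargs = dict(gpus=1)

# trainer = pl.Trainer(**trainer_kwargs)
# trainer.fit(model, datamodule=dm)

# A fresh Trainer built with the same kwargs; ckpt_path tells Lightning
# which checkpoint to restore before testing, so no manual torch.load
# is needed.
# test_trainer = pl.Trainer(**trainer_kwargs)
# test_trainer.test(model, datamodule=dm, ckpt_path="final.ckpt")
```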
Thank you, Juan. I appreciate the quick response.
I'm passing the checkpoint file location to the trainer along with the same kwargs …however, I'll double-check all the args just to be sure!