torch.load did not use the GPU specified for training

Hi, I’m loading a pretrained model for training, and I specified GPU 1 for training, but during training I saw that GPU 0’s memory was also occupied. Could someone explain? Thank you

Most likely the CUDA context is also being initialized on the default device.
You could mask all devices and expose only the wanted one via:

CUDA_VISIBLE_DEVICES=1 python script.py args
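
If GPU 0 is only being touched while the checkpoint is deserialized, another option is to remap the stored tensors with torch.load’s map_location argument. The snippet below is a minimal sketch: the file name checkpoint.pth and the toy nn.Linear model are placeholders standing in for the real checkpoint and network; only the map_location call is the relevant part.

import torch
import torch.nn as nn

device = torch.device("cuda:1")

# Toy model for illustration; stands in for the real network.
model = nn.Linear(10, 2)
torch.save(model.state_dict(), "checkpoint.pth")

# map_location sends the stored tensors straight to the target device,
# so deserializing the checkpoint allocates nothing on GPU 0.
state_dict = torch.load("checkpoint.pth", map_location=device)
model.load_state_dict(state_dict)
model.to(device)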

Thank you ptrblck, I get this:

THCudaCheck FAIL file=../torch/csrc/cuda/Module.cpp line=37 error=101 : invalid device ordinal

Just sharing here in case others encounter the same error.
If you set

CUDA_VISIBLE_DEVICES=3

then in the train.py code the gpu_id should be set to 0, not 3, since the one remaining visible device is renumbered starting from 0.
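
As a quick check of that renumbering, the snippet below assumes the script is launched with CUDA_VISIBLE_DEVICES=3; inside the process only that one physical GPU is visible, and it is indexed as cuda:0.

import torch

# Launched as: CUDA_VISIBLE_DEVICES=3 python check_devices.py
print(torch.cuda.device_count())      # -> 1, only one device is visible
print(torch.cuda.current_device())    # -> 0, the visible device is re-indexed
print(torch.cuda.get_device_name(0))  # name of physical GPU 3

device = torch.device("cuda:0")       # use index 0, not 3, inside the script
x = torch.ones(2, 2, device=device)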
