Hi, I’m loading a pretrained model for training and I specified GPU id 1 for training, but during training I saw that GPU 0’s memory is also occupied. Could someone explain? Thank you
Most likely the CUDA context is also initialized on the default device.
You could mask all other devices and expose only the one you want via:
CUDA_VISIBLE_DEVICES=1 python script.py args
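A minimal sketch of what the mask does inside the script (assuming PyTorch; the script.py contents below are illustrative, not the original code):

# launched as: CUDA_VISIBLE_DEVICES=1 python script.py args
import torch

# With the mask, the process only sees physical GPU 1, renumbered as device 0.
print(torch.cuda.device_count())      # 1
device = torch.device("cuda:0")       # refers to physical GPU 1
x = torch.randn(4, 4, device=device)  # memory is allocated on GPU 1 only
print(x.device)                       # cuda:0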
Thank you ptrblck, I get this error:
THCudaCheck FAIL file=../torch/csrc/cuda/Module.cpp line=37 error=101 : invalid device ordinal
Just sharing here in case others encounter the same error.
If you set
CUDA_VISIBLE_DEVICES=3
then in the train.py code the gpu_id should be set to 0, not 3, because the single exposed GPU is renumbered as device 0 inside the process.
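A minimal sketch of that mapping (the train.py internals and the gpu_id variable here are assumptions, not the original code):

# launched as: CUDA_VISIBLE_DEVICES=3 python train.py
import torch

gpu_id = 0                              # index into the visible devices, not the physical id
torch.cuda.set_device(gpu_id)           # with the mask above, this is physical GPU 3
model = torch.nn.Linear(10, 10).cuda()  # parameters land on physical GPU 3
# Setting gpu_id = 3 instead raises "invalid device ordinal",
# since only one device is visible to the process.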