CUDA check - how to check if a device is available before assigning a task

model.to(device) or nn.DataParallel(model) moves the model to the GPU, but is there a way to check whether a particular device is already in use by another process?
The actual problem is that when someone else is training something on the server, I don't want my program to disturb their process or to fail itself, so that I can allocate some other resource instead.

Unfortunately, it is very hard to do a general check like this, as it depends a lot on how you manage your cluster with coworkers.
You can use nvidia-smi to check memory usage and/or the processes running on the different GPUs.
You can check this small tool for an example of how to do such a thing and adapt it to your needs.
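As a minimal sketch of the nvidia-smi approach: the snippet below shells out to `nvidia-smi`, parses the per-GPU memory usage, and returns the index of the first GPU whose used memory is below a threshold. The function names and the `max_used_mib` threshold are my own choices, not from the thread; note also that this check is inherently racy, since another process can claim the GPU between the check and your allocation.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs from nvidia-smi CSV output
    into a list of (used, total) tuples, one entry per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        stats.append((used, total))
    return stats

def pick_idle_gpu(max_used_mib=100):
    """Return the index of the first GPU using less than
    max_used_mib MiB of memory, or None if all look busy."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    for idx, (used, _total) in enumerate(parse_gpu_memory(out)):
        if used < max_used_mib:
            return idx
    return None
```

You could then do something like `device = torch.device(f"cuda:{pick_idle_gpu()}")`, after handling the `None` case when no GPU is free.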


nvidia-smi is more of a manual job.
waitGPU is very helpful; with small changes it should solve my problem.
Thanks a lot.