Pytorch GPU question

Hi All,

Just wanted to ask

If I do device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') or device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'), and then move my data, tensors, and model to that device with .to(device), am I only using 1 GPU?
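For reference, the pattern described above looks like this; the model and tensor here are toy placeholders, not anything from the thread:

```python
import torch

# Pick the first visible GPU if one exists, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Toy model and batch, just to illustrate the .to(device) calls;
# replace with your own model and data.
model = torch.nn.Linear(10, 2).to(device)
x = torch.randn(4, 10).to(device)
out = model(x)  # runs entirely on `device`
```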


Yes, both approaches would use a single GPU only.

Thanks. Would this still be the case if I had multiple GPUs available to PyTorch?

Yes, this would still be the case and you could use nn.DataParallel or nn.DistributedDataParallel to use multiple devices.
This tutorial gives you more information.
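A minimal sketch of the nn.DataParallel approach mentioned above (the Linear model here is just a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

if torch.cuda.device_count() > 1:
    # Replicates the model across all visible GPUs and splits each
    # input batch along dim 0; outputs are gathered on the default device.
    model = nn.DataParallel(model)

model = model.to('cuda' if torch.cuda.is_available() else 'cpu')
```

Note that nn.DistributedDataParallel is generally the recommended option for multi-GPU training, but it requires setting up a process group, so it is not shown here.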

Thanks @ptrblck. Assuming I have at least 1 GPU, by default would the same GPU be used in both approaches?

If your system has only a single GPU, I think both approaches should fall back to using the single device.

Thanks @ptrblck. (Referring to my original question.) Just to confirm: would the 2 approaches I specified at the top of the thread only use 1 GPU even if torch.cuda.device_count() returned a number greater than 1?

Yes, if you are explicitly moving the model and data to a specific device, only this device will be used.
That will be the case as long as you don’t use e.g. nn.DataParallel.
Note that (depending on your code) PyTorch might create a CUDA context on other visible devices.
If you see this behavior and want to avoid it, you could restrict the visible devices to the desired one via CUDA_VISIBLE_DEVICES=0 python args.
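The same masking can also be done from inside the script, as long as the environment variable is set before CUDA is initialized; a minimal sketch:

```python
import os

# Restrict visibility BEFORE torch initializes CUDA; afterwards only
# physical GPU 0 is visible and it appears to PyTorch as cuda:0.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch

# Reports at most 1 with the mask above (0 on a CPU-only machine).
print(torch.cuda.device_count())
```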

Thanks @ptrblck. (Again referring to my original question.) If I have multiple GPUs on the system where I execute my code, will the 2 approaches only use 1 GPU?

Also, what exactly is a CUDA context, and is it something I should avoid?


The CUDA context stores the GPU kernels, runtime etc., and thus uses memory on the specified device.
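You can see this overhead yourself: the context is created lazily on first use, and its memory does not show up in PyTorch's tensor accounting. A small sketch (nothing here is from the thread):

```python
import torch

if torch.cuda.is_available():
    # Touching the device creates the CUDA context, which loads the
    # kernel images and runtime onto the GPU.
    torch.cuda.init()
    # memory_allocated() only tracks tensor allocations, so the context
    # overhead is the gap between this number and what nvidia-smi shows.
    print(f"tensor memory on cuda:0: {torch.cuda.memory_allocated(0)} bytes")
```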

Thanks @ptrblck. Should cuda contexts be avoided?

No, as it’s holding the CUDA kernels. If you don’t initialize it, you won’t be able to run code on your GPU.

Thanks @ptrblck. So having CUDA contexts on GPUs I don’t move anything to is fine?