I want to run 4 PyTorch scripts in parallel on a compute node with 4 GPUs. If I tell Slurm to use one GPU per script (i.e. I submit 4 jobs requesting one GPU each and let them run on one node in parallel), is it enough to just set device = "cuda" in the scripts? Does PyTorch automatically know which GPU to use, or do I have to specify which of them to use?
If you make only a single GPU visible to each process, e.g. via CUDA_VISIBLE_DEVICES or via a docker run command, you can use device="cuda" inside your script. Set CUDA_VISIBLE_DEVICES=0 for the first script, CUDA_VISIBLE_DEVICES=1 for the second, and so on. Otherwise all four scripts will end up on the same GPU, since "cuda" refers to the current default device (cuda:0).
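As a minimal sketch, assuming each script is launched with a single GPU exposed (e.g. CUDA_VISIBLE_DEVICES=2 python train.py for the third job), the process only sees that one device and device="cuda" resolves to it:

```python
import os
import torch

# With CUDA_VISIBLE_DEVICES restricted to one GPU, only that GPU is visible
# to this process, and it shows up inside the process as cuda:0.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # e.g. "2" for the third job
print(torch.cuda.device_count())               # 1

device = torch.device("cuda")          # resolves to the single visible GPU
model = torch.nn.Linear(10, 10).to(device)
x = torch.randn(4, 10, device=device)
print(model(x).device)                 # cuda:0
```

Depending on how the cluster is configured, Slurm may already set CUDA_VISIBLE_DEVICES for each job that requests a single GPU; printing the variable inside the job, as above, is an easy way to check whether you still need to set it yourself.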