I want to run 4 PyTorch scripts in parallel on a compute node with 4 GPUs. If I tell Slurm to use one GPU per script (i.e. I submit 4 jobs requesting one GPU each and let them run on one node in parallel), is it enough to just set device = "cuda" in the scripts? Does PyTorch automatically know which GPU to use, or do I have to specify which of them to use?
If you make only a single GPU visible to each process, e.g. via CUDA_VISIBLE_DEVICES or via a docker run command, you can use device="cuda" inside your script. Set CUDA_VISIBLE_DEVICES=0 for the first script, CUDA_VISIBLE_DEVICES=1 for the second, and so on. Otherwise all four scripts will end up on the same GPU, since "cuda" refers to the current default device (cuda:0).
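As a minimal sketch, assuming each script is launched with a single GPU exposed (e.g. CUDA_VISIBLE_DEVICES=2 python train.py for the third job), the process only sees that one device and device="cuda" resolves to it:

```python
import os
import torch

# With CUDA_VISIBLE_DEVICES restricted to one GPU, only that GPU is visible
# to this process, and it shows up inside the process as cuda:0.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # e.g. "2" for the third job
print(torch.cuda.device_count())               # 1

device = torch.device("cuda")          # resolves to the single visible GPU
model = torch.nn.Linear(10, 10).to(device)
x = torch.randn(4, 10, device=device)
print(model(x).device)                 # cuda:0
```

Depending on how the cluster is configured, Slurm may already set CUDA_VISIBLE_DEVICES for each job that requests a single GPU; printing the variable inside the job, as above, is an easy way to check whether you still need to set it yourself.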