In PyTorch DistributedDataParallel, one process is using two GPUs (as shown in the process list of the nvidia-smi command) even though I call torch.cuda.set_device(local_rank)

I am using the following code snippet to ensure that each task/process (local_rank) launched by Slurm (using #SBATCH -n 4) is assigned to a specific local GPU:

    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)
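
For reference, here is a sketch of how the ranks and the device binding could be wired together, assuming local_rank comes from SLURM_LOCALID and world_rank from SLURM_PROCID (the variables srun exports for each task); the actual job may derive them differently:

    import os
    import torch

    # Assumption: srun exports these variables for every task.
    world_rank = int(os.environ["SLURM_PROCID"])   # global rank across all nodes
    local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of tasks

    # Bind this process to one GPU before any other CUDA call, so that no
    # context gets created on a different device first.
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)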

Still, when I look at the process list in the nvidia-smi output, I see two different GPUs (say 0 and 1) assigned to a single PID. If I tie a Slurm task to a GPU with torch.cuda.set_device(device) (where device is built from the local_rank), I should not see allocations on multiple GPUs, right?

Not sure if this is important, but I am using two workers in my DataLoader (see definition below); even so, all the work of a process should stay on the same GPU (say 0), right?

    trainloader = torch.utils.data.DataLoader(
        trainset,
        batch_size=128,
        num_workers=2,
        sampler=torch.utils.data.distributed.DistributedSampler(
            dataset=trainset, num_replicas=world_size, rank=world_rank
        ),
    )
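
For completeness, a typical training loop with this loader would look roughly like the sketch below (ddp_model, optimizer, and criterion are placeholder names). The DataLoader workers themselves only do CPU-side loading; tensors reach the GPU through the explicit .to(device) calls:

    def train_one_epoch(ddp_model, trainloader, optimizer, criterion, device, epoch):
        # Reshuffle differently on every epoch when using DistributedSampler.
        trainloader.sampler.set_epoch(epoch)
        for inputs, targets in trainloader:
            # The workers hand over CPU tensors; this is the point where data
            # moves to the one GPU selected via torch.cuda.set_device.
            inputs = inputs.to(device, non_blocking=True)
            targets = targets.to(device, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(ddp_model(inputs), targets)
            loss.backward()
            optimizer.step()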

Details:

Slurm config:

    #SBATCH --nodes 2
    #SBATCH -n 4
    #SBATCH --gres=gpu:4
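
For reference, the full submission script would be along these lines, assuming the tasks are launched with srun so that each process gets its own SLURM_PROCID / SLURM_LOCALID (train.py is a placeholder for the actual script name):

    #!/bin/bash
    #SBATCH --nodes 2
    #SBATCH -n 4
    #SBATCH --gres=gpu:4

    # srun starts the program once per task and exports SLURM_PROCID,
    # SLURM_LOCALID and SLURM_NTASKS to each of the launched processes.
    srun python train.py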

Expectation:

Four tasks launched across the two nodes, with exactly one GPU used per task (because inside the program launched by each task/process I call torch.cuda.set_device(device)).

Observed:

PID 6173, for example, shows up on two GPUs (0 and 1). The fourth task runs on the other node, so it does not appear in this listing.

    GPU   PID    Type   Process name    Usage
    0     6172   C      …bin/python     2657MiB
    0     6173   C      …bin/python      673MiB
    0     6174   C      …bin/python      673MiB
    1     6173   C      …bin/python     2657MiB
    2     6174   C      …bin/python     2657MiB

Are the DataLoader workers doing any GPU work? If not, they should not be relevant here.

> Still, when I look at the process list in the nvidia-smi output, I see two different GPUs (say 0 and 1) assigned to a single PID.

My guess is that this might just be some bookkeeping allocations happening on GPU 0 from all the processes (e.g. initializing CUDA contexts). Could you share a minimal script to repro this, so we can figure out what might be happening here?
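
One way to rule that out, as a sketch: restrict each task to a single GPU through CUDA_VISIBLE_DEVICES before anything initializes CUDA, so a stray context cannot land on GPU 0 (this assumes SLURM_LOCALID maps one-to-one to a GPU index on the node):

    import os

    # Sketch: make only this task's GPU visible before CUDA is initialized,
    # so any implicit context creation cannot land on GPU 0.
    # Assumes SLURM_LOCALID maps one-to-one to a GPU index on this node.
    os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["SLURM_LOCALID"]

    import torch

    # With a single visible device it is always cuda:0 inside this process.
    device = torch.device("cuda", 0)
    torch.cuda.set_device(device)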

Thanks for this. Putting together a minimal script to repro and will get back soon.
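
A sketch of what such a minimal repro could look like, assuming nccl as the backend and MASTER_ADDR / MASTER_PORT exported by the job script for the default env:// rendezvous:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes srun exports the SLURM_* variables and the job script exports
    # MASTER_ADDR / MASTER_PORT.
    world_rank = int(os.environ["SLURM_PROCID"])
    local_rank = int(os.environ["SLURM_LOCALID"])
    world_size = int(os.environ["SLURM_NTASKS"])

    # Bind this process to its GPU before any other CUDA work.
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    dist.init_process_group("nccl", rank=world_rank, world_size=world_size)

    # A tiny model and one forward/backward pass, just to see which GPU(s)
    # each PID ends up on in nvidia-smi.
    model = torch.nn.Linear(10, 10).to(device)
    ddp_model = DDP(model, device_ids=[local_rank])
    out = ddp_model(torch.randn(8, 10, device=device))
    out.sum().backward()

    torch.cuda.synchronize()
    print(f"rank {world_rank} (local {local_rank}) is using GPU {torch.cuda.current_device()}")
    dist.destroy_process_group()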