Using only some of the GPUs on each server

Hi,
I have two servers with 3 GPUs each. My code runs fine when I use all GPUs on both servers (6 GPUs). For a benchmark, I also want to run with 2 GPUs per server (4 GPUs) and with 1 GPU per server (2 GPUs).

ngpus_per_node = 1  # can also be 2 or 3
args.world_size = ngpus_per_node * args.world_size  # args.world_size starts as the number of machines (2)
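For reference, with one process per GPU this usually follows the mp.spawn pattern from the PyTorch ImageNet example. A minimal sketch of the launch on each machine, where nodes, node_rank, and main_worker are illustrative names, not part of the code above:

import torch.multiprocessing as mp

def main_worker(local_rank, args):
    # Global rank of this process across all machines.
    args.rank = args.node_rank * args.ngpus_per_node + local_rank
    ...  # init_process_group, model setup, training loop

def main(args):
    # Total number of processes across both machines: 2, 4, or 6 here.
    args.world_size = args.ngpus_per_node * args.nodes
    # Launch exactly ngpus_per_node processes on this machine.
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args,))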

When I use all GPUs on each machine it works fine, but with fewer GPUs the code gets stuck at the following line without any error:

model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
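For context, DistributedDataParallel's constructor synchronizes across the process group, so it blocks if fewer than world_size processes ever join; a world_size computed for 3 GPUs per node while only 1 or 2 processes are launched per node would stall exactly here. A minimal sketch of the per-process setup this assumes (hostname and port are placeholders):

import os
import torch
import torch.distributed as dist

def setup_distributed(rank, world_size, gpu):
    os.environ.setdefault('MASTER_ADDR', 'server1')  # placeholder: first server's hostname
    os.environ.setdefault('MASTER_PORT', '29500')    # placeholder: any free port
    # Blocks until all world_size processes have joined the group, so
    # world_size must equal the number of processes actually launched.
    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(gpu)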

You could use os to set which devices Python can use:

import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # make only GPUs 0 and 1 visible to this process
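Note that this has to run before CUDA is initialized in the process, i.e. before the first CUDA call. A quick check that it took effect:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # must be set before any CUDA call

import torch
# Reports 2: only GPUs 0 and 1 are visible, re-indexed as cuda:0 and cuda:1.
print(torch.cuda.device_count())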

Or you can set the environment variable before running the script:

CUDA_VISIBLE_DEVICES=0,1 python train.py

Set CUDA_VISIBLE_DEVICES to the GPU indices you want to use. Inside the process the visible devices are re-indexed from 0, so device_ids should refer to the new indices.
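For the two-server benchmark above, each machine then launches with only the GPUs it should use, for example (the --node-rank flag is hypothetical; use whatever your script takes to tell the machines apart):

# 2 GPUs per server (world size 4)
CUDA_VISIBLE_DEVICES=0,1 python train.py --node-rank 0  # on server 1
CUDA_VISIBLE_DEVICES=0,1 python train.py --node-rank 1  # on server 2

# 1 GPU per server (world size 2)
CUDA_VISIBLE_DEVICES=0 python train.py --node-rank 0  # on server 1
CUDA_VISIBLE_DEVICES=0 python train.py --node-rank 1  # on server 2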
