Trying to train in a multi-GPU environment with the distributed library, but it runs on only a single GPU

The imports I used are as follows:

from distributed import apply_gradient_allreduce
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler
from torch.utils.data import DataLoader

My init_distributed function is as follows:

def init_distributed(args, n_gpus, rank, group_name):
    assert torch.cuda.is_available(), "Distributed mode requires CUDA."
    print("Initializing distributed")
    # Set the CUDA device so everything is done on the right GPU.
    torch.cuda.set_device(rank % torch.cuda.device_count())
    # Initialize distributed communication
    dist.init_process_group(
        backend=args.dist_backend, init_method=args.dist_url,
        world_size=n_gpus, rank=rank, group_name=group_name)
    print("Done initializing distributed")

When I run training, two processes are launched, but both of them end up on a single GPU.
How can I fix this problem?


Not familiar with the torch.distributed thingy, but I’m pretty sure you also need to set your model to use multiple GPUs, otherwise it will default to only one GPU.

See the tutorials here.
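To illustrate the point above: wrapping the model in torch.nn.parallel.DistributedDataParallel is what makes gradients synchronize across processes. A minimal, CPU-runnable sketch is below; it uses a single-process world with the gloo backend and a file:// rendezvous just so it runs anywhere. In real multi-GPU training you would launch one process per GPU, use backend="nccl", and pass device_ids=[local_rank] (the model, file path, and shapes here are illustrative, not from the original post).

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def demo_ddp_wrap():
    # Single-process world, just to show the wrapping step.
    # Real multi-GPU use: backend="nccl", one process per GPU,
    # DistributedDataParallel(model, device_ids=[local_rank]).
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        world_size=1,
        rank=0,
    )
    model = nn.Linear(4, 2)
    ddp_model = DistributedDataParallel(model)  # syncs grads across ranks
    out = ddp_model(torch.randn(3, 4))
    dist.destroy_process_group()
    return out.shape


if __name__ == "__main__":
    print(demo_ddp_wrap())  # torch.Size([3, 2])
```

Without this wrapper (or an equivalent such as apply_gradient_allreduce from the distributed module the poster imports), each process trains an independent copy of the model.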

I think it is due to this line:
torch.cuda.set_device(rank % torch.cuda.device_count())

Check both of the following:

  1. the case where the model is executed on one specific GPU
  2. the case where it is executed on all available GPUs
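The rank-to-device mapping in that line can be checked in isolation. The helper below is a hypothetical stand-in that mirrors rank % torch.cuda.device_count(): if only one device is visible to the process (for example because of how CUDA_VISIBLE_DEVICES is set), every rank maps to device 0, which matches the symptom of two processes sharing one GPU.

```python
def device_index_for_rank(rank: int, num_visible_devices: int) -> int:
    # Mirrors torch.cuda.set_device(rank % torch.cuda.device_count()):
    # each process pins itself to one of the devices it can see.
    return rank % num_visible_devices


# With 2 visible GPUs, ranks 0 and 1 land on separate devices:
assert device_index_for_rank(0, 2) == 0
assert device_index_for_rank(1, 2) == 1

# With only 1 visible GPU, both ranks collapse onto device 0,
# i.e. two processes on a single GPU:
assert device_index_for_rank(0, 1) == 0
assert device_index_for_rank(1, 1) == 0
```

So it is worth printing torch.cuda.device_count() inside each spawned process to confirm every process can actually see all of the GPUs.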