I am using a GPU cluster to train my model, and I use the
torch.distributed module to drive the two GPUs on one node. I set up the DataLoader like this:
ds = CityScapes(cfg, mode='train')
dl = DataLoader(ds, batch_size=8, shuffle=False, sampler=sampler, num_workers=4, pin_memory=True, drop_last=True)
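For context, here is a minimal, self-contained sketch of how the `sampler` in that snippet is typically built with `torch.utils.data.DistributedSampler`. The `TensorDataset` stands in for `CityScapes` (which is not shown above), and `num_replicas`/`rank` are passed explicitly here only so the sketch runs outside an initialized process group; under `torch.distributed.launch` they are picked up from the environment automatically.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

# Hypothetical stand-in for the CityScapes dataset: 64 samples of 3x8x8 images.
images = torch.randn(64, 3, 8, 8)
labels = torch.randint(0, 10, (64,))
ds = TensorDataset(images, labels)

# DistributedSampler partitions the dataset across ranks. Outside a real
# distributed run we pass num_replicas/rank explicitly (rank 0 of 2 here);
# inside torch.distributed they default to the process group's values.
sampler = DistributedSampler(ds, num_replicas=2, rank=0, shuffle=True)

# shuffle=False because the sampler already handles shuffling;
# num_workers=4 spawns 4 loader worker processes per training process.
dl = DataLoader(ds, batch_size=8, shuffle=False, sampler=sampler,
                num_workers=4, pin_memory=True, drop_last=True)

for batch_imgs, batch_labels in dl:
    # Each rank sees half of the 64 samples: 32 / 8 = 4 batches per epoch.
    print(batch_imgs.shape)
    break
```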
This means each GPU's training process will spawn 4 worker processes to load the data. Counting the training process itself, on this node with 2 GPUs there would be 2 × (1 + 4) = 10 processes. So should I start my training like this?
srun -p gpu --gres=gpu:2 --ntasks-per-node=10 python -m torch.distributed.launch --nproc_per_node=2 train.py
Is this the correct way to make it work?