Cannot run DDP on 2 GPUs

Hi

I am trying out a GPT-like tool here.

But I cannot get it to run on multiple GPUs. It has the functionality to run on multiple GPUs (and even multiple nodes), but with the command torchrun --standalone --nproc_per_node=2 train.py I get an error:

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group

I can run it on a single GPU, though.
What can I do? I am new to PyTorch. The code that implements DDP looks like the following:

ddp = int(os.environ.get('RANK', -1)) != -1 # is this a ddp run?
# ddp = True
if ddp:
    # init_process_group(backend=backend)
    ddp_rank = int(os.environ['RANK'])
    ddp_rank = 0
    ddp_local_rank = int(os.environ['LOCAL_RANK'])
    ddp_local_rank = 1
    ddp_world_size = int(os.environ['WORLD_SIZE'])
    ddp_world_size = 2
    device = f'cuda:{ddp_local_rank}'
    torch.cuda.set_device(device)
    master_process = ddp_rank == 0 # this process will do logging, checkpointing etc.
    seed_offset = ddp_rank # each process gets a different seed
    #assert gradient_accumulation_steps % torch.cuda.device_count() == 0
    gradient_accumulation_steps //= torch.cuda.device_count()
else:
    # if not ddp, we are running on a single gpu, and one process
    master_process = True
    seed_offset = 0
    ddp_world_size = 1

By the way, I have commented out the assert part. Also, as I understand it, the values are passed in as environment variables?
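
For what it's worth, my understanding is that torchrun itself sets RANK, LOCAL_RANK and WORLD_SIZE for every process it launches, so hardcoding them shouldn't be needed. A throwaway script like this (the file name check_env.py is just a made-up example) would confirm whether both processes actually receive them:

# check_env.py -- run with: torchrun --standalone --nproc_per_node=2 check_env.py
import os

# torchrun exports these variables for each worker it spawns
print("RANK =", os.environ.get("RANK"),
      "LOCAL_RANK =", os.environ.get("LOCAL_RANK"),
      "WORLD_SIZE =", os.environ.get("WORLD_SIZE"))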

Thanks

The DDP tutorial might be a good starting point.
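
As a rough sketch of what that tutorial covers (the toy model and the nccl backend below are illustrative assumptions, not your actual train.py), the setup boils down to something like this; the important part for the error you are seeing is that init_process_group has to run before the model is wrapped in DDP:

import os
import torch
import torch.nn as nn
from torch.distributed import init_process_group, destroy_process_group
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun for each process
init_process_group(backend="nccl")           # creates the default process group
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 10).to(local_rank)     # toy model just for illustration
model = DDP(model, device_ids=[local_rank])  # gradients are averaged across processes

# ... training loop ...

destroy_process_group()

If that init_process_group call never happens, any later DDP operation raises exactly the "Default process group has not been initialized" error you quoted.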

OK, I'll look into it.
Thanks