I want to change the NCCL timeout from 30 minutes to 5 minutes after step 1. The process group is initialized like this:

    from datetime import timedelta

    import torch.distributed

    torch.distributed.init_process_group(
        backend="nccl",
        world_size=world_size,
        rank=rank,
        timeout=timedelta(minutes=30),  # 30 is the default, but be explicit
    )
Is that possible somehow? Maybe there is an os.environ trick?
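This is not an env-var trick. It is a sketch of one workaround, under the assumption that you call the collectives yourself and can pass them a `group=` argument. Keep the default group at 30 minutes for step 1. After that step, create a second group that spans all ranks with `torch.distributed.new_group(timeout=timedelta(minutes=5))` and use it for every later collective. The `run` function, the `_free_port` helper, the toy `all_reduce` loop and the one-GPU-per-rank device mapping are hypothetical and only illustrate the idea. The code falls back to `gloo` on CPU so it can be tried in a single process.

```python
import os
import socket
from datetime import timedelta

import torch
import torch.distributed as dist


def _free_port() -> int:
    # Hypothetical helper: ask the OS for an unused local TCP port (single-machine demo only).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run(rank: int, world_size: int, backend: str = "nccl", num_steps: int = 3) -> list:
    # env:// rendezvous needs MASTER_ADDR/MASTER_PORT; launchers such as torchrun
    # normally set them, so only fill them in when they are missing.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(_free_port()))

    # Default group: keeps the long 30-minute timeout for step 1.
    dist.init_process_group(
        backend=backend,
        world_size=world_size,
        rank=rank,
        timeout=timedelta(minutes=30),
    )
    # NCCL collectives need CUDA tensors; assumes one GPU per rank on a single node.
    device = torch.device("cuda", rank) if backend == "nccl" else torch.device("cpu")

    group = None  # None -> collectives use the default (30-minute) group
    results = []
    for step in range(1, num_steps + 1):
        t = torch.full((1,), float(step), device=device)
        dist.all_reduce(t, group=group)  # sums t across all ranks in `group`
        results.append(t.item())
        if step == 1:
            # Every rank must call new_group(), in the same order.
            # ranks=None means all ranks of the default group; the backend
            # defaults to the default group's backend.
            group = dist.new_group(ranks=None, timeout=timedelta(minutes=5))

    dist.destroy_process_group()  # destroys the default group and groups created from it
    return results
```

For a quick local check, run `run(rank=0, world_size=1, backend="gloo")`. It should give `[1.0, 2.0, 3.0]`, because a one-rank all-reduce leaves each value unchanged. If you use `DistributedDataParallel`, note that it binds its process group when it is constructed (`process_group=` argument). A group created later is therefore not picked up automatically.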