Multiprocessing failed with Torch.distributed.launch module

Brando_Miranda · February 6, 2022, 4:26pm

@smth How do I get the flag within my python script that I am passing to torchrun? I want to set the number --nproc_per_node=32 I am passing there automatically rather than making sure the two scripts match (note I want to set the world size myself e.g. I am using cpu parallel jobs and want to choose that value myself thus)