How do I read, from inside my Python script, a flag that I pass to torchrun?
For example, I launch with --nproc_per_node=32 and want the script to pick up that value
automatically, rather than having to keep the launch command and the script in sync by hand. (Note that I want to set the world size myself: I am running parallel CPU jobs and want to choose that value.)
related: Multiprocessing failed with Torch.distributed.launch module - #28 by Brando_Miranda
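
One possible approach, sketched under the assumption that the script is started by torchrun: torchrun exports environment variables to every worker it spawns, and LOCAL_WORLD_SIZE should equal the --nproc_per_node value, so the script can read it instead of hard-coding 32. The helper name `torchrun_sizes` and the fallback defaults (used when the script is run without torchrun) are hypothetical choices for illustration.

```python
import os


def torchrun_sizes(env=None):
    """Read the process-layout variables that torchrun exports to each worker.

    `env` is a hypothetical parameter that lets a plain dict stand in for
    os.environ (handy for testing); by default the real environment is used.
    The fallbacks (1 process, rank 0) are an assumption for running the
    script directly with `python`, without torchrun.
    """
    env = os.environ if env is None else env
    return {
        # Number of workers on this node; matches --nproc_per_node.
        "local_world_size": int(env.get("LOCAL_WORLD_SIZE", 1)),
        # Total number of workers across all nodes.
        "world_size": int(env.get("WORLD_SIZE", 1)),
        # Global rank of this worker.
        "rank": int(env.get("RANK", 0)),
        # Rank of this worker within its node.
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }
```

With a single-node launch such as `torchrun --nproc_per_node=32 train.py` (train.py being a placeholder script name), calling `torchrun_sizes()` inside the script should report a local world size of 32 in every worker, so the value only has to be set once on the command line.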