I’ve been trying to create a training script that can run on either a single GPU or multiple GPUs for distributed training, by setting nproc_per_node equal to the number of GPUs being used. However, if I try nproc_per_node=1, I get this error: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
So do I need a separate script for each scenario, or is there a way to set up one script so that it handles both?
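
For concreteness, here is a minimal sketch of the kind of single-script setup I have in mind. It assumes the launcher (for example torchrun) exports RANK and WORLD_SIZE as environment variables, and it initializes the process group whenever they are present, even when the world size is 1. The helper names launched_distributed and setup_distributed are hypothetical, and I have not confirmed that this is the recommended pattern.

```python
import os


def launched_distributed(env=None):
    """Return True when the launcher's RANK and WORLD_SIZE variables are present."""
    env = os.environ if env is None else env
    return "RANK" in env and "WORLD_SIZE" in env


def setup_distributed(backend="nccl"):
    """Initialize the default process group if a launcher started this process.

    Returns (rank, world_size); falls back to (0, 1) for a plain
    `python train.py` run where no launcher variables are set.
    """
    if not launched_distributed():
        return 0, 1

    # Imported lazily so launched_distributed() can be used without torch installed.
    import torch.distributed as dist

    if not dist.is_initialized():
        # The default env:// init method reads RANK, WORLD_SIZE,
        # MASTER_ADDR and MASTER_PORT from the environment.
        dist.init_process_group(backend=backend)
    return dist.get_rank(), dist.get_world_size()
```

With this, the training code would call setup_distributed() once at startup, and any later torch.distributed calls would be guarded by a check such as dist.is_initialized(), so the same script could be started with nproc_per_node=1, with several processes, or with plain python.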