Hi everyone, thank you so much for the support on here. I don't know parallel or distributed computing, so please excuse me if my question is naive.
I will be using an HPC (high-performance computing) cluster for my research. I really don't understand DistributedDataParallel() in PyTorch, especially init_process_group(). What does it mean to initialize a process group? And what is
init_method ("URL specifying how to initialize the package")?
The documentation gives some example URLs. What are those URLs for?
What is the rank of the current process? Is world_size the number of GPUs?
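For context on what I mean, here is a minimal single-process sketch of init_process_group() that I pieced together. It uses init_method="env://" (rendezvous info read from environment variables) and the CPU "gloo" backend; the address, port, and the world_size of 1 are just assumptions for a one-process toy run, not a real multi-GPU setup.

```python
import os
import torch.distributed as dist

# In a real job a launcher (torchrun, Slurm, ...) sets these per process.
# Here we fake a one-process "cluster" so the sketch runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # where rank 0 listens (assumed)
os.environ.setdefault("MASTER_PORT", "29500")      # any free TCP port (assumed)
os.environ.setdefault("RANK", "0")                 # this process's id
os.environ.setdefault("WORLD_SIZE", "1")           # total number of processes

# init_method="env://" means: read rendezvous info from the env vars above.
# "gloo" is a CPU backend; "nccl" is the usual choice on GPUs.
dist.init_process_group(backend="gloo", init_method="env://")

rank = dist.get_rank()              # 0 in this one-process example
world_size = dist.get_world_size()  # 1 in this one-process example
print(f"rank {rank} of {world_size}")

dist.destroy_process_group()
```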
I would really appreciate it if someone could explain what init_process_group() is and how to use it in a simple way, since I don't know parallel or distributed computing.
I will be using Slurm (sbatch) on the HPC cluster.
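In case it helps frame the question: my understanding (which may be wrong) is that under Slurm, srun sets environment variables like SLURM_PROCID and SLURM_NTASKS for every task, and those can be mapped to the RANK and WORLD_SIZE that init_process_group(init_method="env://") expects. A sketch, with fallback defaults so it also runs outside a Slurm job:

```python
import os

# Real Slurm variables, set by `srun` for each task; the defaults are
# only so this sketch runs outside an actual Slurm allocation.
rank = int(os.environ.get("SLURM_PROCID", "0"))         # global task id -> rank
world_size = int(os.environ.get("SLURM_NTASKS", "1"))   # total tasks -> world_size
local_rank = int(os.environ.get("SLURM_LOCALID", "0"))  # task id on this node

# Export what init_process_group(init_method="env://") would read.
os.environ["RANK"] = str(rank)
os.environ["WORLD_SIZE"] = str(world_size)

print(f"rank={rank} world_size={world_size} local_rank={local_rank}")
```

Is this roughly the right way to wire Slurm into PyTorch's process group?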