Training hangs on loss.backward() with DDP --nnodes=2 --nproc_per_node=3
distributed
ptrblck
May 18, 2025, 2:52pm  #2
Did you make sure the same number of batches is used on each rank, as described here?
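
For reference, a minimal sketch of the two common fixes for this kind of hang (the model, dataset, and hyperparameters below are placeholders): either let `DistributedSampler` drop the trailing uneven batches so every rank runs the same number of iterations, or wrap the loop in DDP's `join()` context manager, which shadows the collective calls of ranks that exhaust their data early instead of letting the others block in `loss.backward()`.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(10, 1).to(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
    # Fix 1: drop_last=True makes every rank see the same number of
    # batches, so no rank blocks waiting for a gradient allreduce.
    sampler = DistributedSampler(dataset, drop_last=True)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    # Fix 2: DDP's join() context manager tolerates uneven inputs by
    # stepping in for ranks that finish early (redundant here because
    # of drop_last=True, shown only to illustrate the alternative).
    with model.join():
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with the command line from the topic title, e.g. `torchrun --nnodes=2 --nproc_per_node=3 train.py`, each of the six ranks then executes an identical number of `backward()` calls.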