Hi,
I wasn’t sure where else to ask this, and I couldn’t find an existing answer on the forum.
Do optimizers work transparently in multi-process runs, or do I need to average the gradients across processes manually?
The imagenet example in the pytorch/examples repo does not do explicit gradient averaging between processes, but the distributed training example in the PyTorch tutorials does.
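For concreteness, this is roughly the kind of manual averaging I mean, along the lines of the tutorial’s `average_gradients` helper (my own sketch, assuming the process group has already been initialized):

```python
import torch.distributed as dist

def average_gradients(model):
    # After loss.backward(): sum each parameter's gradient across all
    # processes, then divide by the world size, so every replica steps
    # with the same averaged gradient.
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size
```

The imagenet example, if I read it correctly, just wraps the model in `torch.nn.parallel.DistributedDataParallel` and calls `optimizer.step()` with no such helper, which is what made me wonder.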
Thanks in advance for any help!
Enrico