Should I manually average gradients when training with multiprocessing?

I’m not sure where else to ask, and I couldn’t find an existing answer on the forum.

Do optimizers work transparently in multi-process runs, or do I need to average the gradients across processes manually?

The ImageNet example in the pytorch/examples repo does not do explicit gradient averaging between processes, but the distributed training tutorial in the PyTorch docs does.
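To be concrete, by “explicit gradient averaging” I mean something roughly like this helper from the tutorial (my paraphrase; it assumes the process group has already been initialized with `torch.distributed.init_process_group`):

```python
import torch
import torch.distributed as dist

def average_gradients(model):
    """After loss.backward(), sum each parameter's gradient across all
    processes with all_reduce, then divide by the world size so every
    process ends up with the averaged gradient before optimizer.step()."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size
```

So the question is whether this step is still needed, or whether wrapping the model (e.g. in `DistributedDataParallel`, as the ImageNet example does) already takes care of it.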

Thanks in advance for any help!