Optimizers and multiprocessing: should I manually average gradients?

jerinphilip · June 14, 2019, 2:36pm

I have a similar question. I opened an issue at the same time, pytorch/fairseq#779, and the response there was that averaging is built in.

How about trying some black-box experiments to figure it out?
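
One way such an experiment could look, as a rough sketch only: compute the gradient on each worker's shard separately, run the same shards through the multiprocess setup, and compare what ends up in `.grad` against the sum and the mean of the per-worker values. The helper below is just the comparison step in plain Python. The name `classify_reduction` and the example numbers are made up for illustration; the `observed` values would have to come from whatever framework is being tested.

```python
import math


def classify_reduction(per_worker_grads, observed, tol=1e-6):
    """Guess how per-worker gradients were combined into `observed`.

    per_worker_grads: list of equal-length lists of floats, one per worker
                      (gradients computed on each shard in isolation).
    observed:         list of floats read back after the multiprocess
                      backward pass (hypothetical input, supplied by you).
    Returns "mean", "sum", or "unknown".
    """
    n_workers = len(per_worker_grads)
    # Element-wise sum and mean across workers.
    summed = [sum(vals) for vals in zip(*per_worker_grads)]
    averaged = [s / n_workers for s in summed]

    def close(a, b):
        return len(a) == len(b) and all(
            math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(a, b)
        )

    matches_sum = close(observed, summed)
    matches_mean = close(observed, averaged)

    # With one worker (or all-zero gradients) sum and mean coincide,
    # so the experiment cannot tell them apart.
    if matches_sum and matches_mean:
        return "unknown"
    if matches_mean:
        return "mean"
    if matches_sum:
        return "sum"
    return "unknown"


if __name__ == "__main__":
    # Made-up per-worker gradients for two workers.
    grads = [[1.0, 2.0], [3.0, 4.0]]
    print(classify_reduction(grads, [2.0, 3.0]))
    print(classify_reduction(grads, [4.0, 6.0]))
```

If the result is "sum", the gradients (or the learning rate) would need to be divided by the number of workers by hand. If it is "mean", no manual averaging should be needed. Use different data on each worker, otherwise the per-worker gradients are identical and the check says less.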