Why multiply the initial lr by world size under distributed training

111429 · February 28, 2021, 10:52am

In the official training code for video classification, the initial learning rate is multiplied by the world size (the number of distributed processes).
Any reason behind that?
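To make the question concrete, here is a minimal sketch of the scaling I mean. The function name `scale_lr` and the values of `base_lr` and `world_size` are my own illustrative assumptions, not copied from the reference script, which would normally get the world size from the distributed setup.

```python
def scale_lr(base_lr: float, world_size: int) -> float:
    """Multiply a base learning rate by the number of distributed processes.

    Hypothetical helper for illustration only; it mirrors the
    "lr times world size" pattern described in this post.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return base_lr * world_size


base_lr = 0.01  # assumed base learning rate, chosen only for the example

# Show how the initial learning rate grows with the number of processes.
for world_size in (1, 4, 8):
    print(f"world_size={world_size} -> initial lr={scale_lr(base_lr, world_size)}")
```

So with 8 processes the optimizer would start from a learning rate 8 times larger than in the single-process run.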
Thanks.