Why multiply the initial lr by the world size under distributed training

In the official training code for video classification, the initial learning rate is multiplied by the world size (the number of distributed processes).
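To make the question concrete, here is a minimal sketch of the pattern I mean. It is not the actual training code: the helper name `scaled_lr` and the values below are hypothetical, and I am assuming the world size is whatever `torch.distributed.get_world_size()` would report.

```python
def scaled_lr(base_lr: float, world_size: int) -> float:
    """Multiply the configured initial learning rate by the number of processes."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return base_lr * world_size


# Hypothetical values, for illustration only.
base_lr = 0.01   # learning rate passed on the command line
world_size = 8   # assumed: the number of processes in the distributed job

lr = scaled_lr(base_lr, world_size)
print(f"initial lr used by the optimizer: {lr}")
```

With a single process the learning rate is unchanged, so the scaling only takes effect in the distributed case.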
Any reason behind that?
Thanks.