In the official training code for video classification, the initial learning rate is multiplied by the world size.
Is there a reason for that?
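
To make the question concrete, here is a minimal sketch of the pattern I mean. The names `scaled_lr`, `base_lr`, and `world_size` are hypothetical and not copied from the training script; I'm assuming `world_size` is the number of distributed processes.

```python
def scaled_lr(base_lr: float, world_size: int) -> float:
    """Return the base learning rate multiplied by the number of processes.

    Hypothetical helper for illustration only; the actual script may
    compute this inline or use different names.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return base_lr * world_size


# Example: a base rate of 0.01 with 8 processes.
lr = scaled_lr(0.01, 8)
print(lr)
```

So with a single process the learning rate stays at the base value, and it grows in proportion to the number of processes.
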
Thanks.