Does the parameter adjustment scheme from 'Accurate, Large Minibatch SGD' work with PyTorch optim.SGD?

Andrew_Paint · September 14, 2019, 4:14am

I want to scale a 4-GPU training setup up to 8 GPUs, and I’m wondering whether I can use the adjustment rules from the paper “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour”. The scheme proposed in the paper is for distributed synchronous SGD. Although optim.SGD in PyTorch is not a distributed optimizer, I’m assuming it behaves synchronously in multi-GPU training. I’m not sure whether that assumption is right.
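For reference, here is a rough sketch of the two adjustment rules from the paper as I understand them: the linear scaling rule (multiply the learning rate by k when the minibatch size is multiplied by k) and gradual warmup (ramp the learning rate linearly from the old value to the scaled one over the first few epochs). The helper names `scaled_lr` and `warmup_lr`, the per-GPU batch size of 32, the base learning rate of 0.05, and the 500-step warmup are all made up for illustration and are not from the paper or from PyTorch. The sketch also assumes the gradients are averaged over the whole minibatch at every step, which is exactly the synchronous assumption I'm asking about.

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: the learning rate grows in proportion to
    the total minibatch size (k = new_batch / base_batch)."""
    return base_lr * new_batch / base_batch


def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Gradual warmup: linear ramp from start_lr to target_lr over
    warmup_steps optimizer steps, then hold target_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


# Hypothetical numbers: 32 images per GPU, going from 4 GPUs to 8 GPUs.
per_gpu_batch = 32
old_lr = 0.05                                   # assumed LR tuned for 4 GPUs
new_lr = scaled_lr(old_lr, 4 * per_gpu_batch, 8 * per_gpu_batch)

# Learning rate at a few points of an assumed 500-step warmup.
schedule = [warmup_lr(s, 500, old_lr, new_lr) for s in (0, 250, 500, 1000)]
```

If this is valid for my setup, I would pass `new_lr` as the `lr` argument of `optim.SGD` and overwrite `param_group["lr"]` with the `warmup_lr` value at each step during warmup.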