If I don't have enough GPUs, how should I change the learning rate and decay steps?


#1

I want to reproduce some experimental results. However, many of them use 8 GPUs and I only have 4 GPUs or fewer. So how should I change the learning rate, the decay steps, and the total epochs?
For example, suppose I use 4 GPUs. Perhaps I should set new_learning_rate = learning_rate / 2, and multiply the decay steps as well as the total epochs by 2? I think it depends on how the gradient descent is implemented, but I don't know how to do it.
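
To make my guess concrete, here is the arithmetic I have in mind, as a small sketch (all the numbers are made up, assuming the 8-GPU run uses 2 images per GPU and I keep 2 images per GPU on my 4 GPUs):

```python
# Hypothetical example: 8 GPUs x 2 images = batch 16 -> 4 GPUs x 2 images = batch 8
old_batch, new_batch = 8 * 2, 4 * 2
ratio = new_batch / old_batch                 # 0.5

learning_rate = 0.02                          # placeholder value
new_learning_rate = learning_rate * ratio     # halved: 0.01
decay_steps = [60000, 80000]                  # placeholder values
new_decay_steps = [int(s / ratio) for s in decay_steps]   # doubled
print(new_learning_rate, new_decay_steps)     # 0.01 [120000, 160000]
```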


#2

The training schedule can be adapted linearly with the size of the batch. Suppose you want to reproduce the results of a paper that used 8x V100 (16 GB per GPU) and you only have 4x 1080Ti (11 GB per GPU); in that case the paper may have fit more training samples per GPU than you can. If you consider the training of some Mask R-CNN architectures, for instance, you would have 16 images per batch (2 images/GPU) on the 8x V100 config, but probably only 4 images per batch (1 image/GPU) on the 4x 1080Ti setup. Since the effective batch size shrinks by a factor of 4, you will have to do the following (see the sketch after the list):

  1. Divide the learning rate by 4
  2. Multiply the number of training steps by 4 (including the decay steps)
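
In code, both adjustments are just linear scaling by the ratio of the effective batch sizes. Here is a minimal sketch (the function and all numeric values are placeholders I made up, not from any particular framework; 0.02 and 90k steps are just a common baseline schedule for a batch of 16):

```python
def scale_schedule(base_lr, base_steps, decay_steps, ref_batch, new_batch):
    """Linearly rescale the learning rate and step counts for a new effective batch size."""
    ratio = new_batch / ref_batch
    new_lr = base_lr * ratio                       # smaller batch -> smaller lr
    new_total = round(base_steps / ratio)          # smaller batch -> more steps
    new_decay = [round(s / ratio) for s in decay_steps]
    return new_lr, new_total, new_decay

# Paper: 8x V100 with 2 images/GPU -> effective batch 16
# Ours:  4x 1080Ti with 1 image/GPU -> effective batch 4
lr, total, decay = scale_schedule(base_lr=0.02, base_steps=90000,
                                  decay_steps=[60000, 80000],
                                  ref_batch=16, new_batch=4)
print(lr, total, decay)  # 0.005 360000 [240000, 320000]
```

The same function covers your 4-GPU case with 2 images/GPU: pass new_batch=8 and you get the learning rate halved and the steps doubled, exactly as you guessed.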

If you are really interested in how this kind of scheduling works and how it can affect your training accuracy and speed, you might find it interesting to read this article.