Hi, assume that I’ve chosen batch size = 32 on a single GPU because it outperforms other methods. Now I want to use DataParallel to split the training data. I have 4 GPUs. To get the same results, should I use batch size = 8 for each GPU or batch size = 32 for each GPU?
Thanks a lot.
I’m not sure which setup would work the best, but I would try to set batch_size=4*32, such that each GPU gets a batch of 32 samples.
Your training will most likely differ a bit as described here.
PS: I’m not a huge fan of tagging people, since this might demotivate others to answer in this thread.
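To make the arithmetic concrete, here is a minimal sketch in plain Python of how a batch size translates into per-GPU batch sizes. It assumes the batch is split along its first dimension into near-equal chunks, one per device, which is my understanding of how DataParallel scatters its inputs. The helper name per_gpu_batch_sizes is hypothetical and only for illustration; it is not part of PyTorch.

```python
import math


def per_gpu_batch_sizes(total_batch_size, num_gpus):
    """Hypothetical helper: sizes of the chunks each GPU would receive.

    Assumption: the batch is cut into chunks of ceil(total / num_gpus)
    samples, so the last chunk may be smaller than the others.
    """
    chunk = math.ceil(total_batch_size / num_gpus)
    sizes = []
    remaining = total_batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


# batch_size=4*32 passed to the DataLoader: every GPU sees 32 samples
print(per_gpu_batch_sizes(4 * 32, 4))

# batch_size=32 passed to the DataLoader: every GPU sees only 8 samples
print(per_gpu_batch_sizes(32, 4))
```

So the batch size given to the DataLoader is the total across all GPUs, and the per-GPU batch size follows from it.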
Hi, but my single GPU cannot use batch size = 32, because it runs out of memory. A single GPU limits the batch size to <= 8. So I want to know whether it is right to use 4 GPUs, each with batch size = 8, to train the model. Thanks a lot.
I misunderstood your question and thought batch_size=32 was chosen because this setup outperforms other models.
Sure, you could try to split the 32 samples among the GPUs. If you encounter any problems, e.g. in the BatchNorm layers (since the sample size is now smaller), you could try to use NVIDIA’s apex SyncBatchNorm. Note that the default device (usually cuda:0) might use some more memory if you use
Wow, Thanks a lot. I got it.
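For anyone wondering why the BatchNorm layers were mentioned: with 4 GPUs and 8 samples each, every GPU normalizes with statistics from its own 8 samples rather than from all 32. The sketch below uses plain Python and a deliberately extreme toy input (sorted values 0..31 for a single feature, which is an assumption for illustration, not real training data) to show how far per-chunk statistics can drift from full-batch statistics. Synchronized batch norm is meant to close this gap by computing the statistics across devices.

```python
from statistics import fmean, pvariance

# Toy activations of one feature for a batch of 32 samples (illustrative only)
full_batch = [float(i) for i in range(32)]

# Split into 4 equal chunks of 8, one per GPU
num_gpus = 4
chunk_size = len(full_batch) // num_gpus
chunks = [
    full_batch[i * chunk_size:(i + 1) * chunk_size]
    for i in range(num_gpus)
]

# Statistics each GPU would compute from its local chunk
chunk_means = [fmean(c) for c in chunks]
chunk_vars = [pvariance(c) for c in chunks]

# Statistics over the whole batch of 32
full_mean = fmean(full_batch)
full_var = pvariance(full_batch)

print("per-GPU means:    ", chunk_means)
print("per-GPU variances:", chunk_vars)
print("full-batch mean:    ", full_mean)
print("full-batch variance:", full_var)
```

With shuffled real data the difference is usually much smaller than in this sorted example, so it may not matter in practice; it is just something to check if the multi-GPU results differ from the single-GPU run.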