Batch size of dataparallel

I’m not sure which setup would work best, but I would try setting batch_size=4*32 so that each of the 4 GPUs gets a chunk of 32 samples (DataParallel splits the batch along dim 0 across the devices).
Your training will most likely still differ a bit, as described here.
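Here is a minimal sketch of what I mean, assuming 4 GPUs and using a toy linear model and random dataset just for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model as placeholders; swap in your own.
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
model = nn.Linear(10, 2)

# Wrap the model in DataParallel; with 4 GPUs a global batch of 4 * 32
# is scattered so each replica processes 32 samples per forward pass.
device = torch.device("cuda:0")
model = nn.DataParallel(model).to(device)

loader = DataLoader(dataset, batch_size=4 * 32, shuffle=True)

for data, target in loader:
    data, target = data.to(device), target.to(device)
    output = model(data)  # outputs are gathered back onto cuda:0
```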

PS: I’m not a huge fan of tagging people, since this might discourage others from answering in this thread. :wink: