Out of Memory When Training on Multiple GPUs

When I train my model on multiple GPUs, I set batch_size=8 and use two GPUs. This setting does not make full use of my machine (4 GPUs, each with 12 GB of memory). However, if I increase batch_size or use more GPUs, it raises an exception (GPU out of memory). It seems that the first GPU takes on more of the work, so when I increase batch_size or use more GPUs, the first GPU runs out of memory. How can I fix this?

The image shows the state of my GPUs. I used GPUs 2 and 3.
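If the screenshot is not enough to see the imbalance, you can print per-device memory from inside the training script. The sketch below uses torch.cuda.memory_allocated, torch.cuda.memory_reserved and torch.cuda.max_memory_allocated; the helper name gpu_memory_report is hypothetical. Note that these numbers only cover memory managed by PyTorch's caching allocator in this process, so they will be lower than what nvidia-smi shows (which also counts the CUDA context).

```python
import torch


def gpu_memory_report(device_ids=None):
    """Hypothetical helper: return {device_index: (allocated, reserved, peak)} in MiB."""
    if not torch.cuda.is_available():
        return {}  # nothing to report on a CPU-only machine
    if device_ids is None:
        device_ids = range(torch.cuda.device_count())
    mib = 2 ** 20
    report = {}
    for idx in device_ids:
        allocated = torch.cuda.memory_allocated(idx) / mib  # live tensors
        reserved = torch.cuda.memory_reserved(idx) / mib    # held by the caching allocator
        peak = torch.cuda.max_memory_allocated(idx) / mib   # high-water mark since start/reset
        report[idx] = (allocated, reserved, peak)
    return report


if __name__ == "__main__":
    # e.g. call this right after a training step, for GPUs 2 and 3 as in the question
    for idx, (alloc, res, peak) in gpu_memory_report().items():
        print(f"cuda:{idx}: allocated={alloc:.0f} MiB reserved={res:.0f} MiB peak={peak:.0f} MiB")
```

A noticeably higher peak on the first device in the list is the symptom described above.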

This is commonly observed in multi-GPU setups, because some kind of aggregation has to be performed on one selected GPU (by default, the first device in device_ids, which is cuda:0 when all GPUs are used). See this answer, which explains the problem in a bit more detail, and check the answer just after it for a possible solution (i.e. split the loss computation across the GPUs as well, not just the network).
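A minimal sketch of the "split the loss as well" idea, assuming nn.DataParallel is being used: move the criterion inside the wrapped module's forward(), so each replica computes its loss on its own GPU and only one small loss value per replica is gathered on the output device, instead of the full model outputs. The names ModelWithLoss, wrap and train_step are hypothetical, and the toy model and data are only for illustration.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the model AND the criterion inside forward(),
    so each DataParallel replica computes its loss on its own GPU."""

    def __init__(self, model: nn.Module, criterion: nn.Module):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        outputs = self.model(inputs)
        # Return shape [1] instead of a 0-dim scalar so DataParallel can
        # concatenate the per-replica losses along dim 0.
        return self.criterion(outputs, targets).unsqueeze(0)


def wrap(model: nn.Module, criterion: nn.Module, device_ids=None) -> nn.Module:
    """Use DataParallel when CUDA and device_ids are available, else stay on CPU."""
    wrapped = ModelWithLoss(model, criterion)
    if torch.cuda.is_available() and device_ids:
        # DataParallel expects the parameters to live on device_ids[0].
        wrapped = nn.DataParallel(wrapped.cuda(device_ids[0]), device_ids=device_ids)
    return wrapped


def train_step(wrapped: nn.Module, optimizer, inputs, targets) -> float:
    optimizer.zero_grad()
    # One loss per replica; their mean equals the batch loss when the batch
    # splits into equal-sized chunks (uneven chunks weight samples slightly differently).
    loss = wrapped(inputs, targets).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
    # GPUs 2 and 3 as in the question; fall back to CPU if they do not exist.
    ids = [2, 3] if torch.cuda.device_count() >= 4 else None
    model = wrap(net, nn.CrossEntropyLoss(), ids)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))
    for step in range(3):
        print(step, train_step(model, opt, x, y))
```

This does not remove the imbalance entirely (parameters and gradients are still reduced onto the first device), but for models with large outputs, such as per-pixel or per-token logits, it avoids gathering those outputs on one GPU.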