How to balance GPU memory usage when training on multiple GPUs

I am trying to train a model on multiple GPUs on a single machine. However, one of the GPUs (usually the first one) uses much more memory than the others, which stay at low usage. How can I balance memory usage across the GPUs?


