[DataParallel] RuntimeError: Gather got an input of invalid size: got [2, 4, 6, 300, 128], but expected [2, 5, 6, 300, 128]

I'm training a language model with GPT-2. When I use multiple GPUs, I get this error:
RuntimeError: Gather got an input of invalid size: got [2, 4, 6, 300, 128], but expected [2, 5, 6, 300, 128]
I'm using gpus=2 and batch_size=9. It seems the two GPUs get different batch sizes: one gets 4 and the other gets 5.
How can I fix this?
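
To make the 4 vs 5 split concrete, here is a minimal sketch in plain Python of how a batch of 9 would be divided across 2 devices. It assumes the batch is split the way torch.chunk splits a tensor (each chunk has ceil(batch_size / num_gpus) items and the last chunk takes the remainder). The helper name chunk_sizes is hypothetical and is not part of PyTorch.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, num_gpus: int) -> List[int]:
    """Per-device chunk sizes, assuming torch.chunk-style splitting.

    Hypothetical helper for illustration only; not a PyTorch API.
    """
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    # Each chunk holds ceil(batch_size / num_gpus) items,
    # except the last one, which holds whatever is left.
    step = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(step, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(9, 2))   # [5, 4]: the uneven split seen in the error
print(chunk_sizes(10, 2))  # [5, 5]: an even split
```

Under that assumption, the two sizes match the mismatched dimension in the error message (4 vs 5), and a batch size that is a multiple of the number of GPUs would give every device the same chunk size.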