Leave no GPU empty when pushing data to the device

I am training a ResNet on the CIFAR-10 dataset using 4 GPUs. When I push a batch to the GPUs, the data is sometimes distributed unevenly, leaving one of the GPUs empty and resulting in a TypeError: forward() missing 1 required positional argument: 'out' error. Specifically, with a batch size of 4, one sample goes to each GPU (no error). With a batch size of 5, the data is split 2, 2, 1, 0 (error). A batch size of 6 gives 2, 2, 2, 0 (error). A batch size of 7 gives 2, 2, 2, 1 (no error again). Why is this happening, and how can I force the data to be distributed so that no GPU is left empty?
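Below is a minimal sketch of my setup. The exact model, the nn.DataParallel wrapping, the device_ids, and the torchvision resnet18 stand-in are simplifications of my actual code, but the per-GPU splits I listed above come from running batches through a setup like this:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Stand-in for my ResNet; my real model has a custom forward(), which is where
# the TypeError about the missing 'out' argument is raised.
model = torchvision.models.resnet18(num_classes=10)
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()

loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=T.ToTensor()),
    batch_size=6,   # 6 samples over 4 GPUs end up split as 2, 2, 2, 0
    shuffle=True,
)

for images, labels in loader:
    images = images.cuda()   # batch pushed to device, scattered along dim 0
    labels = labels.cuda()
    outputs = model(images)  # this is where the TypeError appears when one GPU gets no samples
    break
```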