How to run training on 3 GPUs with 2^n batch_size

I encounter errors when I attempt to use a batch size that is not divisible by 3. Is there an option that allows the last GPU to handle the remaining part of the batch?

Example

batch_size=32
can be divided as

GPU1 : 10 (batch_size//3)
GPU2 : 10 (batch_size//3)
GPU3 : 12 (batch_size//3 + r, where r = batch_size % 3 = 2)
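
For illustration, here's a minimal sketch of the uneven split described above; the `split_batch` helper is hypothetical, not an existing built-in option:

```python
import torch

def split_batch(batch, num_gpus):
    # Hypothetical helper: each GPU gets batch_size // num_gpus samples,
    # and the last GPU also absorbs the remainder r = batch_size % num_gpus.
    base = batch.size(0) // num_gpus
    sizes = [base] * (num_gpus - 1) + [base + batch.size(0) % num_gpus]
    return torch.split(batch, sizes)

batch = torch.randn(32, 4)           # batch_size = 32
chunks = split_batch(batch, 3)
print([c.size(0) for c in chunks])   # -> [10, 10, 12]
```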

The DistributedSampler should take care of “padding” the dataset so that every replica gets an equal split, repeating earlier samples to fill the gap.
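
A minimal sketch of that padding behavior, assuming a toy 10-sample dataset split across 3 replicas (no process group is needed here because `num_replicas` and `rank` are passed explicitly):

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

# 10 samples across 3 replicas: with the default drop_last=False the
# sampler pads the index list to 12 so every rank gets 4 samples.
dataset = TensorDataset(torch.arange(10))

for rank in range(3):
    sampler = DistributedSampler(dataset, num_replicas=3, rank=rank,
                                 shuffle=False, drop_last=False)
    print(f"rank {rank}: {list(sampler)}")

# rank 0: [0, 3, 6, 9]
# rank 1: [1, 4, 7, 0]   <- index 0 repeated as padding
# rank 2: [2, 5, 8, 1]   <- index 1 repeated as padding
```

Passing `drop_last=True` instead drops the trailing samples so the data divides evenly, rather than padding with repeats.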


Thanks for your feedback.