For example, I have variable-length sequences like [1, 2, 3, 4, 5], [1, 2, 3], [1, 2].
Usually, I concatenate them into one tensor with zero padding,
like [[1, 2, 3, 4, 5], [1, 2, 3, 0, 0], [1, 2, 0, 0, 0]].
But the padding positions waste GPU compute.
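The zero-padding baseline above can be sketched like this, using plain Python lists instead of tensors (`pad_batch` is a hypothetical helper name, not a library function):

```python
def pad_batch(samples, pad_value=0):
    """Pad every sample to the length of the longest one in the batch."""
    max_len = max(len(s) for s in samples)
    return [s + [pad_value] * (max_len - len(s)) for s in samples]

batch = pad_batch([[1, 2, 3, 4, 5], [1, 2, 3], [1, 2]])
# batch == [[1, 2, 3, 4, 5], [1, 2, 3, 0, 0], [1, 2, 0, 0, 0]]
```

Every zero here is a position the GPU still processes, which is where the waste comes from.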
A better solution would be to pack short samples together into one row, and add other samples to keep the batch size up,
like [[1, 2, 3, 4, 5], [1, 2, 3, 1, 2], [1, 2, 3, 4, 0]].
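One common way to do this packing is greedy first-fit: sort samples longest-first, drop each one into the first row that still has room, and pad only the leftover space. This is a minimal sketch under that assumption (`pack_samples` is a made-up name):

```python
def pack_samples(samples, target_len, pad_value=0):
    """Greedy first-fit packing: concatenate samples into rows of at most
    target_len, then zero-pad each row up to target_len."""
    rows = []
    for s in sorted(samples, key=len, reverse=True):
        for row in rows:
            if len(row) + len(s) <= target_len:
                row.extend(s)  # sample fits into an existing row
                break
        else:
            rows.append(list(s))  # open a new row for this sample
    return [row + [pad_value] * (target_len - len(row)) for row in rows]

packed = pack_samples([[1, 2, 3, 4, 5], [1, 2, 3], [1, 2]], target_len=5)
# packed == [[1, 2, 3, 4, 5], [1, 2, 3, 1, 2]]
```

Note that for attention-based models, packing also requires a block-diagonal attention mask (and per-sample position ids) so the samples sharing a row do not attend to each other.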
Another idea is a dynamic batch size, to fully utilize GPU memory:
for example, with a max sequence length of 100 in the batch I would use batch size 32, and with a max length of 200 I would use batch size 16, so that batch_size * max_len stays roughly constant.
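The dynamic batch size can be expressed as a fixed token budget per batch: sort samples by length, then cut a new batch whenever batch_size * max_len_in_batch would exceed the budget (100 * 32 == 200 * 16 == 3200 in the example above). A sketch with an assumed helper name `token_budget_batches`:

```python
def token_budget_batches(samples, max_tokens=3200):
    """Group length-sorted samples so that the padded size of each batch,
    batch_size * max_len_in_batch, stays within a fixed token budget."""
    batches, current = [], []
    for s in sorted(samples, key=len):
        max_len = max(len(s), max(map(len, current), default=0))
        if current and (len(current) + 1) * max_len > max_tokens:
            batches.append(current)  # budget exceeded: start a new batch
            current = []
        current.append(s)
    if current:
        batches.append(current)
    return batches
```

Sorting by length first keeps similarly sized samples together, which is what makes the budget effective. Some frameworks expose this directly (e.g. fairseq's `--max-tokens` batching) if you would rather not roll your own.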
But I'm not sure how to approach these problems.