How to concatenate data to minimize zero padding, and how to do dynamic batch size?

For example, I have variable-length samples like [1, 2, 3, 4, 5], [1, 2, 3], and [1, 2].
Usually, I concatenate them into one tensor with zero padding, like
[[1, 2, 3, 4, 5], [1, 2, 3, 0, 0], [1, 2, 0, 0, 0]].
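For reference, this is roughly what I do now (a minimal sketch using PyTorch's `pad_sequence`):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

samples = [torch.tensor([1, 2, 3, 4, 5]),
           torch.tensor([1, 2, 3]),
           torch.tensor([1, 2])]

# Zero-pad every sample to the length of the longest one.
batch = pad_sequence(samples, batch_first=True, padding_value=0)
# tensor([[1, 2, 3, 4, 5],
#         [1, 2, 3, 0, 0],
#         [1, 2, 0, 0, 0]])
```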

But this wastes GPU compute, since the padded positions do no useful work.
A better approach would be to pack short samples into a single row and pull in additional samples to keep the batch size constant, like
[[1, 2, 3, 4, 5], [1, 2, 3, 1, 2], [1, 2, 3, 4, 0]]
(here [1, 2, 3] and [1, 2] are packed into one row, and a further sample is pulled in to fill the third row).
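Something like this greedy first-fit packing is what I have in mind (just a rough sketch; the function name and the fixed `target_len` are my own choices):

```python
import torch

def pack_samples(samples, target_len):
    """Greedy first-fit: append each sample to the first row that has room,
    otherwise start a new row. Leftover space is zero-padded."""
    rows = []  # each row is a list of values with total length <= target_len
    for s in sorted(samples, key=len, reverse=True):
        for row in rows:
            if len(row) + len(s) <= target_len:
                row.extend(s.tolist())
                break
        else:
            rows.append(s.tolist())
    # Pad each packed row up to target_len.
    return torch.tensor([row + [0] * (target_len - len(row)) for row in rows])

samples = [torch.tensor([1, 2, 3, 4, 5]),
           torch.tensor([1, 2, 3]),
           torch.tensor([1, 2])]
print(pack_samples(samples, target_len=5))
# tensor([[1, 2, 3, 4, 5],
#         [1, 2, 3, 1, 2]])
```

I realize that for attention models the packed rows would also need something like a block-diagonal attention mask (and reset position ids) so samples in the same row don't attend to each other, which adds to my confusion.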

The other problem is dynamically choosing the batch size to fully utilize GPU memory.
For example, if the max sample length in a batch is 100 I could use a batch size of 32, but if the max length is 200 the batch size should drop to 16. In other words, batch_size × max_len (the tokens per batch) stays roughly constant at 3200.
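Is something like this length-sorted, token-budget batching the right direction? (A rough sketch under my own assumptions; the name `token_budget_batches` and the budget value are mine.)

```python
import torch

def token_budget_batches(samples, token_budget):
    """Sort indices by sample length, then cut batches greedily so that
    batch_size * max_len_in_batch stays within the token budget.
    Because of the sort, the current sample is always the longest in its batch."""
    order = sorted(range(len(samples)), key=lambda i: len(samples[i]))
    batches, batch = [], []
    for i in order:
        if batch and (len(batch) + 1) * len(samples[i]) > token_budget:
            batches.append(batch)
            batch = []
        batch.append(i)
    if batch:
        batches.append(batch)
    return batches  # lists of sample indices

lengths = [100, 120, 200, 90, 50]
samples = [torch.zeros(n) for n in lengths]
print(token_budget_batches(samples, token_budget=400))
# [[4, 3, 0], [1, 2]]  -> 3 * 100 = 300 <= 400, then 2 * 200 = 400 <= 400
```

If this is reasonable, I suppose the index lists could be fed to `torch.utils.data.DataLoader` via its `batch_sampler` argument, but I don't know if that is the standard way.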

But I'm not sure how to approach these problems.