Padding variable-length input works reasonably well on CPU (I haven’t tried GPU yet). Here are a few examples with “dynamic batching”. Basically, for a batch that looks like this (1 marks a real time step, 0 marks padding):
[[0, 0, 1, 1], [1, 1, 1, 1]]
The effective batch size at time steps 0 and 1 will be 1, and at time steps 2 and 3 it will be 2.
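A minimal pure-Python sketch of how the per-time-step batch size falls out of that mask. It assumes the mask is a list of equal-length rows with 1 for real steps and 0 for padding. `active_rows` and `batch_sizes` are hypothetical helper names used only for illustration.

```python
from typing import List

def active_rows(mask: List[List[int]]) -> List[List[int]]:
    """For each time step, return the indices of sequences whose mask is 1 (hypothetical helper)."""
    num_steps = len(mask[0])
    return [[b for b, row in enumerate(mask) if row[t] == 1] for t in range(num_steps)]

def batch_sizes(mask: List[List[int]]) -> List[int]:
    """Effective batch size at each time step = number of active sequences."""
    return [len(rows) for rows in active_rows(mask)]

mask = [[0, 0, 1, 1], [1, 1, 1, 1]]
print(batch_sizes(mask))  # [1, 1, 2, 2]
```

Only sequence 1 is active at steps 0 and 1, and both sequences are active at steps 2 and 3.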
I was surprised that dynamic batching was slower. That being said, there are some tricky indexing and concatenation steps that might have a nicer implementation.
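To show where the extra indexing comes from, here is a minimal pure-Python sketch that compares the two strategies on scalar hidden states. The `cell` function is a hypothetical stand-in for a real RNN step, and the gather/scatter loop is only an assumption about how dynamic batching might be wired up. In a tensor library, those gather and scatter steps would become index and concatenation operations, which is the overhead suspected above.

```python
import math
from typing import List

def cell(h: float, x: float, w: float = 0.5) -> float:
    # Hypothetical toy recurrent update standing in for an RNN/LSTM step.
    return math.tanh(w * h + x)

def run_padded(xs: List[List[float]], mask: List[List[int]]) -> List[float]:
    batch, steps = len(xs), len(xs[0])
    h = [0.0] * batch
    for t in range(steps):
        # Compute every row, padding included, then keep the old state where mask == 0.
        new_h = [cell(h[b], xs[b][t]) for b in range(batch)]
        h = [new_h[b] if mask[b][t] else h[b] for b in range(batch)]
    return h

def run_dynamic(xs: List[List[float]], mask: List[List[int]]) -> List[float]:
    batch, steps = len(xs), len(xs[0])
    h = [0.0] * batch
    for t in range(steps):
        idx = [b for b in range(batch) if mask[b][t]]   # gather: indices of active rows
        sub_h = [h[b] for b in idx]
        sub_x = [xs[b][t] for b in idx]
        updated = [cell(hb, xb) for hb, xb in zip(sub_h, sub_x)]  # smaller effective batch
        for b, val in zip(idx, updated):                # scatter results back into full state
            h[b] = val
    return h

xs = [[0.0, 0.0, 0.3, -0.2], [0.1, 0.4, -0.5, 0.2]]
mask = [[0, 0, 1, 1], [1, 1, 1, 1]]
print(run_padded(xs, mask), run_dynamic(xs, mask))
```

Both functions produce the same final states. The dynamic version does less arithmetic per step but pays for the gather and scatter bookkeeping, so on small batches it is plausible for it to come out slower.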