Can anyone please explain how RNNs process batches of different sizes?
The first batch had 128 samples, so the hidden state had size 128 x DIM.
The second batch has 64 samples. How is it processed with the previous hidden state, which was sized for 128?
How do the additions inside the cell work across different batch sizes?
The common approach would be to pad the sequences to the max length or to create batches from samples that share the same sequence length. PyTorch has experimental
NestedTensor support, but I don’t think RNNs support it.
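A minimal sketch of the padding approach, assuming `torch.nn.utils.rnn` utilities and made-up sizes (`feature_dim`, `hidden_dim`, and the sequence lengths are only illustrative). Packing is optional; it just lets the LSTM skip the padded time steps.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
feature_dim, hidden_dim = 8, 16  # illustrative sizes, not from the thread

# Three sequences of different lengths, each of shape (seq_len, feature_dim).
sequences = [torch.randn(n, feature_dim) for n in (5, 3, 2)]
lengths = torch.tensor([s.size(0) for s in sequences])

# Pad every sequence with zeros up to the longest one: (batch, max_len, feature_dim).
padded = pad_sequence(sequences, batch_first=True)

# Pack the padded batch so the LSTM does not run over the padding.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a regular padded tensor plus the original lengths.
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

The alternative mentioned above, grouping samples of equal length into the same batch, avoids padding entirely but needs a custom sampler.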
@ptrblck Hi! I probably wasn't clear enough.
I understand how to process batches of different sequence lengths. What I mean is: how does an LSTM, under the hood, process batches of different batch sizes? The size of the hidden state depends on the batch size.
If we processed a tensor with BS=128, the hidden state would be 128 x Num_dimensions, but how is the next tensor processed if it has shape 64 x Num_dimensions?
Each gate has an addition operator, so how can we add two tensors of different batch sizes?
All the LSTM layer cares about, from a functional standpoint, is that the batch size of the inputs, the hidden_state and the cell state match. If they don’t, you’ll get an error.
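Here is a small sketch of that rule, using made-up sizes (`feature_dim`, `hidden_dim`, `seq_len` are just for illustration). The state returned for a batch of 128 works with another batch of 128, but passing it along with a batch of 64 raises an error.

```python
import torch
from torch import nn

torch.manual_seed(0)
feature_dim, hidden_dim, seq_len = 8, 16, 10  # illustrative sizes
lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)

# First batch: 128 samples. The returned states have shape (num_layers, 128, hidden_dim).
x128 = torch.randn(128, seq_len, feature_dim)
out128, (h128, c128) = lstm(x128)

# Batch sizes match (128 and 128), so reusing the state is accepted.
out_again, _ = lstm(torch.randn(128, seq_len, feature_dim), (h128, c128))

# Batch sizes differ (64 inputs vs. a state sized for 128), so the layer refuses.
x64 = torch.randn(64, seq_len, feature_dim)
try:
    lstm(x64, (h128, c128))
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("mismatch:", err)
```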
Finally understood: hidden_state is reset to zeros for every batch, as long as no initial state is passed to the layer :)
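A quick way to check this, sketched with illustrative sizes: calling the layer without a state gives the same output as passing explicit zero states sized for the current batch, and the learned weights themselves have no batch dimension, which is why the same layer handles 128 or 64 samples.

```python
import torch
from torch import nn

torch.manual_seed(0)
feature_dim, hidden_dim, seq_len = 8, 16, 10  # illustrative sizes
lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)

x64 = torch.randn(64, seq_len, feature_dim)

# No state passed: the layer starts from zeros sized for this batch.
out_default, (h_default, _) = lstm(x64)

# Explicit zero states with the same batch size: (num_layers, batch, hidden_dim).
h0 = torch.zeros(1, 64, hidden_dim)
c0 = torch.zeros(1, 64, hidden_dim)
out_explicit, _ = lstm(x64, (h0, c0))

same_as_default = torch.allclose(out_default, out_explicit)

# The parameters depend only on feature_dim and hidden_dim, never on the batch size.
weight_shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
```

If the state should carry over between batches instead (for example, consecutive chunks of one long sequence), it has to be passed in explicitly and the batch size has to stay the same.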