Hey everybody,
I’m pretty new to PyTorch and I’m struggling at the moment with the mini-batch parameter for LSTM.
Imagine I have the following sequence:
import torch
from torch.autograd import Variable

input = Variable(torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
As far as I know, with a batch size of one I can do:
input = input.view(10, -1, 1)  # shape (10, 1, 1): seq_len=10, batch=1, input_size=1
If I passed this to an LSTM, I would first get the h_t for input 1, then 2, then 3, and so on, right?
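To make this concrete, here is a minimal sketch of what I mean (the lstm variable and hidden_size=3 are just arbitrary choices of mine for illustration):

import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=3)  # toy LSTM, sizes picked arbitrarily

out, (h_n, c_n) = lstm(input)  # input has shape (10, 1, 1)
print(out.size())  # torch.Size([10, 1, 3]) -> one h_t per time step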
But what if I shape the input like this:
input = input.view(-1, 2, 1)
# it is something like: [[[1], [2]], [[3], [4]], ...]
How does the LSTM work on that input? The first shape parameter would be 5 now, but I actually still have a sequence length of 10 (1-10)?
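For reference, feeding this shape through the same toy lstm from above:

out, (h_n, c_n) = lstm(input)  # input now has shape (5, 2, 1)
print(out.size())  # torch.Size([5, 2, 3]) -> 5 time steps, batch of 2

If I read the (seq_len, batch, input_size) convention correctly, this would be treated as two independent sequences of length 5 ([1, 3, 5, 7, 9] and [2, 4, 6, 8, 10]) rather than one sequence of length 10. Is that what happens?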
Thanks for explaining!