LSTM expected dimensionality

Is there a reason why PyTorch's LSTM follows seq_size x batch_size x dim by default instead of the usual batch_size x seq_size x dim?

You can have an LSTM take batch_size x seq_size x dim using the batch_first option: http://pytorch.org/docs/master/nn.html?highlight=lstm#torch.nn.LSTM
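A minimal sketch of the two layouts (the layer sizes and tensor shapes here are just illustrative):

```python
import torch
import torch.nn as nn

# Default layout: input is (seq_len, batch_size, input_size)
lstm = nn.LSTM(input_size=10, hidden_size=20)
x = torch.randn(100, 32, 10)           # 100 timesteps, batch of 32, 10 features
out, (h, c) = lstm(x)                   # out: (100, 32, 20)

# With batch_first=True: input is (batch_size, seq_len, input_size)
lstm_bf = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x_bf = torch.randn(32, 100, 10)
out_bf, _ = lstm_bf(x_bf)               # out_bf: (32, 100, 20)
```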

I saw that.
But my question was mainly: why is that the default convention?

I can’t find it right now in the cudnn docs, but I think cudnn (the underlying library that is called for the LSTM) takes batch_size as the second dimension. Presumably this is done for computational efficiency, but I don’t know the details.

I tried setting up my data as (sequences, timesteps, features) and then selecting timesteps 1…100 for the first batch, then 101…200 for the second batch, but PyTorch complained that the data wasn’t contiguous.

So I changed my dataloader to produce data as (timesteps, sequences, features) and the problem went away.

If timesteps is the first dimension then you can run sequential batches efficiently without having to copy the data.
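A minimal sketch of what I mean (the tensor sizes here are just illustrative):

```python
import torch

# Full recording: 1000 timesteps, 8 sequences, 5 features
data_time_first = torch.randn(1000, 8, 5)    # (timesteps, sequences, features)
data_batch_first = torch.randn(8, 1000, 5)   # (sequences, timesteps, features)

# Slicing a block of timesteps along dim 0 returns a view that is still
# contiguous, so it can be fed to the LSTM without copying.
chunk = data_time_first[0:100]
print(chunk.is_contiguous())                 # True

# Slicing the same timesteps along dim 1 gives a non-contiguous view;
# it would need .contiguous() (i.e. a copy) before use.
chunk_bf = data_batch_first[:, 0:100]
print(chunk_bf.is_contiguous())              # False
```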
