LSTM expected dimensionality

Is there a reason why PyTorch's `nn.LSTM` defaults to seq_len x batch_size x dim instead of the usual batch_size x …

You can have an LSTM take batch_size x seq_size x dim using the batch_first option:
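For example, a minimal sketch comparing the two layouts (sizes here are arbitrary, chosen just for illustration):

```python
import torch
import torch.nn as nn

batch_size, seq_len, dim, hidden = 4, 10, 8, 16

# Default layout: input is (seq_len, batch_size, dim)
lstm = nn.LSTM(input_size=dim, hidden_size=hidden)
out, (h, c) = lstm(torch.randn(seq_len, batch_size, dim))
print(out.shape)  # torch.Size([10, 4, 16])

# With batch_first=True: input is (batch_size, seq_len, dim)
lstm_bf = nn.LSTM(input_size=dim, hidden_size=hidden, batch_first=True)
out_bf, _ = lstm_bf(torch.randn(batch_size, seq_len, dim))
print(out_bf.shape)  # torch.Size([4, 10, 16])
```

Note that `batch_first` only affects the input and output tensors; the hidden and cell states keep batch as the second dimension either way.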

I saw that.
But my question was mainly: why is that the default convention?

I can’t find it right now in the cuDNN docs, but I think cuDNN (the underlying library called for LSTMs) takes batch_size as the second dimension. Presumably this layout is more efficient for the computation, but I don’t know the details.

I tried setting up my data as (sequences, timesteps, features) and then selecting timesteps 1…100 for the first batch, then 101…200 for the second batch, but PyTorch complained that the data wasn’t contiguous.

So I changed my dataloader to produce data as (timesteps, sequences, features) and the problem went away.

If timesteps is the first dimension then you can run sequential batches efficiently without having to copy the data.
