Why is Sequence, Batch, Features the default (instead of BxSxF)?

I’d love it if somebody could explain why you’d ever structure your sequence data as Sequence, Batch, Feature.
To me it seems much more natural to always order your data as Batch, Sequence, Feature.

Perhaps I don’t understand correctly - here is an example.

Let’s say my batch has 2 samples, cat and mouse.

As Sequence, Batch, Features:

[c,m]
[a,o]
[t,u]
[-,s]
[-,e]

As Batch, Sequence, Features:

[c,a,t,-,-]
[m,o,u,s,e]

Are my depictions correct, and if so, why would you use SxBxF?
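
For concreteness, here is the same toy batch written out as numpy arrays (just a sketch; each "feature" is a single character, so the F axis is left implicit):

```python
import numpy as np

# Toy batch: "cat" and "mouse", padded to length 5 with "-".
seq_first = np.array([["c", "m"],
                      ["a", "o"],
                      ["t", "u"],
                      ["-", "s"],
                      ["-", "e"]])   # shape (S, B) = (5, 2)

batch_first = seq_first.T            # shape (B, S) = (2, 5)

print(seq_first.shape)               # (5, 2)
print(batch_first.shape)             # (2, 5)
print("".join(batch_first[1]))       # "mouse" -- row 1 is the whole second sample
```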

Thanks

I didn’t know the answer either, so I asked @jekbradbury. He replied:

Often there’s a need for an explicit for loop over time, and sometimes it’s nice if the chunks used in the loop are contiguous.
But if those things aren’t actually the case in a particular problem, then it doesn’t make a difference.
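
The contiguity point can be checked directly in numpy (a minimal sketch, with arbitrary toy sizes): slicing out time step t from a C-contiguous sequence-first array yields one contiguous block, while the same slice from a batch-first array is a strided view.

```python
import numpy as np

S, B, F = 5, 2, 4            # arbitrary toy sizes

sbf = np.zeros((S, B, F))    # sequence-first layout
bsf = np.zeros((B, S, F))    # batch-first layout

t = 2
step_sbf = sbf[t]            # (B, F) -- one contiguous block of memory
step_bsf = bsf[:, t]         # (B, F) -- strided view across the array

print(step_sbf.flags["C_CONTIGUOUS"])   # True
print(step_bsf.flags["C_CONTIGUOUS"])   # False
```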

And Kyle Kastner further adds:

Seconding this: it means each step of your RNN (if you have a step()-type function plus a loop) looks like a standard feedforward network without any extra work. I think under numpy’s C-contiguous conventions it also has memory-access benefits, since your least frequently accessed dimension (time) sits on the “farthest”, slowest-varying axis.
It also means that if you are doing convolutional sequence processing, you can get a seq - n - c - h - w layout a little more easily.
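
The step()-plus-loop picture might look like this (a hypothetical vanilla RNN in numpy; the parameter names W_xh, W_hh and the sizes are illustrative, not anyone's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
S, B, F, H = 5, 2, 4, 3          # toy sequence, batch, feature, hidden sizes

# Illustrative vanilla-RNN parameters.
W_xh = rng.normal(size=(F, H))
W_hh = rng.normal(size=(H, H))

def step(x_t, h):
    # One time step is just a feedforward layer over the (B, F) slice.
    return np.tanh(x_t @ W_xh + h @ W_hh)

x = rng.normal(size=(S, B, F))   # sequence-first input
h = np.zeros((B, H))
for x_t in x:                    # iterating axis 0 yields (B, F) time slices
    h = step(x_t, h)

print(h.shape)                   # (2, 3)
```

With sequence-first data, the loop body never has to index into the batch axis; each iteration hands step() a plain 2-D matrix.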

Thanks - I think I understand. I’m guessing that in the sequence-first layout, you’d be running the RNN step for every minibatch sample in parallel. It reminds me of SIMD lanes:

Each column here is like a SIMD lane:

[c,m]
[a,o]
[t,u]
[-,s]
[-,e]
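
The lockstep picture in code (a toy sketch reusing the cat/mouse arrays from above): iterating over the first axis of the sequence-first array yields one character per sample at each step, so both "lanes" advance together.

```python
import numpy as np

lanes = np.array([["c", "m"],
                  ["a", "o"],
                  ["t", "u"],
                  ["-", "s"],
                  ["-", "e"]])       # (S, B), sequence-first

# Each iteration delivers one character per lane (sample), in lockstep.
for t, step in enumerate(lanes):
    print(t, "".join(step))
# 0 cm
# 1 ao
# 2 tu
# 3 -s
# 4 -e
```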