I realize there are similar questions particularly this one: Could someone explain batch_first=True in LSTM
However my question pertains to a specific tutorial I found online: https://medium.com/dair-ai/building-rnns-is-fun-with-pytorch-and-google-colab-3903ea9a3a79
I do not understand why they are not using batch_first=true.
I realized that the data being fed to the model is of the form [64, 28, 28].
Because they are working with MNIST data and they specified a batch size of 64
-64 is batch size
-28 (number of rows) is seq_len
-28 (number of cols) is features
Could someone please explain why they are not using batch_first=True?
Thank you very much for your help.
edit: removed a part of question for simplicity