For beginners: Do not use view() or reshape() to swap dimensions of tensors!

Thanks for the elaboration on this matter. I had been struggling with this issue for 6 months!

There is still another use case left: what if your training data is flat in the first place, i.e. of shape (batch_size, data)?
This scenario could arise with a visual encoder whose output is fed into an LSTM and therefore needs to be reshaped accordingly.

A reshape to (num sequences, sequence length, data) is just fine, but reshaping directly to (sequence length, num sequences, data) causes exactly the trouble you describe: the rows end up assigned to the wrong sequences.
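
To make the mixing concrete, here is a minimal sketch (plain PyTorch, with made-up sizes and variable names) of what happens when the flat output is reshaped directly into a sequence-length-first shape:

```python
import torch

# Made-up sizes for illustration: 2 sequences of 3 time steps, 4 features each.
num_seq, seq_len, feat = 2, 3, 4

# Flat encoder output: one row per time step, grouped sequence by sequence.
flat = torch.arange(num_seq * seq_len * feat).reshape(num_seq * seq_len, feat)

ok = flat.reshape(num_seq, seq_len, feat)   # each sequence stays intact
bad = flat.reshape(seq_len, num_seq, feat)  # reshape only reinterprets memory
                                            # order, so rows land in the wrong
                                            # sequences

print(ok[1, 0])   # first step of sequence 1: tensor([12, 13, 14, 15])
print(bad[0, 1])  # same slot after the bad reshape: tensor([4, 5, 6, 7]),
                  # which is actually step 1 of sequence 0
```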

So what would be a good way to reshape the tensor to a sequence-length-first layout in this case?
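
Would something like the following sketch be the right approach, i.e. doing the safe batch-first reshape and then swapping the first two dimensions with transpose() and contiguous(), or is there a more direct way? (Again with made-up sizes; `flat` stands for the flat encoder output.)

```python
import torch

num_seq, seq_len, feat = 2, 3, 4
flat = torch.arange(num_seq * seq_len * feat).reshape(num_seq * seq_len, feat)

# Batch-first reshape keeps each sequence intact; transpose then swaps the
# batch and time dimensions without mixing data across sequences.
seq_first = flat.reshape(num_seq, seq_len, feat).transpose(0, 1).contiguous()
print(seq_first.shape)  # torch.Size([3, 2, 4]) -> (seq_len, num_seq, feat)
```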