MultiheadAttention / transformer with batch first

Is it possible to add an option `batch_first=True` for the MultiheadAttention and Transformer modules, like it is done for RNNs?
Batch x Sequence x Embedding
I find it more understandable and intuitive when the batch dimension comes first in the input.
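In the meantime, a common workaround is a thin wrapper that transposes batch-first input to the (Sequence, Batch, Embedding) layout that `nn.MultiheadAttention` expects by default, and transposes the result back. This is just a sketch of that idea; the wrapper class name is my own, not part of PyTorch (recent PyTorch versions have since added a native `batch_first` argument to `nn.MultiheadAttention`):

```python
import torch
import torch.nn as nn

class BatchFirstMultiheadAttention(nn.Module):
    """Hypothetical wrapper: accepts (Batch, Seq, Embed) input by
    transposing to the (Seq, Batch, Embed) layout nn.MultiheadAttention
    expects by default, then transposing the output back."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, query, key, value):
        # Swap batch and sequence dimensions in, run attention, swap back out.
        q = query.transpose(0, 1)
        k = key.transpose(0, 1)
        v = value.transpose(0, 1)
        out, weights = self.attn(q, k, v)
        return out.transpose(0, 1), weights

x = torch.randn(8, 16, 32)  # (batch=8, seq=16, embed=32)
mha = BatchFirstMultiheadAttention(embed_dim=32, num_heads=4)
out, w = mha(x, x, x)
print(out.shape)  # torch.Size([8, 16, 32])
```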


I couldn’t agree with you more!

We have an issue for this, so feel free to post your opinion there.
I’m not sure why it was closed.


Yes, we can do this. I have tried it and successfully implemented the code. But I face one problem: I can only run the code on the CPU; when using CUDA, it shows an error related to the batch-first handling.
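Without the exact traceback it is hard to say what is wrong, but a frequent cause of such CUDA errors is the module and its input tensors living on different devices. A minimal check, assuming the issue is a device mismatch rather than the batch-first logic itself:

```python
import torch
import torch.nn as nn

# Pick CUDA when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Both the module's parameters and every input tensor must be on
# the same device, or PyTorch raises a runtime error.
mha = nn.MultiheadAttention(embed_dim=32, num_heads=4).to(device)
x = torch.randn(16, 8, 32, device=device)  # (seq=16, batch=8, embed=32)

out, weights = mha(x, x, x)
print(out.shape)  # torch.Size([16, 8, 32])
```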