MultiheadAttention / transformer with batch first

Dee · August 23, 2019, 8:50pm

Is it possible to add a batch_first=True option to the MultiheadAttention and transformer modules, as is done for RNN?
The input would then be Batch x Sequence x Embedding.
I find it more understandable and intuitive when the batch dimension comes first in the input.

19951024 · August 25, 2019, 7:27am

I couldn’t agree with you more!

ptrblck · August 25, 2019, 12:14pm

We have an issue here so feel free to post your opinion there.
I’m not sure why it was closed.

omveer_sharma · January 14, 2022, 8:27pm

Yes, we can do this. I have tried it and successfully implemented the code, but I face one problem: I can only run the code on the CPU. When I run it with CUDA, it shows an error regarding the batch first option.
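
For anyone who wants batch-first inputs without depending on a built-in option, one possible workaround is to transpose around the default sequence-first nn.MultiheadAttention. The sketch below is only an illustration, not the code from the post above: the wrapper name BatchFirstAttention and the tensor sizes are made up for the example, and it assumes self-attention with no masks. Since it only uses transpose, it should behave the same on CPU and GPU.

```python
import torch
import torch.nn as nn


class BatchFirstAttention(nn.Module):
    """Hypothetical wrapper: takes (batch, seq, embed) tensors and
    transposes them to the (seq, batch, embed) layout that
    nn.MultiheadAttention expects by default."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads)

    def forward(self, query, key, value):
        # (batch, seq, embed) -> (seq, batch, embed)
        q, k, v = (t.transpose(0, 1) for t in (query, key, value))
        out, weights = self.attn(q, k, v)
        # Back to (batch, seq, embed); the attention weights are
        # already returned as (batch, target_len, source_len).
        return out.transpose(0, 1), weights


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = BatchFirstAttention(embed_dim=16, num_heads=4).to(device)
x = torch.randn(2, 5, 16, device=device)  # batch=2, seq=5, embed=16

out, weights = layer(x, x, x)
print(out.shape, weights.shape)
```

Recent PyTorch releases also accept batch_first=True directly in the nn.MultiheadAttention constructor, so it is worth checking the documentation for your installed version before writing a wrapper like this.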