@ptrblck Why is the implementation of MultiheadAttention part of pytorch/torch/nn/modules/activation.py?
I don’t know, but @zhangguanheng66 might know the reason, as he implemented it in this PR.