Intuition behind adding MultiheadAttention block under activation.py

@ptrblck Why is the implementation of MultiheadAttention part of pytorch/torch/nn/modules/activation.py?
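
For context, a minimal sketch of how the class is used: regardless of which file it is defined in, it is exposed publicly as torch.nn.MultiheadAttention. The tensor shapes below assume the default batch_first=False layout of (seq_len, batch, embed_dim).

```python
import torch
import torch.nn as nn

# Defined in torch/nn/modules/activation.py, but imported via torch.nn
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)

query = torch.randn(5, 2, 16)        # (seq_len, batch, embed_dim)
key = value = torch.randn(5, 2, 16)

attn_output, attn_weights = mha(query, key, value)
print(attn_output.shape)             # torch.Size([5, 2, 16])
```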

I don’t know, but @zhangguanheng66 might know the reason as he has implemented it in this PR.
