The documentation says that the inputs to nn.MultiheadAttention are query, key and value.
I have the following 2 doubts:
- How do I specify the inputs (word embeddings) on which the multi-head attention has to be performed?
- What should the key, query and value inputs to the function be? Aren't they the weights that the network will learn?
@ADONAI_TZEVAOT
For self-attention, the key, query and value are all the same tensor. So if your input is, for example, 'how are you' and you have an embedding dim of 300, the key, query and value will each have shape [3, 300] (or [3, 1, 300] with a batch dimension), since you have 3 words.
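A minimal sketch of this, assuming an embedding dim of 300 and 4 attention heads (the head count is an arbitrary choice here, it just has to divide the embedding dim):

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 300, 4, 3  # 3 tokens: "how are you"

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# One batch of 3 word embeddings, shape [batch, seq_len, embed_dim].
# In practice x would come from an nn.Embedding layer; random here.
x = torch.randn(1, seq_len, embed_dim)

# Self-attention: pass the same tensor as query, key and value.
attn_output, attn_weights = mha(x, x, x)

print(attn_output.shape)   # torch.Size([1, 3, 300])
print(attn_weights.shape)  # torch.Size([1, 3, 3]) - one weight per token pair
```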
For the second doubt: key, query and value are not the weights, they are your inputs. PyTorch creates the projection weights behind the scenes and learns them during training. You can watch this video for a detailed explanation: Self Attention with torch.nn.MultiheadAttention Module - YouTube
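To see that the learned weights live inside the module rather than in your q/k/v inputs, you can inspect its parameters (sketch assuming embed_dim=300, so the stacked q/k/v projection is [3*300, 300]):

```python
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=300, num_heads=4)

# in_proj_weight stacks the query, key and value projection matrices.
print(mha.in_proj_weight.shape)   # torch.Size([900, 300])
# out_proj is the final linear layer applied to the attention output.
print(mha.out_proj.weight.shape)  # torch.Size([300, 300])
```

These parameters are initialized randomly and updated by the optimizer; the embeddings you pass in are only multiplied by them.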