MultiheadAttention's attn_output_weights returns a tensor of shape (L, S), where L is the target sequence length and S is the source sequence length.
What do target and source actually mean here? I want to get the transformer's weighting of the input values. How would I do this?
Assuming you have average_attn_weights=True, attn_output_weights is the transformer's weighting of the input values (the attention matrix used to scale the input values), averaged across the different heads, as far as I know.
According to the PyTorch docs, L comes from the query (the target sequence you compute attention for), while S comes from the key/value (the source sequence being attended over).
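For example, here's a minimal sketch with cross-attention so the two lengths differ. The names target/source and all the dimensions are just illustrative, and passing average_attn_weights to forward requires a reasonably recent PyTorch (1.11+):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

# Cross-attention: the query (target) and the key/value (source)
# are different sequences with different lengths.
target = torch.randn(2, 7, 16)   # L = 7 query positions
source = torch.randn(2, 12, 16)  # S = 12 key/value positions

out, weights = mha(
    target, source, source,
    need_weights=True,
    average_attn_weights=True,  # average the matrix across the 4 heads
)

print(out.shape)      # torch.Size([2, 7, 16])  -> one output per target position
print(weights.shape)  # torch.Size([2, 7, 12]) -> (N, L, S)
# weights[b, i, j] is how strongly target position i attends to source position j.
```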
Sorry I’m still a bit confused. In self-attention, what would the matrix look like?
In self-attention the query and the key/value come from the same sequence, so L = S and the attention matrix is square (L × L). Its size depends on the input length; row i holds the softmax weights that position i places on every position in the sequence, and those weights are what scale the input values.
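A quick way to see that (again just a sketch; the module setup and dimensions are made up for illustration):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

x = torch.randn(1, 5, 8)  # one sequence of 5 tokens

# Passing the same tensor as query, key, and value makes this self-attention.
_, weights = mha(x, x, x, need_weights=True, average_attn_weights=True)

print(weights.shape)        # torch.Size([1, 5, 5]) -- square, since L == S
print(weights.sum(dim=-1))  # each row sums to 1 (softmax over the sequence)
```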