MultiHeadAttention Weights Interpretation

MultiheadAttention returns attn_output_weights with shape (L, S), where L is the target sequence length and S is the source sequence length.

What do target and source weights actually mean? I want to get the transformer’s weightage of the input values. How would I do this?

Assuming you pass average_attn_weights=True, attn_output_weights is, as far as I know, exactly the transformer's weighting of the input values: the attention matrix used to scale the values, averaged across the heads.
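A minimal sketch of this (the dimensions and batch size here are arbitrary, chosen just for illustration): with average_attn_weights=True the returned weights have shape (N, L, S), and each row is a probability distribution over the source positions.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Hypothetical cross-attention: target (query) length L=5, source (key/value) length S=7
query = torch.randn(1, 5, embed_dim)
key = torch.randn(1, 7, embed_dim)
value = torch.randn(1, 7, embed_dim)

out, weights = mha(query, key, value, average_attn_weights=True)
print(weights.shape)  # (N, L, S) = (1, 5, 7), averaged over the 4 heads

# Row i of weights[0] is how much target position i weights each source
# position; softmax guarantees every row sums to 1.
print(weights.sum(dim=-1))
```

With average_attn_weights=False you instead get the per-head weights with shape (N, num_heads, L, S).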

According to the PyTorch docs, L is the target (query) sequence length, i.e. the positions doing the attending, while S is the source (key/value) sequence length, i.e. the input being attended to.

Sorry I’m still a bit confused. In self-attention, what would the matrix look like?

In self-attention the query, key, and value all come from the same sequence, so L == S and the attention matrix is square. Entry (i, j) is how strongly position i attends to position j, and its size depends on the input, since there is one row and one column per input position.
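A quick sketch of the self-attention case (again with arbitrary toy dimensions): passing the same tensor as query, key, and value makes the weight matrix square.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 8, 2, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Self-attention: the same tensor is used as query, key, and value,
# so target length L equals source length S.
x = torch.randn(1, seq_len, embed_dim)
out, weights = mha(x, x, x)

print(weights.shape)  # (1, 4, 4) -- square because L == S == seq_len
```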
