Getting attention scores for each head of transformer model

amjass · July 5, 2023, 1:49pm

Hi all,
I am trying to get attention weights for input sentences from a transformer model I have built and trained on my own corpus of text. I used the PyTorch tutorial "Language Translation with nn.Transformer and torchtext" (PyTorch Tutorials 2.0.1+cu117 documentation) as a guide, and the model trains and predicts well.

I now need to access the attention weights of the encoder but am struggling and would appreciate any guidance. I need the attention scores for an input sentence for each head of the encoder, and the final attention weights of the last layer of the encoder block.
I have managed to access each layer's attention scores as follows:

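# Tokenize and numericalize the input sentence, shape (num_tokens, 1)
src = text_transform['main'](src).view(-1, 1)
num_tokens = src.shape[0]
# All-False mask: no source position is masked
src_mask = (torch.zeros(num_tokens, num_tokens)).type(torch.bool)
input_embeddings = transformer.encode(src, src_mask)

# Walk through the encoder layers, collecting the self-attention weights of each
for layer in transformer.transformer.encoder.layers:
    emb, att = layer.self_attn(input_embeddings, input_embeddings, input_embeddings, need_weights=True)
    input_embeddings = layer(input_embeddings)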

However, this only provides attention scores for each layer of the encoder block, and presumably only for the first head, as I only get 6 sets of attention weights (unless they are averaged across the heads, which is still not what I need).

I would like the final attention score for each encoder block and for each head. Any advice would be much appreciated.

Many thanks!
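For reference, a minimal sketch of what per-head extraction could look like. By default, nn.MultiheadAttention averages the returned weights across heads, which would explain getting a single map per layer; passing average_attn_weights=False (available in recent PyTorch releases) returns one map per head instead. The sizes and the random input tensor below are hypothetical stand-ins, not the model from the post, and the sketch assumes the default post-norm layers (norm_first=False). Note also that if encode follows the tutorial, it likely already runs the whole encoder stack, so the per-layer loop should probably start from the token plus positional embeddings rather than from its output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration only; not taken from the original model.
d_model, nhead, num_layers, seq_len = 32, 4, 6, 5

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=64)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
encoder.eval()  # disable dropout so the attention rows sum to 1

# Stand-in for the token + positional embeddings, shape (seq_len, batch, d_model).
x = torch.randn(seq_len, 1, d_model)

per_layer_attention = []
with torch.no_grad():
    for layer in encoder.layers:
        # average_attn_weights=False keeps a separate attention map for every head.
        _, attn = layer.self_attn(x, x, x, need_weights=True, average_attn_weights=False)
        per_layer_attention.append(attn)  # (batch, nhead, seq_len, seq_len)
        # Feed the full layer output (attention + feed-forward) to the next layer.
        x = layer(x)

# Stack into (num_layers, batch, nhead, seq_len, seq_len).
attention = torch.stack(per_layer_attention)
# Per-head attention maps of the last encoder layer for the single input sentence.
last_layer_heads = attention[-1, 0]  # (nhead, seq_len, seq_len)
print(attention.shape, last_layer_heads.shape)
```

Indexing attention[layer_idx, 0, head_idx] then gives the (seq_len, seq_len) map for one head of one layer. Calling self_attn directly like this only matches the layer's internal attention when the layer applies attention before normalization, as in the default configuration.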