Obtain the attention weights within Transformer class

Hey,

I have initialized a transformer encoder block using:

encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
transformer = nn.TransformerEncoder(encoder_layer, num_layers)

How can I access the attention weights that are returned by F.multi_head_attention_forward?
Since need_weights is hard-coded to False inside the encoder layer's _sa_block function, I don't know how to get at them.

best,
Paul

import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True, dropout=0, dim_feedforward=2)
# pulls the learned input-projection weights for q, k, v out of the self-attention module
attention_weights = encoder_layer.state_dict()['self_attn.in_proj_weight']

That is only the matrix containing the projection matrices used to produce q, k and v from the input. The attention matrix I mean is the result of softmax(q @ k.T). I found a different way: inheriting from TransformerEncoderLayer and setting need_weights to True in _sa_block. People from PyTorch told me they may build in support in a later update.
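
Roughly what I did, as a sketch rather than a polished solution: it overrides the private _sa_block method, whose signature and behaviour differ between PyTorch versions (the is_causal argument and average_attn_weights only exist in newer releases), and it runs in training mode, since in eval mode newer versions may take a fused fast path that skips _sa_block entirely.

import torch
import torch.nn as nn

class EncoderLayerWithAttn(nn.TransformerEncoderLayer):
    # Override the private _sa_block so MultiheadAttention also returns the
    # softmax attention weights, and stash them on the module.
    def _sa_block(self, x, attn_mask, key_padding_mask, is_causal=False):
        # note: is_causal is accepted but ignored here for simplicity
        x, attn = self.self_attn(
            x, x, x,
            attn_mask=attn_mask,
            key_padding_mask=key_padding_mask,
            need_weights=True,           # this is the flag the stock layer sets to False
            average_attn_weights=False,  # one (batch, num_heads, L, L) tensor instead of the head average
        )
        self.last_attn = attn.detach()   # keep the weights for later inspection
        return self.dropout1(x)

encoder_layer = EncoderLayerWithAttn(d_model=2, nhead=1, dim_feedforward=2, dropout=0, batch_first=True)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(4, 10, 2)                # (batch, seq, d_model) because batch_first=True
out = transformer(x)                      # in training mode, so _sa_block is actually called
for layer in transformer.layers:
    print(layer.last_attn.shape)          # torch.Size([4, 1, 10, 10])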


Crap, I remember it was True before… I spent so long debugging, and this is why??

Well, pretty late to this, but I ran into a similar situation. My lazy workaround is to recompute the attention with the first layer's self-attention module, e.g. transformer.layers[0].self_attn(x, x, x)[1]. I only look at the first layer's attention, since after the first layer I'm not sure the representations are still cleanly entangled with the positional information.
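
In code, something like the following sketch (names like transformer and x are just for the example; MultiheadAttention wants query, key and value, returns head-averaged weights by default, and this recomputation only matches what the first layer actually used when norm_first=False, the default, and attention dropout is off):

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, dim_feedforward=2, dropout=0, batch_first=True)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(4, 10, 2)  # (batch, seq, d_model)
with torch.no_grad():
    # Recompute the first layer's attention from the raw input; the second
    # return value of MultiheadAttention is the softmax attention matrix.
    _, attn = transformer.layers[0].self_attn(x, x, x, need_weights=True)
print(attn.shape)  # torch.Size([4, 10, 10]) -- averaged over heads by default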