Obtain the attention weights within the Transformer class


I have initialized a Transformer encoder block using:

```python
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
transformer = nn.TransformerEncoder(encoder_layer, num_layers)
```

How can I access the attention weights that are returned by `F.multi_head_attention_forward`? Since `need_weights` is hard-coded to `False` inside the encoder layer's `_sa_block` method, I don't know how to access them.


```python
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True,
                                           dropout=0, dim_feedforward=2)
# in_proj_weight holds the stacked q/k/v input-projection matrices
attention_weights = encoder_layer.state_dict()['self_attn.in_proj_weight']
```

That is only the matrix containing the projection matrices that produce q, k, and v from the input. By "attention matrix" I mean the result of `softmax(q @ k.T)`. I found a different way: inheriting from `TransformerEncoderLayer` and setting `need_weights` to `True` in `_sa_block`. People from PyTorch told me they may build in support in a later update.
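For anyone landing here, a minimal sketch of that subclassing workaround. Note that `_sa_block` is a private PyTorch method, so its signature may differ between versions (the one below matches PyTorch 2.x; check your version's source). The class name `EncoderLayerWithAttn` is just an illustrative choice:

```python
import torch
import torch.nn as nn

class EncoderLayerWithAttn(nn.TransformerEncoderLayer):
    """TransformerEncoderLayer that stores the attention matrix on each forward.

    Overrides the private `_sa_block` to call self_attn with need_weights=True
    and keep the returned weights on the module.
    """

    def _sa_block(self, x, attn_mask, key_padding_mask, is_causal=False):
        x, self.attn_weights = self.self_attn(
            x, x, x,
            attn_mask=attn_mask,
            key_padding_mask=key_padding_mask,
            need_weights=True,           # the one change: ask for the weights
            average_attn_weights=True,   # mean over heads; False keeps per-head maps
        )
        return self.dropout1(x)

layer = EncoderLayerWithAttn(d_model=2, nhead=1, dim_feedforward=2,
                             dropout=0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)

# Keep the module in training mode: the fused eval-mode fast path can
# bypass _sa_block entirely.
x = torch.randn(1, 5, 2)                 # (batch, seq, d_model)
out = encoder(x)

# nn.TransformerEncoder deep-copies the layer it is given, so read the
# weights off the clone inside the encoder, not the original `layer`:
attn = encoder.layers[0].attn_weights    # shape (batch, seq, seq)
```

Since these are post-softmax weights, each row of `attn` sums to 1. With `num_layers > 1` you can collect one matrix per layer from `encoder.layers[i].attn_weights`.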