How can I access the attention weights that are returned by `F.multi_head_attention_forward`?
Since `need_weights` is hard-coded to `False` inside the encoder layer's `_sa_block` function, I don't know how to get at them.
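For reference, this is roughly what `_sa_block` looks like in recent PyTorch releases (paraphrased from `torch/nn/modules/transformer.py`; the exact signature varies across versions). You can see where the weights get thrown away:

```python
# torch.nn.TransformerEncoderLayer._sa_block (paraphrased, may differ by version)
def _sa_block(self, x, attn_mask, key_padding_mask):
    x = self.self_attn(x, x, x,
                       attn_mask=attn_mask,
                       key_padding_mask=key_padding_mask,
                       need_weights=False)[0]  # [1] would be the attention weights
    return self.dropout1(x)
```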
That is only the matrix containing the projection matrices that produce q, k, and v from the input. By the attention matrix I mean the result of softmax(q @ k.T). I found a different way: inheriting from TransformerEncoderLayer and setting need_weights to True in _sa_block. People from PyTorch told me they may build in support for this in a later update.
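A minimal sketch of that subclassing approach. The class name and the `attn_weights` attribute are my own choices, and the `_sa_block` signature is the one from recent PyTorch versions (check your version's source before copying):

```python
import torch
import torch.nn as nn

class EncoderLayerWithWeights(nn.TransformerEncoderLayer):
    """TransformerEncoderLayer that keeps the attention weights around."""

    def _sa_block(self, x, attn_mask, key_padding_mask, **kwargs):
        # Same as the parent implementation, but with need_weights=True so
        # MultiheadAttention also returns the (head-averaged) attention matrix.
        x, attn = self.self_attn(
            x, x, x,
            attn_mask=attn_mask,
            key_padding_mask=key_padding_mask,
            need_weights=True,
            **kwargs,  # e.g. is_causal on newer PyTorch versions
        )
        self.attn_weights = attn  # stash for later inspection
        return self.dropout1(x)

# Usage: nn.TransformerEncoder deep-copies the layer, so each copy
# stores its own weights after a forward pass.
layer = EncoderLayerWithWeights(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(torch.randn(8, 10, 64))
weights = [l.attn_weights for l in encoder.layers]  # one (8, 10, 10) tensor per layer
```

Two caveats: MultiheadAttention averages the weights over heads by default (pass `average_attn_weights=False` for per-head weights), and newer PyTorch versions have a fused fast path in eval mode that can skip `_sa_block` entirely; this sketch assumes the regular Python code path is taken.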
Well, somewhat late, but I ran into a similar situation. My lazy intuition was to simply recompute the attention with encoder.layers[0].self_attn(input, input, input)[1]. I only consider the first layer's attention, since I'm not sure whether the attention after the first layer is still entangled with the positional encoding.
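A sketch of that recomputation, with a made-up model and input just for illustration (substitute your own encoder; the attribute path depends on how your model is built):

```python
import torch
import torch.nn as nn

# Hypothetical setup: a stock TransformerEncoder with batch-first inputs.
d_model, nhead = 64, 4
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
    num_layers=2,
)
encoder.eval()  # disables dropout, so the recomputed weights are deterministic

x = torch.randn(8, 10, d_model)  # (batch, seq, d_model), after positional encoding

# Re-run only the first layer's self-attention to recover its weights.
# With the default post-norm layer (norm_first=False), attention sees the raw
# input; for a pre-norm layer, apply encoder.layers[0].norm1(x) first.
with torch.no_grad():
    _, attn = encoder.layers[0].self_attn(x, x, x, need_weights=True)

print(attn.shape)  # (8, 10, 10): attention weights averaged over heads
```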