How can I access the attention weights that are returned by “F.multi_head_attention_forward”?
Since “need_weights” is hard-coded to False inside the encoder layer’s “_sa_block” function, I don’t know how to get at them.
That is only the matrix containing the projection matrices that produce q, k, v from the input. By the attention matrix I mean the result of softmax(q @ k.T). I found a different way: inheriting from TransformerEncoderLayer and setting need_weights to True in _sa_block. People from PyTorch told me they may build in support in a later update.