Attention weights in TransformerEncoderLayer


How to access attention weights/outputs in TransformerEncoderLayer/TransformerEncoder?



I think this is pretty important, and Hugging Face Transformers handles it well. Is there a way to access the last attention matrix in TransformerEncoder?

I was dealing with a similar problem and couldn’t find a solution, so I created a library called NoPdb, which lets you retrieve the value of an arbitrary tensor (in fact, of any variable inside any Python function).
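To give a rough idea of how the locals of a function can be captured from the outside at all, here is a minimal standard-library sketch. It is not NoPdb's actual implementation, only an illustration of the general idea using `sys.setprofile`; the names `capture_locals` and `scaled_sum` are made up for this example.

```python
import sys
from contextlib import contextmanager


@contextmanager
def capture_locals(func):
    """Record the local variables of every call to `func` made inside the block.

    Hypothetical helper for illustration only; not part of NoPdb.
    """
    captured = []
    target_code = func.__code__

    def profiler(frame, event, arg):
        # A "return" event fires just before a Python frame exits, so the
        # frame's locals hold their final values at this point.
        if event == "return" and frame.f_code is target_code:
            captured.append(dict(frame.f_locals))

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        yield captured
    finally:
        # Restore whatever profiler (if any) was active before.
        sys.setprofile(previous)


def scaled_sum(values, scale):
    total = sum(values)  # intermediate value we want to inspect
    return total * scale


with capture_locals(scaled_sum) as calls:
    result = scaled_sum([1, 2, 3], scale=10)

print(calls[0]["total"])  # 6
```

Each entry in `calls` is a plain dict snapshot of one call's local variables, so intermediate values that are never returned (like `total` here) become visible after the fact.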

In the Transformer tutorial, it should be enough to do something like:

import nopdb

# Evaluate the model while capturing local variables of the 1st attention layer
with nopdb.capture_calls(model.transformer_encoder.layers[0].self_attn.forward) as calls:
    evaluate(model, test_data)

# Now we have access to the attention weights and outputs

(Note that this accumulates the captured tensors over the whole dataset. To avoid running out of GPU memory, it is better to capture the variables for each batch separately, keep only the tensors you need, and move them to the CPU. This is left as an exercise for the reader. :wink:)
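The per-batch pattern (grab only the attention weights, detach them, move them to the CPU) could look roughly like the sketch below. Note that it does not use NoPdb: it is a different technique that calls the first layer's `self_attn` module directly with `need_weights=True`. The toy encoder and random batches are stand-ins for the tutorial's model and data, and the sketch assumes the default `norm_first=False` and no attention mask, so it only covers the first layer, not the last one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the tutorial's model and test data.
d_model, nhead, seq_len, batch_size = 16, 4, 5, 2
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=32)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # disable dropout so the weights are deterministic
batches = [torch.randn(seq_len, batch_size, d_model) for _ in range(3)]

first_attn = encoder.layers[0].self_attn  # an nn.MultiheadAttention module
attn_per_batch = []

with torch.no_grad():
    for x in batches:
        # With norm_first=False, the first layer's self-attention receives x
        # unchanged as query, key and value (masks are omitted in this sketch).
        _, weights = first_attn(x, x, x, need_weights=True)
        # Keep only what we need and move it off the GPU right away.
        attn_per_batch.append(weights.detach().cpu())

# Each entry has shape (batch_size, seq_len, seq_len), averaged over heads.
print(attn_per_batch[0].shape)
```

Because every batch's weights are moved to the CPU as soon as they are computed, GPU memory use stays bounded by a single batch rather than growing with the dataset.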