Issue with Registering Forward Hooks in a Transformer

Your code is not executable since e.g. model_layers is undefined, and it's also not properly formatted.
However, it seems you want to access the NonDynamicallyQuantizableLinear layer, which is only used for error handling inside the nn.MultiheadAttention module.
This code works for me:

import torch
from torchvision import models

activation = {}
def get_activation(name):
    def hook(model, input, output):
        # nn.MultiheadAttention returns a tuple (attn_output, attn_weights),
        # so store only the attention output
        activation[name] = output[0].detach()
    return hook


model = models.vit_b_32()
model.encoder.layers.encoder_layer_11.self_attention.register_forward_hook(get_activation("mha"))

x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.shape)
# torch.Size([1, 1000])
print(activation["mha"].shape)
# torch.Size([1, 50, 768])
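
If you are unsure which submodule to hook, you can print the registered module names first and pick the attention layer you need (a minimal sketch; the exact names depend on your torchvision version):

for name, module in model.named_modules():
    if isinstance(module, torch.nn.MultiheadAttention):
        print(name)
# encoder.layers.encoder_layer_0.self_attention
# ...
# encoder.layers.encoder_layer_11.self_attention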