Accessing input/output of unnamed functional layers via hooks

Hey everyone,
I am currently using the Intel FP8 Toolkit and trying to extend it with additional formats. From what I gathered, it builds on PyTorch’s eager mode quantization, which uses hooks to modify intermediate layer inputs and outputs. I am interested in static PTQ.
I’ll give a little bit of background, hoping that this helps describe my question precisely:

I really like this approach, as it is easy to comprehend. However, when applying it to transformer models such as Albert, I am facing some issues.

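Albert is composed of attention heads, which can be described (simplified) as: linear projections for Q, K and V, a MatMul of Q and K, a softmax, and a second MatMul of the softmax output with V.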

When calling model.named_modules() on this model, as suggested here, it returns the named submodules (just an excerpt):

Code to reproduce:
from transformers import AlbertForQuestionAnswering

# get model
model = AlbertForQuestionAnswering.from_pretrained("twmkn9/albert-base-v2-squad2")

# list layer names and layer types
for name, module in model.named_modules():
    print(f"{name} | {type(module)}")

Now one could use the names of the linear modules and easily start modifying the inputs and outputs of, for example, Q, K and V. But what if I wanted to modify intermediate operations, such as the output of the softmax, to quantize the input of the MatMul?
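For the named linear modules, the hook-based approach can be sketched roughly as follows. This is a minimal sketch on a toy module rather than Albert itself: `TinyAttention`, `fake_quant`, and the uniform rounding grid are hypothetical stand-ins and not part of the FP8 toolkit. Only `named_modules()` and `register_forward_hook()` are actual PyTorch API.

```python
import torch
import torch.nn as nn


class TinyAttention(nn.Module):
    """Toy stand-in for an attention block (hypothetical, not the Albert code)."""

    def __init__(self, dim=8):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
        # Functional call: there is no submodule here that a hook could attach to.
        probs = nn.functional.softmax(scores, dim=-1)
        return probs @ v


def fake_quant(t, step=0.125):
    # Placeholder quantizer (assumption): round to a uniform grid.
    # A custom number format would be emulated here instead.
    return torch.round(t / step) * step


def quantize_output_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return fake_quant(output)


model = TinyAttention()
handles = []
hooked_names = []
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        handles.append(module.register_forward_hook(quantize_output_hook))
        hooked_names.append(name)

out = model(torch.randn(2, 4, 8))

# Remove the hooks again once they are no longer needed.
for handle in handles:
    handle.remove()
```

Inputs can be modified the same way with `register_forward_pre_hook()`. Note that the softmax inside `forward()` stays out of reach of either kind of hook.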

In the file defining Albert we can see that the softmax is called as nn.functional.softmax, which means it is never registered as a submodule and therefore does not show up in named_modules(). At least that is my understanding.
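This behaviour can be checked on a small example. The two classes below are hypothetical toy modules (not taken from the Albert source); they differ only in whether the softmax is a functional call or an `nn.Softmax` submodule.

```python
import torch
import torch.nn as nn


class WithFunctional(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        # Plain function call: nothing is registered on the module.
        return nn.functional.softmax(self.proj(x), dim=-1)


class WithModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        # Assigning an nn.Module attribute registers it as a submodule.
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        return self.softmax(self.proj(x))


# Skip the empty name, which refers to the root module itself.
functional_names = [n for n, _ in WithFunctional().named_modules() if n]
module_names = [n for n, _ in WithModule().named_modules() if n]

print(functional_names)  # ['proj']
print(module_names)      # ['proj', 'softmax']
```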

So, in essence, my question is:
Is there a way to access the input/output of intermediate (unnamed) layers like these? If yes, how should this be done?

Thanks a lot for looking into this!

You either have to edit the model to modularize those operations, or you can’t use eager mode for that.

Thanks! Would there be a smart way to do it? For example, I would change nn.functional.softmax to nn.Softmax and then take it from there. I guess that’s the only way?

I guess I am stuck with eager mode, since I want to try dtypes that are not supported by PyTorch yet, so both prototype export and graph methods are out.

Update for other readers that might run into this problem:
I’ve changed the structure of, for example, the Albert attention class to include the softmax as a module. I added a softmax layer to __init__():

self.softmax1 = nn.Softmax(dim=-1)

Then I replaced the actual execution in the forward() function with:

# Original implementation (line 342)
attention_probs = nn.functional.softmax(attention_scores, dim=-1)

# Replaced by:
attention_probs = self.softmax1(attention_scores)

Although a little finicky, it works like a charm and the softmax now gets listed by model.named_modules().
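Once the softmax is a module, a forward hook can be attached to it like to any other layer. The following is a minimal sketch under stated assumptions: `ToyAttention` is a hypothetical stand-in for the patched attention class (only the `softmax1` name is taken from the change above), and `fake_quant` is a placeholder rounding function, not a real FP8 or custom-format implementation.

```python
import torch
import torch.nn as nn


class ToyAttention(nn.Module):
    """Hypothetical stand-in for the patched attention class."""

    def __init__(self, dim=8):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.softmax1 = nn.Softmax(dim=-1)  # softmax as a named submodule

    def forward(self, x):
        q, k, v = self.query(x), self.key(x), self.value(x)
        attention_scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
        attention_probs = self.softmax1(attention_scores)
        return attention_probs @ v


def fake_quant(t, step=1.0 / 16):
    # Placeholder quantizer (assumption): round to multiples of `step`.
    return torch.round(t / step) * step


captured = {}


def softmax_hook(module, inputs, output):
    # Keep copies for inspection, then hand the quantized tensor to the MatMul.
    captured["raw"] = output.detach().clone()
    quantized = fake_quant(output)
    captured["quantized"] = quantized.detach().clone()
    return quantized  # replaces the softmax output


torch.manual_seed(0)
model = ToyAttention().eval()
handle = model.softmax1.register_forward_hook(softmax_hook)

with torch.no_grad():
    out = model(torch.randn(2, 4, 8))

handle.remove()
```

Because the hook returns a tensor, the MatMul with V sees the quantized probabilities instead of the original softmax output, which is the behaviour asked about in the original question.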