Forward Hooks Not Working When `model.eval()` and `torch.no_grad()`

For a TransformerEncoder network, a forward hook I registered does not get called when both model.eval() and with torch.no_grad(): are used.

Here is a code to reproduce:

import torch
model = torch.nn.TransformerEncoder(
    encoder_layer=torch.nn.TransformerEncoderLayer(
        d_model=100, nhead=4, dim_feedforward=200, batch_first=True
    ),
    num_layers=3,
).to('cuda')
input = torch.randn(2, 10, 100).to('cuda')

def createHook(name):
    print(f"Hook for {name} is set")
    
    def hook(model, input, output):
        print(f"Hook working")

    return hook

for i in range(len(model.layers)):
    model.layers[i].self_attn.register_forward_hook(createHook(f"t_layer_{i}"))


model.eval()
with torch.no_grad():
    pred = model(input)
    pred = pred.detach().cpu().numpy()

It prints “Hook for * is set”, but it never prints “Hook working” (the hook is not actually called).

I observed that the hook fires when only model.eval() is used, and also when only with torch.no_grad(): is used. Somehow combining the two makes it stop working. I have only observed this with TransformerEncoder.

Is this some kind of a bug? Or, am I missing something?

I suspect the “faster transformer” fastpath described here is being taken, and it does not seem to support forward hooks. Could you create a GitHub issue for it, if none exists yet?

Thanks for the answer. It is interesting. I could not find any related issue, so I created one.

As one user suggested in the GitHub issue, this problem can be overcome by simply calling torch.backends.mha.set_fastpath_enabled(False). Apparently the fastpath bypasses the hooks.