register_hook does not work

I am wondering what I am doing wrong with register_hook, since the registered hooks do not seem to have any effect.
My goal is to modify the gradient matrices before the weights are updated. For debugging purposes I simply return a zero matrix from each hook, but the network still trains perfectly.
So clearly the hooks are not being applied, but I cannot figure out why. Here is my train function:

def train_cnn():

    model = Net()
    model.cuda()

    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay)

    for e in range(epochs):
        agg_loss = 0

        for data in trainloader:
            x, y = data
            x = x.cuda()
            y = y.cuda()

            outputs = model(x)
            loss = criterion(outputs, y)
            optimizer.zero_grad()
            loss.backward()

            with torch.no_grad():
                hooks = []
                for conv_layer in model.convos:
                    conv_layer.weight.retain_grad()
                    h = conv_layer.weight.register_hook(lambda grad: torch.zeros_like(grad))
                    hooks.append(h)

                for fc_layer in model.linears:
                    fc_layer.weight.retain_grad()
                    h = fc_layer.weight.register_hook(lambda grad: torch.zeros_like(grad))
                    hooks.append(h)

            optimizer.step()
            for h in hooks:
                h.remove()

Thank you!

Disregard this post, as it was a wrong suggestion.

Ohh, I see… I thought I could just register a function that returns a new gradient, something like lambda grad: func(grad), to change how the weights are updated, because I need to perform operations on the gradient that are not simple tensor methods. Is this possible with register_hook?
The function would be a series of different matrix manipulations.
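For example, something along these lines is what I have in mind (manipulate is just a placeholder for the actual operations, applied to the model from my snippet above):

def manipulate(grad):
    # placeholder for a series of matrix manipulations; the return value
    # replaces the gradient, so it has to keep the same shape as grad
    return grad / (grad.norm() + 1e-8)

for conv_layer in model.convos:
    conv_layer.weight.register_hook(manipulate)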

No, sorry, I was wrong: your code snippet should be the right way and it also seems to work:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
model.weight.register_hook(lambda grad: torch.ones_like(grad) * 1000)
model(torch.randn(1, 1)).backward()
print(model.weight.grad)
> tensor([[1000.]])

I’ll edit my previous post.

Wait, then what is my problem? It definitely did not work with my training code; it just trained as usual, while it was supposed to break the training…
Is registering the hooks after calling loss.backward() the right way of doing it?

I think your code registers the hooks too late (after the backward call).
Note that the hooks are called during the gradient calculation, i.e. inside loss.backward(), so hooks registered afterwards cannot change the gradients that were just computed.

You could register the hooks once during setup and use the standard training loop without removing and re-adding them in every iteration, e.g. as in the sketch below.
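A rough sketch of what I mean, assuming the Net, trainloader, and hyperparameters from your snippet:

import torch

model = Net()
model.cuda()

# register the hooks once, before training; they will fire inside every backward call
for layer in list(model.convos) + list(model.linears):
    layer.weight.register_hook(lambda grad: torch.zeros_like(grad))

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay)

model.train()
for e in range(epochs):
    for x, y in trainloader:
        x, y = x.cuda(), y.cuda()
        outputs = model(x)
        loss = criterion(outputs, y)
        optimizer.zero_grad()
        loss.backward()   # the hooks run here and replace the gradients with zeros
        optimizer.step()  # the update now uses the modified (zeroed) gradients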
