The hook should not modify its arguments, but it can optionally return a new gradient with respect to the output that will be used in place of grad_output in subsequent computations.
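To check my understanding of that contract, here is a minimal sketch (my own example, not from the docs) that only observes `grad_output` without replacing it — the hook receives a tuple of gradients with respect to the module's outputs, and returning `None` leaves them unchanged:

```python
import torch
from torch import nn

seen = {}

def observing_hook(module, grad_output):
    # grad_output is a tuple of gradients w.r.t. the module's outputs
    seen["grad"] = grad_output[0].clone()
    return None  # returning None leaves grad_output unchanged

layer = nn.Linear(2, 3)
layer.register_full_backward_pre_hook(observing_hook)

x = torch.ones(5, 2, requires_grad=True)
layer(x).sum().backward()

# for y.sum(), the gradient w.r.t. y is a tensor of ones
print(seen["grad"].shape)        # torch.Size([5, 3])
print(seen["grad"].eq(1).all())  # tensor(True)
```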
I think l1.weight.grad in this code should be a large tensor, since I add a big number to the gradient of the l1 layer's output in the backward pre-hook.
However, when I run the code:
import torch
from torch import nn
def hook(module, grad_output):
    # return a replacement for grad_output (a tuple of tensors)
    return (grad_output[0] + 9999999,)
l1 = nn.Linear(2,3)
l1.register_full_backward_pre_hook(hook)
x = torch.ones(5,2)
# x.requires_grad_(True)
y = l1(x)
loss = y.sum()
loss.backward()
print(l1.weight.grad)
It outputs:
But when I run the same code with x.requires_grad_(True):
import torch
from torch import nn
def hook(module, grad_output):
    # return a replacement for grad_output (a tuple of tensors)
    return (grad_output[0] + 9999999,)
l1 = nn.Linear(2,3)
l1.register_full_backward_pre_hook(hook)
x = torch.ones(5,2)
x.requires_grad_(True)
y = l1(x)
loss = y.sum()
loss.backward()
print(l1.weight.grad)
It outputs:
The only difference between the two snippets is the call to x.requires_grad_(True). I would expect both runs to print tensors filled with 50000000 or, at the least, for the two results to be the same. Why do they differ?
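For reference, the 50000000 figure comes from working out the chain rule by hand: for y = x @ W.T + b, the gradient with respect to W is grad_output.T @ x, so if the replaced grad_output really reached the weight, every entry would be 5 * 10000000. A quick sketch of that arithmetic, without any hooks involved:

```python
import torch

# hand computation of what l1.weight.grad should be if the
# replaced grad_output (ones + 9999999) reached the weight
x = torch.ones(5, 2)
grad_output = torch.ones(5, 3) + 9999999  # what the hook returns

# for y = x @ W.T + b, the gradient w.r.t. W is grad_output.T @ x
expected_weight_grad = grad_output.t() @ x
print(expected_weight_grad)  # every entry is 5 * 10000000 = 50000000.0
```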