Custom layer-specific gradient calculation

Hi,

I would like to change the gradient calculation for one of the layers, depending on where the gradient is coming from. Is there an easy way to do so?

Here is a concrete example of what I want to achieve:

I have a layer (A) connected to two other layers (B, C). The gradient of the initial layer (A) will be equal to the sum of the gradients coming from the two other layers (B, C); however, I would like the gradient coming from one of them (B) to have less of an impact on the first layer. To do so, I want to multiply it by 0.9 before the addition.

So the code would look something like this:

import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 2)   # layer A
        self.fc2 = nn.Linear(2, 1)   # layer B
        self.fc3 = nn.Linear(2, 1)   # layer C

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        y = self.fc2(x)   # branch through B
        y = F.relu(y)
        z = self.fc3(x)   # branch through C
        z = F.relu(z)
        return y + z

I know that I could use a hook, but that seems to only change the whole gradient after the summation, so I don’t think it would work. I also obviously want the gradients of the other layers to remain unaffected when taking a training step, so completely changing the gradient of one of the other layers is not a good solution either.

You’re right that hooks registered via t.register_hook(…) only apply their changes after the gradients from B and C have both been accumulated.
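
A quick standalone illustration of that point (toy tensors rather than your model): the tensor hook below fires once, and only ever sees the already-summed gradient.

import torch

a = torch.ones(2, requires_grad=True)
x = a * 1.0                       # intermediate tensor feeding two branches
# Fires once the gradient w.r.t. x is fully formed, i.e. after the
# contributions from both branches have already been summed.
x.register_hook(lambda g: print("tensor hook sees:", g))

b = (x * 2.0).sum()               # branch "B": contributes 2 per element
c = (x * 3.0).sum()               # branch "C": contributes 3 per element
(b + c).backward()                # prints tensor([5., 5.]) -- already summed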

What you can do instead is register a different type of hook, one that interposes earlier in the backward pass. Specifically here, you’d want a post-hook registered via t.grad_fn.register_hook(…): it runs right after backward has been computed for B’s node, so you can modify that gradient before it is accumulated into A’s.
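
Here is a minimal sketch of that idea for your model, with a couple of assumptions flagged: I’m assuming fc2’s output is produced by a single autograd node (AddmmBackward0 for nn.Linear with a 2D input and bias), and that the gradient with respect to x sits at index 1 of that node’s grad_inputs (addmm’s input order is typically bias, input, weight). Please print y.grad_fn and y.grad_fn.next_functions to confirm the index for your setup before relying on it.

import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 2)   # layer A
        self.fc2 = nn.Linear(2, 1)   # layer B
        self.fc3 = nn.Linear(2, 1)   # layer C

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)

        y = self.fc2(x)
        if y.requires_grad:
            # Post-hook on the node that produced y. It fires after that node
            # has computed gradients w.r.t. its inputs, but before they are
            # accumulated into x's gradient, so only B's contribution is scaled.
            def scale_branch_b(grad_inputs, grad_outputs):
                grads = list(grad_inputs)
                # ASSUMPTION: grads[1] is the gradient w.r.t. x for AddmmBackward0.
                if grads[1] is not None:
                    grads[1] = grads[1] * 0.9
                return tuple(grads)

            y.grad_fn.register_hook(scale_branch_b)
        y = F.relu(y)

        z = self.fc3(x)   # branch through C, left untouched
        z = F.relu(z)
        return y + z

Note that the hook is registered inside forward on purpose: every forward pass builds a fresh graph with a fresh grad_fn node, so the hook has to be attached to the node of the current pass. The gradients of fc2’s own weight and bias are left alone; only the contribution flowing back into A is scaled.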

For more details you can see:
https://pytorch.org/docs/stable/notes/autograd.html#backward-hooks-execution
