How to calculate the gradient of the previous layer when the gradient of the latter layer is given?

mankasto · May 26, 2022, 4:26pm

Hello PyTorch community,
Can someone help me solve this problem? If the gradients of a certain layer are known, how can I use the torch API to calculate the gradient of the previous layer?
Thanks in advance.

AlphaBetaGamma96 · May 26, 2022, 4:27pm

Can you share a minimal reproducible example of this problem?

mankasto · May 26, 2022, 4:56pm

import torch  
import torch.nn as nn  
  
  
class SimNet(nn.Module):  
    def __init__(self, channel=1, hidden=100, num_classes=10):  
        super(SimNet, self).__init__()  
        act = nn.Sigmoid  
        # input shape: (batch_size, channel, 14, 14)  
        self.body = nn.Sequential(  
            nn.Conv2d(channel, 4, kernel_size=2, padding=1, stride=2, bias=False),  
            act(),  
            nn.Conv2d(4, 4, kernel_size=2, padding=1, stride=2, bias=False),  
            act(),  
        )  
        self.fc = nn.Sequential(  
            nn.Linear(hidden, num_classes, bias=False)  
        )  
  
    def forward(self, x):  
        out = self.body(x)  
        out = out.view(out.size(0), -1)  
        out = self.fc(out)  
        return out  
  
  
def weights_init(m):  
    try:  
        if hasattr(m, "weight"):  
            m.weight.data.uniform_(-0.5, 0.5)  
    except Exception:  
        print('warning: failed in weights_init for %s.weight' % m._get_name())  
    try:  
        if hasattr(m, "bias"):  
            m.bias.data.uniform_(-0.5, 0.5)  
    except Exception:  
        print('warning: failed in weights_init for %s.bias' % m._get_name())  
  
  
# model  
net = SimNet()  
net.apply(weights_init)  
loss_fn = nn.CrossEntropyLoss()  
# batch_size = 4  
x = torch.randn((4, 1, 14, 14))  
y = torch.tensor([0, 1, 2, 4])  
# forward  
out = net(x)  
loss = loss_fn(out, y)  
grads = torch.autograd.grad(loss, net.parameters())  
# gradients of three params: two from conv2d, one from fc  
grads = list((_.detach().clone() for _ in grads))  
# if I modify the fc gradient, how do I calculate the new gradient of the last conv2d?  
new_grad_fc = torch.ones_like(grads[-1])  
new_grad_last_conv = new_grad_fc * ?

mankasto · May 26, 2022, 4:58pm

Here is an example. Thanks for replying so quickly.

AlphaBetaGamma96 · May 26, 2022, 5:24pm

So you want the gradient of self.body() w.r.t the parameters?

You could just return both self.body(x) and self.fc(out) as outputs of the model and differentiate them in the same way you’ve done for self.fc(out)

mankasto · May 27, 2022, 12:50am

Thanks a lot. Actually, what I want is not the gradient of self.body() w.r.t. the parameters, but still the gradient of the loss w.r.t. the parameters.
According to the chain rule, the gradient of the loss w.r.t. the parameters of conv2 (the last conv2d layer) can be calculated from the gradient of the loss w.r.t. the parameters of the fc layer. As a result, if I change the gradient of the loss w.r.t. the parameters of the fc layer, then the gradient of the loss w.r.t. the parameters of conv2 should also change, maintaining the original relationship between the two.
Therefore, to calculate this, I first need to obtain several intermediate gradient values in the computational graph. I want to know if there is a simple method or ready-made API for this.
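As a hedged illustration of the chain-rule relationship discussed here: in autograd, the gradient that reaches the conv weights flows through the gradient of the loss w.r.t. the activation feeding the fc layer, not through the fc weight gradient itself. The sketch below uses a hypothetical, simplified conv -> fc model (not the SimNet from the post) and shows that a two-stage computation with `grad_outputs` reproduces the direct gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer stand-in for the conv -> fc structure in the post.
conv = nn.Conv2d(1, 4, kernel_size=2, stride=2, bias=False)
fc = nn.Linear(4 * 7 * 7, 10, bias=False)

x = torch.randn(4, 1, 14, 14)
y = torch.tensor([0, 1, 2, 4])

h = torch.sigmoid(conv(x)).flatten(1)  # activation that feeds the fc layer
out = fc(h)
loss = nn.functional.cross_entropy(out, y)

# Reference: gradient of the loss w.r.t. the conv weight in a single call.
(direct,) = torch.autograd.grad(loss, conv.weight, retain_graph=True)

# Stage 1: gradient of the loss w.r.t. the intermediate activation h.
(grad_h,) = torch.autograd.grad(loss, h, retain_graph=True)
# Stage 2: vector-Jacobian product that pushes grad_h back through the conv layer.
(staged,) = torch.autograd.grad(h, conv.weight, grad_outputs=grad_h)

print(torch.allclose(direct, staged, atol=1e-6))
```

Replacing `grad_h` with a modified tensor of the same shape in stage 2 would give the conv gradient implied by that modified upstream gradient.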

AlphaBetaGamma96 · May 27, 2022, 12:02pm

It's true that you can use the chain rule, but remember that you're applying it to Tensors rather than scalars, so it's not as simple as multiplying by a scalar; it's a matrix product.

Won’t this give you the gradient of the loss w.r.t the parameters of your network?

But you want to change the gradient of the loss w.r.t params (of fc) and determine how that changes the gradients of the loss w.r.t conv2d?

One thing you could look into is per-sample gradients via hooks, because you'll need to define a formula that takes grad_output and combines it with a manual expression to define the new gradient. It won't be as simple as element-wise multiplication, because you also have a batch dimension, which autograd explicitly sums over when computing the gradient.

A far more detailed explanation, including a good walkthrough of backprop, can be found here. It gives a clear example of how you can change a gradient and then compute the gradients of the layers that come before it.
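A minimal sketch of the hook idea, assuming a simplified conv -> fc model rather than the SimNet above: a tensor hook registered with `Tensor.register_hook` replaces the gradient arriving at the fc output, and autograd then propagates the replacement to every earlier layer. The all-ones replacement gradient is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model; names and sizes are illustrative only.
conv = nn.Conv2d(1, 4, kernel_size=2, stride=2, bias=False)
fc = nn.Linear(4 * 7 * 7, 10, bias=False)

x = torch.randn(4, 1, 14, 14)
y = torch.tensor([0, 1, 2, 4])

h = torch.sigmoid(conv(x)).flatten(1)
out = fc(h)

# Replace the gradient arriving at `out` with a hand-chosen one (all ones here).
out.register_hook(lambda grad: torch.ones_like(grad))

loss = nn.functional.cross_entropy(out, y)
loss.backward()

# For a bias-free Linear layer, weight.grad = grad_output^T @ input,
# which is where the sum over the batch dimension happens.
expected_fc_grad = torch.ones_like(out).t() @ h.detach()
print(torch.allclose(fc.weight.grad, expected_fc_grad, atol=1e-5))
print(conv.weight.grad.shape)
```

The conv gradient in `conv.weight.grad` is computed from the replaced gradient as well, so the relationship between the layers is kept by autograd rather than by a hand-written formula.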

mankasto · May 27, 2022, 12:34pm

Hi, AlphaBetaGamma96. Thanks very much! I’ll study your example carefully.

DuaneNielsen · June 1, 2022, 6:20pm

If you already have gradients for a layer, you can pass them into .backward() as a parameter…

Example here

.backward in the docs

Note that you can also pass inputs to backward. I've not done this before, but according to the docs it restricts which tensors the gradients are accumulated into. It does not re-run the forward pass, so you still call the model yourself and then call backward on its output with your gradient.

something like… (below code is just an illustration)

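import torch, torch.nn as nn
layer1, layer2 = nn.Linear(3, 4), nn.Linear(4, 2)  # illustrative layers
net = nn.Sequential(layer1, layer2)
out = net(torch.randn(5, 3))  # forward pass
gradient_tensor = torch.ones_like(out)  # your gradient w.r.t. the output, same shape as out
out.backward(gradient=gradient_tensor)  # backward() is called on a tensor, not on the module
print(layer1.weight.grad)  # gradients live on parameters, not on the layer itself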
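For the `inputs` argument mentioned above, here is a small sketch under the assumption of two illustrative `nn.Linear` layers (not from the original post). Passing `inputs` to `Tensor.backward` limits which tensors have gradients accumulated into `.grad`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative layers; sizes are arbitrary.
layer1, layer2 = nn.Linear(3, 4), nn.Linear(4, 2)
out = layer2(torch.tanh(layer1(torch.randn(5, 3))))

# Only layer1's parameters are listed in `inputs`, so only they receive .grad.
out.backward(gradient=torch.ones_like(out), inputs=list(layer1.parameters()))

print(layer1.weight.grad is not None)
print(layer2.weight.grad is None)
```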