Hello pytorch community,
Can someone help me solve this problem? If the gradients of a certain layer are known, how can I use the torch API to calculate the gradients of the previous layer?
Thanks in advance.
Can you share a minimal reproducible example of this problem?
```python
import torch
import torch.nn as nn


class SimNet(nn.Module):
    def __init__(self, channel=1, hidden=100, num_classes=10):
        super(SimNet, self).__init__()
        act = nn.Sigmoid
        # input shape: (batch_size * 14 * 14)
        self.body = nn.Sequential(
            nn.Conv2d(channel, 4, kernel_size=2, padding=1, stride=2, bias=False),
            act(),
            nn.Conv2d(4, 4, kernel_size=2, padding=1, stride=2, bias=False),
            act(),
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden, num_classes, bias=False)
        )

    def forward(self, x):
        out = self.body(x)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out


def weights_init(m):
    try:
        if hasattr(m, "weight"):
            m.weight.data.uniform_(-0.5, 0.5)
    except Exception:
        print('warning: failed in weights_init for %s.weight' % m._get_name())
    try:
        if hasattr(m, "bias"):
            m.bias.data.uniform_(-0.5, 0.5)
    except Exception:
        print('warning: failed in weights_init for %s.bias' % m._get_name())


# model
net = SimNet()
net.apply(weights_init)
loss_fn = nn.CrossEntropyLoss()

# batch_size = 4
x = torch.randn((4, 1, 14, 14))
y = torch.tensor([0, 1, 2, 4])

# forward
out = net(x)
loss = loss_fn(out, y)
grads = torch.autograd.grad(loss, net.parameters())

# gradients of three params: two from conv2d, one from fc
grads = list((_.detach().clone() for _ in grads))

# if modify fc gradient, then how to calculate new gradient of the last conv2d?
new_grad_fc = torch.ones_like(grads[-1])
new_grad_last_conv = new_grad_fc * ?
```
Here is an example. Thanks for your quick reply.
So you want the gradient of self.body() w.r.t the parameters?
You could just return both the output of self.body(x) and self.fc(out) as outputs of the model and differentiate them in the same way you’ve done for the loss.
Thanks a lot. Actually, what I want is not the gradient of self.body() w.r.t. the parameters but still that of the loss w.r.t. the parameters.
According to the chain rule, the gradient of the loss w.r.t. the parameters of conv2 (the last conv2d layer) can be calculated from that of the loss w.r.t. the parameters of the fc layer. As a result, if I change the gradient of the loss w.r.t. the parameters of the fc layer, then the gradient of the loss w.r.t. the parameters of conv2 should also change and maintain the original relationship between the two.
Therefore, in order to calculate this, I first need to obtain several gradient values in the computation graph. I want to know if there is any simple method or ready-made API for it.
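On the question of a ready-made API for reading gradients at several points of the graph: a minimal sketch, assuming a small stand-in model (the `body` and `fc` modules below are illustrative, not the SimNet from the post). `torch.autograd.grad` accepts intermediate tensors as well as parameters in its `inputs` list, so one call can return the gradient at each of them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-ins for the model's two stages (assumed, not from the post).
body = nn.Sequential(nn.Linear(6, 5), nn.Sigmoid())
fc = nn.Linear(5, 3, bias=False)

x = torch.randn(4, 6)            # batch of 4
y = torch.tensor([0, 1, 2, 1])

feat = body(x)                   # intermediate activation, the input of fc
out = fc(feat)                   # logits
loss = nn.functional.cross_entropy(out, y)

# One call returns gradients w.r.t. intermediate tensors and a parameter.
g_out, g_feat, g_fc_w = torch.autograd.grad(loss, [out, feat, fc.weight])

print(g_out.shape, g_feat.shape, g_fc_w.shape)
```

Here `g_out` and `g_feat` keep the batch dimension, while `g_fc_w` has the shape of the weight because the batch has already been summed out.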
That is true, you can use the chain rule, but remember that you are using the chain rule in the context of tensors rather than just scalars, so it’s not as simple as multiplying by a scalar; it’s a matrix product.
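To make the matrix-product point concrete, here is a small sketch for a bias-free linear layer, `out = feat @ W.T`. The tensor names and the random `g_out` (standing in for the gradient of the loss w.r.t. the layer output) are assumptions for illustration. The manual formulas are checked against autograd.

```python
import torch

torch.manual_seed(0)

feat = torch.randn(4, 5, requires_grad=True)  # layer input, batch of 4
W = torch.randn(3, 5, requires_grad=True)     # weight shaped like nn.Linear(5, 3).weight
out = feat @ W.t()                            # shape (4, 3)

# Assumed: some known gradient of the loss w.r.t. `out`.
g_out = torch.randn(4, 3)

# What autograd computes when given g_out as the incoming gradient.
g_feat_auto, g_W_auto = torch.autograd.grad(out, [feat, W], grad_outputs=g_out)

# The same quantities written as matrix products.
g_W_manual = g_out.t() @ feat.detach()   # (3, 4) @ (4, 5): sums over the batch
g_feat_manual = g_out @ W.detach()       # (4, 3) @ (3, 5): passed to the previous layer

print(torch.allclose(g_W_manual, g_W_auto, atol=1e-5))
print(torch.allclose(g_feat_manual, g_feat_auto, atol=1e-5))
```

Note that both the weight gradient and the gradient passed to the previous layer are built from `g_out`; neither is computed from the other.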
Won’t this give you the gradient of the loss w.r.t the parameters of your network?
But you want to change the gradient of the loss w.r.t. the params (of fc) and determine how that changes the gradients of the loss w.r.t. the earlier layers?
One thing you could look into is per-sample gradients via hooks, because you’ll need to define a formula that takes grad_output and multiplies it by a manual expression to define the new gradient. It won’t be as simple as element-wise multiplication, because you also have a batch dimension, which autograd explicitly sums over when computing the parameter gradients.
An example of this, explained in far better detail, can be found here, along with a good explanation of backprop. It’ll give you a clear example of how you can change a gradient and then define the gradients of any upstream layers.
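The hook idea can be sketched roughly as follows. This is only an illustration under assumptions: a single conv layer plus a linear head stand in for the real model, and the "new gradient" is simply the original one times a scale factor. `Tensor.register_hook` lets you replace the gradient arriving at an intermediate tensor, and every layer before it then receives the modified value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative model pieces (assumed): 14x14 input -> 4 channels of 8x8.
body = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=2, padding=1, stride=2, bias=False),
    nn.Sigmoid(),
)
fc = nn.Linear(4 * 8 * 8, 10, bias=False)

x = torch.randn(4, 1, 14, 14)
y = torch.tensor([0, 1, 2, 4])


def conv_grad(scale):
    """Gradient of the loss w.r.t. the conv weight when the gradient
    arriving at the fc output is multiplied by `scale`."""
    feat = body(x).flatten(1)
    out = fc(feat)
    # The hook receives d(loss)/d(out) and returns its replacement.
    out.register_hook(lambda g: g * scale)
    loss = nn.functional.cross_entropy(out, y)
    return torch.autograd.grad(loss, body[0].weight)[0]


base = conv_grad(1.0)
doubled = conv_grad(2.0)
print(torch.allclose(doubled, 2 * base))
```

Because backpropagation is linear in the incoming gradient, scaling the gradient at `out` scales the conv gradient by the same factor. Any other replacement expression can be returned from the hook in the same way.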
Hi, AlphaBetaGamma96. Thanks very much! I’ll study your example carefully.
If you already have gradients for a layer, you can pass them into .backward() as a parameter…
Note that backward() also accepts an inputs argument. I’ve not used it before, but according to the documentation it restricts which tensors the gradients are accumulated into; it does not rerun the forward pass.
something like… (below code is just an illustration)
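```python
import torch
import torch.nn as nn

layer1, layer2 = nn.Linear(8, 4), nn.Linear(4, 2)  # illustrative layers
net = nn.Sequential(layer1, layer2)
out = net(torch.randn(3, 8))                       # forward pass on an illustrative input
gradient_tensor = torch.ones_like(out)             # the known gradient w.r.t. the output
out.backward(gradient=gradient_tensor)             # backward() is a Tensor method
print(layer1.weight.grad)
```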
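For the `inputs` argument mentioned above, here is a hedged sketch of what it appears to do according to the `torch.Tensor.backward` documentation: gradients are accumulated only into the listed tensors. The layer sizes and the all-ones `known_grad` are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative two-layer network (assumed sizes).
layer1 = nn.Linear(8, 4, bias=False)
layer2 = nn.Linear(4, 2, bias=False)
net = nn.Sequential(layer1, nn.Sigmoid(), layer2)

out = net(torch.randn(3, 8))

# Assumed: a known gradient of the loss w.r.t. the network output.
known_grad = torch.ones_like(out)

# Backpropagate the known gradient, accumulating only into layer1.weight.
out.backward(gradient=known_grad, inputs=[layer1.weight])

print(layer1.weight.grad.shape)
print(layer2.weight.grad)
```

With this call only `layer1.weight.grad` is populated. `layer2.weight.grad` is left untouched, even though the gradient flows through `layer2` on its way back.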