# Autograd for different loss functions in different layers

Hey all,

I am having a problem with autograd when using different losses in different layers. I want to implement a new algorithm, but first I ran a sanity check to see whether the results match ordinary backpropagation.

```python
import torch
import torch.nn as nn

device = 'cpu'

# define the neural network: two 1x1 linear layers without bias
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(1, 1, bias=False))
        self.layer2 = nn.Sequential(
            nn.Linear(1, 1, bias=False))

    def forward(self, x):
        out1 = self.layer1(x)
        out2 = self.layer2(out1)
        return out1, out2

model = Net().to(device)

# desired output
t = torch.randn(1, 1).to(device)

# input
inp = torch.randn(1, 1).to(device)

# run network
outputs = model(inp)

# loss from the output layer
loss = 0.5 * torch.sum((t - outputs[1]) ** 2.0)

# calculate the gradients just to check
loss.backward(retain_graph=True)

# part of the chain rule to calculate the gradient of the first weight
# NOTE: the line defining t1 is missing from the post. The line below is an
# assumption reconstructed from the replies, which mention `-loss` and
# `create_graph=True`.
t1 = torch.autograd.grad(-loss, outputs[0], create_graph=True)[0]

# so it will be same as backpropagation
t1 += outputs[0]

loss2 = 0.5 * torch.sum((t1 - outputs[0]) ** 2.0)
```

`loss2` is the loss function for the hidden layer. Mathematically this should be equivalent to the usual backpropagation, so the results should be equal, but they aren't. I checked `t1` and it is correct, but something is wrong with `grad2`. Can someone help me with this?

Hi,

Could you explain why `grad2` should be the same as the plain gradient? It uses a gradient penalty computed in a differentiable manner, meaning that you introduce second-order gradients here. Maybe you did not want `create_graph=True`?
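To illustrate what `create_graph=True` changes, here is a minimal sketch on a toy scalar function (the function `x ** 3` and the variable names are illustrative assumptions, not taken from the code above). With the flag set, the returned gradient is itself part of a graph and can be differentiated again. Without it, the gradient is a plain tensor.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# With create_graph=True, the gradient g = 3 * x**2 keeps its own graph,
# so it can be differentiated a second time.
y = x ** 3
(g,) = torch.autograd.grad(y, x, create_graph=True)
(gg,) = torch.autograd.grad(g, x)  # second derivative, 6 * x

# Without create_graph, the gradient is a plain tensor with no graph.
y_plain = x ** 3
(g_plain,) = torch.autograd.grad(y_plain, x)

print(g.requires_grad, g_plain.requires_grad)
```

Any loss built from `g` will therefore backpropagate through the gradient computation itself, which is where second-order terms come from.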

Sure,

With ordinary backpropagation, the gradients would be:
`grad1 = -(t - outputs[1]) * outputs[1]`
`grad2 = -(t-outputs[1]) * model.layer2[0].weight * outputs[0]`

For the new loss function in the hidden layer, I have:
`L = 0.5*torch.sum( (t1 - outputs[0])**2.0)`
and `t1` is:
`t1 = outputs[0] + (t-outputs[1]) * model.layer2[0].weight`
which is correct according to the algorithm.

Now, the gradient for the first weight should be the derivative of `L` with respect to that weight, which is:
`grad = -(t1 - outputs[0]) * outputs[0]`
`grad = -(t-outputs[1]) * model.layer2[0].weight * outputs[0]`

This is the same as the usual way. If I disable `create_graph=True`, I get a zero gradient. Maybe I am misunderstanding what `create_graph` does?
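A minimal sketch of one plausible reason for the zero gradient, assuming `t1` is built as the gradient of `-loss` w.r.t. the hidden output plus the hidden output itself. Plain tensors `w1` and `w2` stand in for the two `nn.Linear` layers here, which is a simplification. Without `create_graph`, the gradient term carries no graph, so `t1` depends on the hidden output with derivative 1 and the difference `t1 - out1` no longer depends on it at all.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 1)
t = torch.randn(1, 1)
w1 = torch.randn(1, 1, requires_grad=True)  # stand-in for layer1 weight
w2 = torch.randn(1, 1, requires_grad=True)  # stand-in for layer2 weight

out1 = x @ w1
out2 = out1 @ w2
loss = 0.5 * torch.sum((t - out2) ** 2.0)

# No create_graph: g is a plain tensor equal to (t - out2) * w2.
(g,) = torch.autograd.grad(-loss, out1, retain_graph=True)

# t1 depends on out1 only through the "+ out1" term.
t1 = g + out1

# In (t1 - out1) the two out1 contributions cancel during backward.
loss2 = 0.5 * torch.sum((t1 - out1) ** 2.0)
(grad_w1,) = torch.autograd.grad(loss2, w1)

print(grad_w1)
```

The backward pass adds the contribution through `t1` and the direct contribution through `- out1`, and they are equal and opposite.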

By `grad1`, do you mean `model.layer2[0].weight.grad` w.r.t. the first loss you computed?
If so, the loss is actually `loss = 0.5*torch.sum( (t - outputs[0] * model.layer2[0].weight)**2.0)`
So `grad1` would be `-(t - outputs[1]) * outputs[0]` and not `-(t - outputs[1]) * outputs[1]`.

And if `grad2` is `model.layer1[0].weight.grad` w.r.t. the first loss computed, this loss is `loss = 0.5*torch.sum( (t - x * model.layer1[0].weight * model.layer2[0].weight)**2.0)`, and so you get `grad2 = -(t-outputs[1]) * model.layer2[0].weight * x`, right?
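These two closed-form expressions can be checked numerically against autograd. The sketch below uses two standalone `nn.Linear(1, 1, bias=False)` layers rather than the `Net` class from the question, and the names `layer1`, `layer2`, and `x` are chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(1, 1, bias=False)
layer2 = nn.Linear(1, 1, bias=False)

x = torch.randn(1, 1)
t = torch.randn(1, 1)

out1 = layer1(x)
out2 = layer2(out1)
loss = 0.5 * torch.sum((t - out2) ** 2.0)
loss.backward()

with torch.no_grad():
    # d loss / d layer2.weight: input of layer2 is out1
    grad1 = -(t - out2) * out1
    # d loss / d layer1.weight: input of layer1 is x
    grad2 = -(t - out2) * layer2.weight * x

print(torch.allclose(grad1, layer2.weight.grad))
print(torch.allclose(grad2, layer1.weight.grad))
```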

For `grad2`, I indeed mean `model.layer1[0].weight.grad`. I thought I could write the output values directly in the loss function, without the weights, since I would like to use `t1` instead of `t`. But this is not possible, is it?

I don’t think it is.
Also, why do you compute `t1` with `-loss`?

Ah, too bad. I will have to find a workaround then, because I really need the derivative of the loss with `t1` present in it.

The minus sign is because I wrote the loss as (t - output) instead of (output - t), so the weight update will be `w += lr * grad` instead of `w += -lr * grad` later on.

But thank you anyway for the help.
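One possible workaround, not confirmed anywhere in this thread, is to treat `t1` as a fixed target by calling `.detach()` on it, so that the hidden-layer loss only backpropagates through `outputs[0]`. The sketch below assumes `t1 = out1 + (t - out2) * w2` as described above, uses standalone layers instead of the `Net` class, and compares the result with the ordinary backpropagation gradient for the first weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(1, 1, bias=False)
layer2 = nn.Linear(1, 1, bias=False)
x = torch.randn(1, 1)
t = torch.randn(1, 1)

out1 = layer1(x)
out2 = layer2(out1)
loss = 0.5 * torch.sum((t - out2) ** 2.0)

# Reference: ordinary backprop gradient for the first weight.
(ref,) = torch.autograd.grad(loss, layer1.weight, retain_graph=True)

# Hidden target: out1 + (t - out2) * w2, detached so it acts as a constant.
(g,) = torch.autograd.grad(-loss, out1, retain_graph=True)
t1 = (out1 + g).detach()

# Hidden-layer loss; only out1 carries a graph back to layer1.weight.
loss2 = 0.5 * torch.sum((t1 - out1) ** 2.0)
(grad_hidden,) = torch.autograd.grad(loss2, layer1.weight)

print(ref, grad_hidden)
```

With `t1` constant, the derivative of `loss2` w.r.t. the first weight is `-(t1 - out1) * x`, which should agree with the reference up to floating-point rounding.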

> The minus sign is because I wrote the loss as (t - output) instead of (output - t)

But this is squared, so both are exactly the same. You don't need a `-` sign here, right?

Sorry, I meant the derivative of the loss.
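A small check of both points, using made-up scalar values for `t` and `out`: the two ways of writing the squared loss give the same value and the same gradient, and the minus sign only appears in the derivative, which is `out - t = -(t - out)`.

```python
import torch

t = torch.tensor(2.0)
out = torch.tensor(0.5, requires_grad=True)

# The same loss written both ways.
loss_a = 0.5 * (t - out) ** 2
loss_b = 0.5 * (out - t) ** 2

# Autograd applies the chain rule, so both gradients equal out - t.
(grad_a,) = torch.autograd.grad(loss_a, out)
(grad_b,) = torch.autograd.grad(loss_b, out)

print(loss_a.item(), loss_b.item())
print(grad_a.item(), grad_b.item())
```

So a descent step `w -= lr * grad` with the autograd gradient matches `w += lr * (t - out) * ...` when the sign is folded into a hand-written expression.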