Consider a multi-layer network L1 -> L2 -> L3 -> … that is trained by calling “loss.backward()” and then updating the weights.

Is there a way to compute the loss between each pair of layers as it is propagated? L1 <-(loss here?) L2

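Usually you don’t use a loss between intermediate layers; there is a single loss at the output, and only its gradients are propagated backward through the layers.

Some model architectures, like Inception, do use an auxiliary loss computed from the output of an intermediate layer during training.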

Or would you rather see the gradients at a specific layer?

If so, you could just print them after the backward pass:

```
loss.backward()
print(model.some_layer.weight.grad)  # the parameter attribute is `weight`, not `weights`
```
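What actually travels backward from one layer to the previous one is the gradient of the loss with respect to the intermediate activation. Here is a minimal sketch of how you could look at it, assuming a hypothetical two-layer model (`layer1`, `layer2`, the tensor shapes and the random data are made up for illustration). `retain_grad()` is needed because intermediate tensors do not keep their `.grad` by default.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model, for illustration only.
layer1 = nn.Linear(4, 3)
layer2 = nn.Linear(3, 2)

x = torch.randn(5, 4)
labels = torch.randint(0, 2, (5,))

hidden = layer1(x)
hidden.retain_grad()  # keep the gradient of this non-leaf tensor after backward()
output = layer2(hidden)

loss = nn.CrossEntropyLoss()(output, labels)
loss.backward()

# Gradient of the loss w.r.t. a parameter of the first layer.
print(layer1.weight.grad.shape)
# Gradient of the loss w.r.t. the activation between layer1 and layer2,
# i.e. the quantity passed back from layer2 to layer1.
print(hidden.grad.shape)
```

Note that `hidden.grad` is a gradient, not a loss value; there is still only one scalar loss in this setup.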

Actually I was interested in the loss itself and not the gradient.

To be specific about my use case: I want to use two models in sequence and propagate the loss from the 2nd model back to the 1st.

Any suggestions on how this can be accomplished?

If you want to use two models sequentially, you can just pass the output of one model to the other:

```
output1 = model1(x)
output2 = model2(output1)
```

As long as you don’t detach the tensors, the gradients of the loss will be backpropagated through model2 into model1.
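To illustrate the point about detaching, here is a small sketch that compares a connected forward pass with a detached one. The two `nn.Linear` models, the shapes and the random data are placeholders, not the models from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder models and data, for illustration only.
model1 = nn.Linear(4, 3)
model2 = nn.Linear(3, 2)
x = torch.randn(5, 4)
labels = torch.randint(0, 2, (5,))
criterion = nn.CrossEntropyLoss()

# Connected graph: the backward pass reaches model1.
loss = criterion(model2(model1(x)), labels)
loss.backward()
connected_has_grad = model1.weight.grad is not None

# Reset all gradients before the second experiment.
for p in list(model1.parameters()) + list(model2.parameters()):
    p.grad = None

# Detached output: the graph is cut, so model1 receives no gradient.
loss = criterion(model2(model1(x).detach()), labels)
loss.backward()
detached_has_grad = model1.weight.grad is not None

print(connected_has_grad, detached_has_grad)
```

In the detached case model2 still gets gradients, since its own parameters are part of the graph.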


I did try that but the weights of the 2nd model are not updated, while the weights of the first model are updated.

```
criterion = nn.CrossEntropyLoss()
optimizer.zero_grad()
output1 = model1(x)
output2 = model2(output1)
loss = criterion(output2, labels)
a = copy.deepcopy(model2.linear1.weight.data)
loss.backward()
optimizer.step()
b = copy.deepcopy(model2.linear1.weight.data)
print(torch.equal(a, b))
```

The above code prints “True” when run.

However, if you check the weights of “model1” instead of “model2”, it prints “False”.

Can you please help me find out where I am going wrong? Thanks in advance.

So sorry, my bad. I had defined the optimizer only on the parameters of the first model, which is why only that model was being updated.
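For anyone hitting the same issue, one possible way to fix it is to give a single optimizer the parameters of both models. This is only a sketch: the two `nn.Linear` models, the SGD optimizer, the learning rate and the random data are assumptions for illustration, not the setup from my actual code.

```python
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder models and data, for illustration only.
model1 = nn.Linear(4, 3)
model2 = nn.Linear(3, 2)
x = torch.randn(5, 4)
labels = torch.randint(0, 2, (5,))
criterion = nn.CrossEntropyLoss()

# One optimizer over the parameters of both models.
optimizer = torch.optim.SGD(
    itertools.chain(model1.parameters(), model2.parameters()), lr=0.1
)

# Snapshots of the weights before the update.
before1 = model1.weight.detach().clone()
before2 = model2.weight.detach().clone()

optimizer.zero_grad()
loss = criterion(model2(model1(x)), labels)
loss.backward()
optimizer.step()

# Both comparisons should now report that the weights changed.
changed1 = not torch.equal(before1, model1.weight)
changed2 = not torch.equal(before2, model2.weight)
print(changed1, changed2)
```

Using two separate optimizers, one per model, and calling `step()` on both should work as well.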

Thank you for your insights.
