# How to calculate / get intermediate results of backpropagation

I’ve found this one example which illustrates my problem:

``````
import torch
import torch.nn as nn

# Create some dummy data.
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"

# We will use MSELoss as an example.
loss_fn = nn.MSELoss()

# Do some computations.
v = x + 2
y = v ** 2

# Compute loss.
loss = loss_fn(y, gt)

print(f'Loss: {loss}')

# Compute the gradient of the loss w.r.t. x.
d_loss_dx = torch.autograd.grad(loss, x)

print(f'dloss/dx:\n {d_loss_dx}')
``````

With this approach I can calculate dloss/dx using the `grad` function, but when `grad` is called a second time to calculate dloss/dv (which, let's assume, I also need) I get this RuntimeError:

``````
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
``````

Probably because backpropagation was already called once and the graph was freed, so it cannot be traversed again, I guess?

So my question is: is there a better way to achieve what I am trying above, i.e. calculating both dloss/dx and dloss/dv?

Just pass a list of tensors as the `inputs` argument to `torch.autograd.grad`.
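For example, a minimal sketch using the toy graph from the question above (variable names match the snippet; the key point is that `autograd.grad` returns one gradient per input in a single call):

``````
import torch

x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5

v = x + 2
y = v ** 2
loss = torch.nn.MSELoss()(y, gt)

# One call, several inputs: no second traversal of the graph is needed,
# so no retain_graph is required.
d_loss_dx, d_loss_dv = torch.autograd.grad(loss, [x, v])

print(d_loss_dx)  # all entries -19.5
print(d_loss_dv)  # all entries -19.5
``````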


@codeflux
Another alternative would be to use `retain_graph=True`:

``````
# Now compute gradients:
d_loss_dx = torch.autograd.grad(loss, x, retain_graph=True)
d_loss_dv = torch.autograd.grad(loss, v, retain_graph=True)
d_loss_dy = torch.autograd.grad(loss, y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')
``````

Ok thanks, that seems to work!
But I cannot calculate gradients between intermediate variables at the same time, say dy/dv in this example.

Ok, perfect, this basically does what I want! I wonder how many times backpropagation is used here, though?

@codeflux backward propagation (the gradient computation) is done every time you pass a new variable. `retain_graph` just ensures that the computation graph is not cleared and retains its saved values. This lets us perform backward propagation once more.
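A minimal sketch of that behaviour on the toy graph from above (a reconstruction for illustration, not the exact original code):

``````
import torch

x = torch.ones(2, 2, requires_grad=True)
v = x + 2
y = v ** 2
loss = ((y - 15.5) ** 2).mean()

# First call: keep the saved intermediate values alive for a second pass.
(d_loss_dx,) = torch.autograd.grad(loss, x, retain_graph=True)

# Second call: this would raise the RuntimeError from above
# without retain_graph=True in the first call.
(d_loss_dv,) = torch.autograd.grad(loss, v)
``````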

It does give me a grad, but with respect to what?

I don’t think it has to backpropagate again, though, since dy/dx for example is already calculated in the first backpropagation call (just not saved).

@codeflux Agreed. Unless you re-calculate the loss, the grads will always be the same.

Then I am a bit confused as to what grad actually does.
My problem is this: I want to update the weights of the network manually, using a part of the loss function differentiated with respect to the weights themselves.

Now I am confused about what it uses as a loss function for backpropagation when using grad. Probably just the output function?

@codeflux

• In your case there are three variables: x, y and v

• x is the foremost input and y is the final output; v is an intermediate calculation

• In an ideal scenario, if you have a single value for x (e.g. x=5), the values for v and y are fixed (immutable), and therefore the grads are fixed for a single backward propagation of the loss

• grad is designed to start from the calculated loss (which is a scalar value) and then propagate backwards for each variable, as you have already seen. If the loss is the same, the grads will also be the same

• If you want to alter the grads, you need to register hooks, so that instead of the normal grads propagating backwards, you have your own modified grads propagating

https://pytorch.org/docs/stable/generated/torch.Tensor.register_hook.html

WITHOUT HOOK

``````
import torch
import torch.nn as nn

x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"
# We will use MSELoss as an example.
loss_fn = nn.MSELoss()
# Do some computations.
v = x + 2
y = v ** 2

# Compute loss.
loss = loss_fn(y, gt)

# retain_graph=True keeps the graph alive for the later grad calls.
d_loss_dx = torch.autograd.grad(loss, x, retain_graph=True)
d_loss_dv = torch.autograd.grad(loss, v, retain_graph=True)
d_loss_dy = torch.autograd.grad(loss, y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')
``````
``````dloss/dx:
(tensor([[-19.5000, -19.5000],
[-19.5000, -19.5000]]),)
dloss/dv:
(tensor([[-19.5000, -19.5000],
[-19.5000, -19.5000]]),)
dloss/dy:
(tensor([[-3.2500, -3.2500],
[-3.2500, -3.2500]]),)
``````

WITH HOOK

``````
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"
# We will use MSELoss as an example.
loss_fn = nn.MSELoss()
# Do some computations.
v = x + 2
y = v ** 2

# Register a hook on x that doubles the gradient flowing into it
# (consistent with the doubled dloss/dx in the output below).
h = x.register_hook(lambda grad: grad * 2)

# Compute loss.
loss = loss_fn(y, gt)

d_loss_dx = torch.autograd.grad(loss, x, retain_graph=True)
d_loss_dv = torch.autograd.grad(loss, v, retain_graph=True)
d_loss_dy = torch.autograd.grad(loss, y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')

h.remove()
``````
``````dloss/dx:
(tensor([[-39., -39.],
[-39., -39.]]),)
dloss/dv:
(tensor([[-19.5000, -19.5000],
[-19.5000, -19.5000]]),)
dloss/dy:
(tensor([[-3.2500, -3.2500],
[-3.2500, -3.2500]]),)
``````

There are different variations of applying hooks; you can choose whichever suits you.


The end node (dloss/dv). In other words, it is a vector-Jacobian product: 1 @ J_loss_y @ J_y_v.

These Jacobians may be diagonal, but they’re not materialized / exported anyway. So dy/dv = dPow(v,2)/dv = 2v is not normally accessible (outside of PowBackward).

The latest version has “experimental forward-mode AD” that may facilitate this, but I haven’t looked into it yet.
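That said, for this particular graph the local derivative dy/dv can still be materialized with a plain vector-Jacobian product against y, because the Jacobian of an elementwise op is diagonal. A sketch (not the forward-mode approach, just `grad_outputs` with a vector of ones):

``````
import torch

x = torch.ones(2, 2, requires_grad=True)
v = x + 2
y = v ** 2

# For an elementwise op, ones @ J_y_v recovers the elementwise
# derivative dy/dv = 2v directly.
(dy_dv,) = torch.autograd.grad(y, v, grad_outputs=torch.ones_like(y))
print(dy_dv)  # 2 * v, i.e. all entries 6.0 here
``````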