How to calculate / get intermediate results of backpropagation

I’ve found this one example which illustrates my problem:

import torch
from torch.autograd import grad
import torch.nn as nn


# Create some dummy data.
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"

# We will use MSELoss as an example.
loss_fn = nn.MSELoss()

# Do some computations.
v = x + 2
y = v ** 2

# Compute loss.
loss = loss_fn(y, gt)

print(f'Loss: {loss}')

# Now compute gradients:
d_loss_dx = grad(outputs=loss, inputs=x)

d_loss_dv = grad(outputs=loss, inputs=v)

print(f'dloss/dx:\n {d_loss_dx}')

With this approach, I can calculate dloss/dx using the grad function, but when I call it a second time to calculate dloss/dv (let's just assume I need that result), I get this RuntimeError:


RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Probably due to the fact that backpropagation was already called once, and we would not want to do that again, I guess?

So my question is: is there a better way to achieve what I am trying to do above, i.e. calculating both d_loss_dx and d_loss_dv?

Thanks in advance!

Just pass a list of tensors as “inputs”.
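A minimal sketch of that suggestion, reusing the setup from the question: a single call to grad with inputs=[x, v] returns one gradient per requested tensor, so the graph is only traversed once and retain_graph is not needed.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

# Same toy setup as in the question.
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5
v = x + 2
y = v ** 2
loss = nn.MSELoss()(y, gt)

# One backward pass, one returned gradient per entry in `inputs`.
d_loss_dx, d_loss_dv = grad(outputs=loss, inputs=[x, v])

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
```

The result is a tuple in the same order as the inputs list, so it can be unpacked directly.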

@codeflux
Another alternative would be to use retain_graph=True:

# Now compute gradients:
d_loss_dx = grad(outputs=loss, inputs=x, retain_graph=True)
d_loss_dv = grad(outputs=loss, inputs=v, retain_graph=True)
d_loss_dy = grad(outputs=loss, inputs=y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')

Ok thanks, that seems to work!
But I cannot calculate intermediate variables at the same time.
Let’s say dy/dv in this example.

Ok, perfect, this basically does what I want! I wonder how many times backpropagation is used here, though?

@codeflux A backward pass is run every time you call grad with a new input. retain_graph=True just ensures that the computation graph and its saved intermediate values are not freed after the call, which is what lets us backpropagate through it again.

Try v.retain_grad() (this should populate v.grad)
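A minimal sketch of the retain_grad() route, again on the toy setup from the question. By default only leaf tensors such as x get a .grad after backward(); calling retain_grad() on the intermediate v before the backward pass makes autograd keep its gradient as well.

```python
import torch
import torch.nn as nn

x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5

v = x + 2
v.retain_grad()  # must be called before the backward pass
y = v ** 2
loss = nn.MSELoss()(y, gt)

# A single backward pass populates .grad on x (leaf) and on v (retained).
loss.backward()

print(f'x.grad (dloss/dx):\n {x.grad}')
print(f'v.grad (dloss/dv):\n {v.grad}')
```

Both gradients are taken with respect to the tensor that backward() was called on, here the loss.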

It does give me a grad, but with respect to what?

I don’t think it has to backpropagate again though, since the dy / dx for example is already calculated in the first backpropagation call (but not saved).

@codeflux Agreed. Unless you re-calculate the loss, the grads will always be the same.

Then I am a bit confused as to what grad actually does.
My problem is this: I want to update the weights of the network manually, using the derivative of one part of the loss function with respect to the weights themselves.

Now I am confused about what grad uses as the loss function for backpropagation. Probably just whatever is passed as outputs?
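For illustration, here is a rough sketch of such a manual update. grad has no notion of "the loss": it differentiates whatever scalar is passed as outputs with respect to inputs. The weight w, the data, the two loss terms and the learning rate lr below are all hypothetical names made up for this example, not something from the thread.

```python
import torch
from torch.autograd import grad

torch.manual_seed(0)

# Hypothetical toy "network": a single weight vector.
w = torch.randn(3, requires_grad=True)
inp = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor(2.0)

pred = (w * inp).sum()
data_term = (pred - target) ** 2   # the part we want to differentiate
reg_term = 0.1 * (w ** 2).sum()    # another part, deliberately left out below

# Differentiate only data_term with respect to w.
(g,) = grad(outputs=data_term, inputs=w)

# Manual gradient-descent step; no_grad keeps the update out of the graph.
lr = 0.01  # assumed learning rate
w_before = w.detach().clone()
with torch.no_grad():
    w -= lr * g

print(f'gradient of data_term w.r.t. w: {g}')
print(f'updated w: {w}')
```

Passing data_term + reg_term as outputs instead would give the gradient of the full loss.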

@codeflux

  • In your case there are three variables: x, v and y.

  • x is the leaf input and y is the final output; v is an intermediate result.

  • For a fixed value of x (e.g. x=5), the values of v and y are fixed as well, so the grads obtained by backpropagating a given loss are fixed too.

  • grad starts from the computed loss (which is a scalar value) and propagates backwards to each requested variable, as you have already seen. If the loss is the same, the grads will also be the same.

  • If you want to alter the grads, you need to register hooks so that your own modified grads propagate backwards instead of the normal ones.

https://pytorch.org/docs/stable/generated/torch.Tensor.register_hook.html

WITHOUT HOOK

x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"
# We will use MSELoss as an example.
loss_fn = nn.MSELoss()
# Do some computations.
v = x + 2
y = v ** 2

# Compute loss.
loss = loss_fn(y, gt)
d_loss_dx = grad(outputs=loss, inputs=x, retain_graph=True)
d_loss_dv = grad(outputs=loss, inputs=v, retain_graph=True)
d_loss_dy = grad(outputs=loss, inputs=y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')

Output:

dloss/dx:
 (tensor([[-19.5000, -19.5000],
        [-19.5000, -19.5000]]),)
dloss/dv:
 (tensor([[-19.5000, -19.5000],
        [-19.5000, -19.5000]]),)
dloss/dy:
 (tensor([[-3.2500, -3.2500],
        [-3.2500, -3.2500]]),)

WITH HOOK

x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"
# We will use MSELoss as an example.
loss_fn = nn.MSELoss()
# Do some computations.
v = x + 2
y = v ** 2

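# Double the gradient flowing back through v.
# The parameter is named g so it does not shadow the imported grad function.
h = v.register_hook(lambda g: g * 2)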

# Compute loss.
loss = loss_fn(y, gt)
d_loss_dx = grad(outputs=loss, inputs=x, retain_graph=True)
d_loss_dv = grad(outputs=loss, inputs=v, retain_graph=True)
d_loss_dy = grad(outputs=loss, inputs=y)

print(f'dloss/dx:\n {d_loss_dx}')
print(f'dloss/dv:\n {d_loss_dv}')
print(f'dloss/dy:\n {d_loss_dy}')

h.remove()

Output:

dloss/dx:
 (tensor([[-39., -39.],
        [-39., -39.]]),)
dloss/dv:
 (tensor([[-19.5000, -19.5000],
        [-19.5000, -19.5000]]),)
dloss/dy:
 (tensor([[-3.2500, -3.2500],
        [-3.2500, -3.2500]]),)

There are several ways to apply hooks, and you can choose whichever suits your use case.

With respect to the end node, i.e. the loss, so v.grad is dloss/dv. In other words, it is a vector-Jacobian product 1 @ J_loss_y @ J_y_v.

These Jacobians may be diagonal, but they are never materialized or exposed anyway. So dy/dv = dPow(v, 2)/dv = 2v is not normally accessible (outside of PowBackward).
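That said, for an elementwise operation like this one, the diagonal can be recovered from outside by asking autograd for a vector-Jacobian product with an all-ones vector. The sketch below shows this, plus torch.autograd.functional.jacobian as a way to build the full Jacobian explicitly; using a lambda to re-express y = v ** 2 as a function is my own choice for the example.

```python
import torch
from torch.autograd import grad
from torch.autograd.functional import jacobian

x = torch.ones(2, 2, requires_grad=True)
v = x + 2
y = v ** 2

# y is not a scalar, so grad needs the "vector" of the vector-Jacobian product.
# With all ones, and because ** is elementwise, the result is the diagonal 2v.
(dy_dv,) = grad(outputs=y, inputs=v, grad_outputs=torch.ones_like(y))
print(f'dy/dv (diagonal):\n {dy_dv}')

# Full Jacobian of the same function, materialized explicitly.
# Its shape is output shape + input shape, here (2, 2, 2, 2).
J = jacobian(lambda t: t ** 2, v.detach())
print(f'full Jacobian shape: {tuple(J.shape)}')
```

The all-ones trick only yields dy/dv entry by entry when each output element depends on a single input element; otherwise it returns column sums of the Jacobian.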

The latest version has “experimental forward mode AD” that may facilitate this, but I haven’t looked into it yet.
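As a rough sketch of what that could look like, assuming a PyTorch version that ships the torch.autograd.forward_ad module: forward mode pushes a tangent through the computation and returns a Jacobian-vector product alongside the result. With an all-ones tangent and an elementwise op, that product is again 2v. The starting value v = 3 mirrors the example above.

```python
import torch
import torch.autograd.forward_ad as fwAD

# Same intermediate value as in the thread: v = x + 2 with x = 1.
v = torch.full((2, 2), 3.0)
tangent = torch.ones_like(v)  # direction in which to differentiate

with fwAD.dual_level():
    # Attach the tangent to v, then run the forward computation as usual.
    dual_v = fwAD.make_dual(v, tangent)
    dual_y = dual_v ** 2
    # Split the result into its value and its Jacobian-vector product.
    y, jvp = fwAD.unpack_dual(dual_y)

print(f'y:\n {y}')
print(f'J_y_v @ tangent:\n {jvp}')
```

No backward pass is involved here, so no graph needs to be retained.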