# How to print gradient graph

```python
import torch

def maml_simulation() -> None:
    # hypothetical initial values; the original post does not show them
    x_inner = torch.tensor(3.0)
    x_outer = torch.tensor(4.0)
    theta_outer = torch.tensor(2.0, requires_grad=True)
    print(f"before training: x inner: {x_inner} theta outer: {theta_outer}")

    loss_func = lambda x: (x - 10) ** 2

    for i in range(5):
        theta_inner = theta_outer.clone()  # assumed: inner params start from outer
        print(f"outer loop, theta before: {theta_outer}")
        for j in range(5):
            prediction = theta_inner * x_inner
            inner_loss = loss_func(prediction)
            # assumed inner-gradient step; create_graph=True is needed
            # for second derivatives
            grad = torch.autograd.grad(inner_loss, theta_inner, create_graph=True)[0]
            theta_inner = theta_inner - 0.01 * grad

        prediction = theta_outer * x_outer
        loss = loss_func(prediction)
        # assumed outer-gradient step; note the loss above uses theta_outer,
        # not theta_inner, so this gradient never sees the inner loop
        grad = torch.autograd.grad(loss, theta_outer)[0]
        theta_outer = theta_outer - 0.01 * grad
        print(f"theta after: {theta_outer}\n\n")
```

I am going through some meta-learning material and I want to follow the second derivatives of this loop, to see what they look like and whether it is doing what I think it is doing.

1. Does this code calculate a second derivative, as described in this paper? https://arxiv.org/abs/1703.03400

2. How can I print out the graph or verify somehow that it is doing what I think it is doing?

Thanks

I simplified the above code into something more concise that shows what I am trying to do, and also shows that it is not happening in PyTorch.

By my hand calculation, the second derivative of this at the bottom print statement should be `-12.xx`, but I am getting the first-order derivative instead of the second, even though I have set `create_graph=True`. Am I doing something wrong here?

```python
import torch

def maml_simulation() -> None:
    # hypothetical values, chosen to match the hand calculation in the
    # replies below (loss = 1.0816, grad = -6.8224)
    theta = torch.tensor(2.0, requires_grad=True)
    x_i = torch.tensor(3.0)
    x_j = torch.tensor(4.0)

    theta_two = theta.clone()
    loss_func = lambda x: (x - 10) ** 2
    print(f"before training: theta: {theta}")

    prediction = theta_two * x_i
    loss_one = loss_func(prediction)
    grad = torch.autograd.grad(loss_one, theta_two, create_graph=True)[0]
    theta_two -= 0.01 * grad  # in-place update
    print(f"inner grad: {grad}")  # assumed print contents

    prediction = theta_two * x_j
    loss = loss_func(prediction)
    grad = torch.autograd.grad(loss, theta_two)[0]
    print(f"final grad: {grad}")  # assumed print contents
```

Hi,

You can use the torchviz package to print the graph corresponding to gradient computations.
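As a sketch of how that works: assuming `torchviz` is installed, its `make_dot` function renders the graph via Graphviz; and even without it, you can walk the graph by hand through `grad_fn.next_functions` (the tensor values below are just illustrative):

```python
import torch

# a tiny double-backward example to visualize
theta = torch.tensor(2.0, requires_grad=True)
loss_one = (theta * 3.0 - 10) ** 2
grad = torch.autograd.grad(loss_one, theta, create_graph=True)[0]
theta_two = theta - 0.01 * grad
loss = (theta_two * 4.0 - 10) ** 2

# Option 1: torchviz (pip install torchviz), renders a Graphviz diagram
# from torchviz import make_dot
# make_dot(loss, params={"theta": theta}).render("maml_graph", format="png")

# Option 2: walk the autograd graph by hand
def print_graph(fn, depth=0):
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_graph(next_fn, depth + 1)

print_graph(loss.grad_fn)
```

With `create_graph=True`, the printout includes backward nodes for the inner gradient computation itself, which is what double backward differentiates through.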

Thanks, I’ll look into that package to see if it helps. Do you see any problem with my second derivative above? Flipping the boolean `create_graph` doesn’t change the second gradient at all, which is not what I would expect.

I’m not sure what the purpose of `theta_two` is in your code above. Why not use `theta` directly? (It turns out, after more investigation below, that this was the root of the problem; see the rest of the answer.)

Also if I read correctly, `loss_one = (theta * xi - 10)**2`.
So `grad = 2 * xi * (theta * xi - 10)`.
So the new `theta_two = theta - 0.01 * (2 * xi * (theta * xi - 10)) = theta - 0.02 * theta * xi**2 + 0.2 * xi = theta * (1 - 0.02 * xi**2) + 0.2 * xi`.
And `loss = ((theta * (1 - 0.02 * xi**2) + 0.2 * xi) * xj - 10)**2 = 1.0816`
And its derivative `grad = 2 * xj * (1 - 0.02 * xi**2) * ((theta * (1 - 0.02 * xi**2) + 0.2 * xi) * xj - 10)`
So the final grad should be `-6.822399999999995`.
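To double-check that arithmetic numerically, here is a sketch with concrete values plugged in (assuming, for illustration, `theta = 2`, `xi = 3`, `xj = 4`, which reproduce the `1.0816` and `-6.8224` figures above):

```python
import torch

theta = torch.tensor(2.0, requires_grad=True)  # illustrative values
x_i, x_j = 3.0, 4.0

loss_one = (theta * x_i - 10) ** 2
g = torch.autograd.grad(loss_one, theta, create_graph=True)[0]
theta_two = theta - 0.01 * g  # out-of-place update, keeps theta intact

loss = (theta_two * x_j - 10) ** 2
g2 = torch.autograd.grad(loss, theta)[0]

print(loss.item())  # ~ 1.0816
print(g2.item())    # ~ -6.8224
```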

Why do you only see the part that corresponds to the last loss computation, and why does your code behave the same for `create_graph=True` and `create_graph=False`?
Because here: `grad = torch.autograd.grad(loss, theta_two)[0]` you ask for gradients wrt `theta_two`. But `theta_two` is the result of `theta_two -= 0.01 * grad`, so you get gradients wrt the result of this operation.
If you want gradients wrt `theta`, you should use `grad = torch.autograd.grad(loss, theta)[0]`. Then you will see that the original value of `theta_two` is needed for the double backward, and you will need to change to `theta_two = theta_two - 0.01 * grad`.
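Concretely, with the same illustrative numbers as above, the two `autograd.grad` calls differ like this:

```python
import torch

theta = torch.tensor(2.0, requires_grad=True)  # illustrative values
theta_two = theta.clone()

loss_one = (theta_two * 3.0 - 10) ** 2
g = torch.autograd.grad(loss_one, theta_two, create_graph=True)[0]
theta_two = theta_two - 0.01 * g  # out-of-place, old value stays alive

loss = (theta_two * 4.0 - 10) ** 2

# gradient wrt the *updated* theta_two: only the last mul/pow, first order
g_wrt_theta_two = torch.autograd.grad(loss, theta_two, retain_graph=True)[0]

# gradient wrt the original theta: differentiates through the inner update
g_wrt_theta = torch.autograd.grad(loss, theta)[0]

print(g_wrt_theta_two.item())  # ~ -8.32
print(g_wrt_theta.item())      # ~ -6.8224
```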

Hope this helps.

I see now. I wasn’t able to get the graph working, but I got the code snippet working. Below is the final version with the right derivatives, for anyone who finds this later. One thing I don’t get, though, is why
`theta -= 0.01 * grad` behaves differently from `theta = theta - 0.01 * grad`. I thought the first one was just shorthand for exactly the same thing. Why did that need to change?

```python
import torch

def maml_simulation() -> None:
    # hypothetical values, consistent with the hand calculation above
    theta = torch.tensor(2.0, requires_grad=True)
    x_i = torch.tensor(3.0)
    x_j = torch.tensor(4.0)

    loss_func = lambda x: (x - 10) ** 2
    print(f"before training: theta: {theta}")

    prediction = theta * x_i
    loss_one = loss_func(prediction)
    grad = torch.autograd.grad(loss_one, theta, create_graph=True)[0]

    theta_two = theta - 0.01 * grad  # out-of-place: theta itself is untouched
    print(f"inner grad: {grad}")  # assumed print contents

    prediction = theta_two * x_j
    loss = loss_func(prediction)
    grad = torch.autograd.grad(loss, theta)[0]  # second-order grad wrt theta
    print(f"final grad: {grad}")  # assumed print contents
```
• The first one modifies the Tensor pointed to by `theta` in place, so that Tensor now holds the new value.
• `theta = theta - 0.01 * grad` creates a new Tensor and associates it with the name `theta`. The Tensor originally pointed to by `theta` is unchanged.
• Your final code works because you do `theta_two = theta - 0.01 * grad`, so you keep a reference to the old `theta` and can pass it as input to the `autograd.grad` call.
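The distinction is easy to see with plain tensors: `-=` mutates the existing storage (any alias sees the change), while `=` rebinds the name to a brand-new Tensor. A minimal sketch:

```python
import torch

a = torch.zeros(1)
alias = a          # second name for the same Tensor
a -= 1             # in-place: mutates the shared storage
print(alias)       # alias sees the change: tensor([-1.])

b = torch.zeros(1)
alias_b = b
b = b - 1          # out-of-place: b now names a brand-new Tensor
print(alias_b)     # original unchanged: tensor([0.])
```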