Unexpected (wrong?) result when calculating derivatives using autograd

Hello everyone,

I’ve encountered an interesting issue while working with the autograd system in PyTorch, which could potentially lead to unexpected or even unnoticed problems. Here’s a minimal example to illustrate the issue:

Let’s say we have z = x + y and f = z + x + y.

Then df/dz should be 2, as f = z + z = 2z.

However, when I calculate this in PyTorch using the following code:

import torch

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

z = x + y
f = z + x + y

df_dz = torch.autograd.grad(f, z, retain_graph=True)

print(df_dz)

The output I get is 1, not 2. Can anyone explain why this is the case?

If substitutions are allowed, the derivative with respect to an intermediate variable like z (e.g. df/dz) becomes arbitrary.

Suppose instead I have z := 2x + y; then there are two ways of writing the same f:

  1. 2z - x
  2. z + x + y

Would df/dz be 1 or 2 here? It isn't well-defined.
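To see what autograd actually reports in this situation, here is a minimal sketch along the same lines (the concrete values are just for illustration). The graph only records the expression as it was written, z + x + y, so the single direct edge from z into f contributes 1 regardless of how z itself was defined:

import torch

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

z = 2 * x + y      # z := 2x + y
f = z + x + y      # one way of writing f = 3x + 2y

# autograd differentiates the graph as built, so the edge from z into f is 1
df_dz = torch.autograd.grad(f, z, retain_graph=True)
print(df_dz)       # (tensor([1.]),)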

A different, perhaps less confusing way to think about this:

We have two functions, z, f defined as follows:
z(a, b) := a + b
f(w, x, y) := w + x + y

In each of the expressions above, the variables w, x, y, a, b are only meaningful within the scope of the function definition.

In the original example, I can compose the two functions and define g(a, b, x, y) := f(z(a, b), x, y)
and then h(x, y) := g(x, y, x, y).
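In code, this corresponds to asking autograd two different questions about the original example (a sketch, assuming the original setup): the partial derivative of f with respect to its first argument w (whose role z plays), and the total derivative of h with respect to x, which follows both paths through the graph:

import torch

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

z = x + y          # z(a, b) evaluated at a = x, b = y
f = z + x + y      # f(w, x, y) evaluated at w = z

# Partial derivative of f with respect to its first argument (w = z): 1
df_dz = torch.autograd.grad(f, z, retain_graph=True)

# Total derivative of h(x, y) = f(z(x, y), x, y) with respect to x:
# df/dw * dz/dx + df/dx = 1 * 1 + 1 = 2
dh_dx = torch.autograd.grad(f, x)

print(df_dz, dh_dx)   # (tensor([1.]),) (tensor([2.]),)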

From the perspective of f, it does not know that its arguments w, x, and y could be related, and hence the answer is always 1 without any ambiguity.
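Put differently, the value you get back depends entirely on how f is built in the graph. As a small sketch: if f is constructed literally as 2 * z, then the only path from z into f carries a factor of 2, and autograd reports 2:

import torch

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

z = x + y
f2 = 2 * z         # f written explicitly in terms of z

df2_dz = torch.autograd.grad(f2, z)
print(df2_dz)      # (tensor([2.]),)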
