Hello everyone,

I’ve encountered an interesting issue while working with the autograd system in PyTorch, which could potentially lead to unexpected or even unnoticed problems. Here’s a minimal example to illustrate the issue:

Let’s say we have `z = x + y` and `f = z + x + y`. Then `df/dz` should be 2, as `f = z + z = 2z`.

However, when I calculate this in PyTorch using the following code:

```
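import torch

# Leaf tensors that autograd tracks
x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

z = x + y      # intermediate (non-leaf) tensor
f = z + x + y  # uses z once, plus x and y directly

# Gradient of f with respect to the intermediate z; grad() returns a tuple
df_dz = torch.autograd.grad(f, z, retain_graph=True)
print(df_dz)   # (tensor([1.]),)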
```

The output I get is 1, not 2. Can anyone explain why this is the case?

If I allow substitutions, the derivative with respect to an intermediate like z, e.g. df_dz, becomes arbitrary.

Suppose instead I have z := 2x + y. Then there are two ways of writing f:

- 2z - x
- z + x + y

Would df_dz be 1 or 2 here? It isn’t well-defined.
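To make the ambiguity concrete, here is a small sketch that builds both forms in PyTorch. The names `f_a`, `f_b`, `grad_a` and `grad_b` are mine, chosen for illustration. Both expressions evaluate to the same number, but autograd reports a different gradient with respect to `z` for each, because it only sees how `z` is used in the graph that was actually built.

```python
import torch

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)
z = 2 * x + y

# Two algebraically equal ways of writing f (both equal 3x + 2y)
f_a = 2 * z - x
f_b = z + x + y

# retain_graph=True because both graphs share the nodes that produce z
(grad_a,) = torch.autograd.grad(f_a, z, retain_graph=True)
(grad_b,) = torch.autograd.grad(f_b, z, retain_graph=True)

print(f_a.item(), f_b.item())  # 7.0 7.0
print(grad_a, grad_b)          # tensor([2.]) tensor([1.])
```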

A different way to think about this that is less confusing might be:

We have two functions, z and f, defined as follows:

z(a, b) := a + b

f(w, x, y) := w + x + y

In each of the expressions above, the variables w, x, y, a, b are only meaningful within the scope of the function definition.

In the original example, I can compose the two functions and define g(a, b, x, y) := f(z(a, b), x, y)

and then h(x, y) := g(x, y, x, y).

From the perspective of f, its arguments w, x, y are independent: it does not know that they could be related. Hence the derivative of f with respect to its first argument is always 1, without any ambiguity.
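The same picture can be sketched in code. The Python names `z_fn` and `f_fn` are just my stand-ins for z and f above; `g` and `h` follow the definitions in the text. Differentiating f with respect to its first argument gives 1, while differentiating the fully composed h with respect to x gives 2, since there x reaches the output along two paths.

```python
import torch

def z_fn(a, b):
    return a + b

def f_fn(w, x, y):
    return w + x + y

def g(a, b, x, y):
    return f_fn(z_fn(a, b), x, y)

def h(x, y):
    return g(x, y, x, y)

x = torch.tensor([1.], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)

# Derivative of f with respect to its first argument w, treated as independent
w = z_fn(x, y)
(df_dw,) = torch.autograd.grad(f_fn(w, x, y), w)

# Total derivatives of the composed function h with respect to x and y
dh_dx, dh_dy = torch.autograd.grad(h(x, y), (x, y))

print(df_dw)         # tensor([1.])
print(dh_dx, dh_dy)  # tensor([2.]) tensor([2.])
```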

