Hi all

I was trying to understand how the `autograd` module and the `backward` function work. In some projects, I’ve seen non-linear and non-convex operations like the following:

```
import torch as th

z1, z2 = model(x1, x2)                # model, x1, x2 defined elsewhere
z1 = (z1 - z1.mean(0)) / z1.std(0)    # standardize each feature
z2 = (z2 - z2.mean(0)) / z2.std(0)
N = z1.shape[0]
c = (z1.T @ z2) / N                   # cross-correlation matrix
loss = -th.diagonal(c).sum()
loss.backward()
```

Code adapted from here.

These operations, although not complex, are highly non-linear (`std` involves squaring the terms, there is a division whose derivative is messy, etc.), and deriving the gradient by hand would take a while. However, PyTorch does it automatically, and I’m trying to understand how this is possible.
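As a sanity check, I tried a much smaller example of my own (not from the project above), where the hand-derived gradient is easy, and autograd reproduces it exactly:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).mean()   # a simple non-linear chain: square, then mean
y.backward()

# By hand: dy/dx_i = 2 * x_i / N
manual = 2 * x.detach() / x.numel()
print(torch.allclose(x.grad, manual))  # True
```

So for small cases it clearly matches the analytic derivative; my question is about how this scales to the chain of operations above.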

I’ve tried to check the `grad_fn` of the `loss`, `c`, and `z1` variables, and it involves some `MmBackward` and `DivBackward` nodes, but I don’t really see how these can be combined using Jacobian–gradient products. I tried to find the code for these backward functions in PyTorch’s GitHub repository to see if it would help me understand, but I’m not able to find it.

I would be really grateful if someone could explain, at a high level, how this works.

Thanks in advance