Hi all
I was trying to understand how the autograd module and the backward function work. In some projects, I've seen non-linear and non-convex operations like the following:
import torch as th

z1, z2 = model(x1, x2)                # two batches of embeddings, shape (N, D)
z1 = (z1 - z1.mean(0)) / z1.std(0)    # standardize each feature column
z2 = (z2 - z2.mean(0)) / z2.std(0)
N = z1.shape[0]
c = (z1.T @ z2) / N                   # (D, D) cross-correlation matrix
loss = -th.diagonal(c).sum()
loss.backward()
Code adapted from here.
These operations, although not complex, are highly non-linear (std involves squaring the terms, there is a division and the derivative of a quotient is not trivial, etc.), and computing the gradient by hand would take a while. However, PyTorch does it automatically, and I'm trying to understand how this is possible.
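To convince myself that the chain rule really is being applied somewhere under the hood, I put together a small check (my own toy code, not the original project's): the gradient autograd computes for the standardize-then-matmul loss matches a central-difference estimate, using random tensors in place of the model outputs.

import torch as th

th.manual_seed(0)
z1 = th.randn(8, 4, dtype=th.double, requires_grad=True)   # stand-in for a model output
z2 = th.randn(8, 4, dtype=th.double)

def loss_fn(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return -th.diagonal((a.T @ b) / a.shape[0]).sum()

loss_fn(z1, z2).backward()                    # autograd's gradient w.r.t. z1

# central-difference estimate of one gradient entry
eps = 1e-6
z1p = z1.detach().clone(); z1p[0, 0] += eps
z1m = z1.detach().clone(); z1m[0, 0] -= eps
fd = (loss_fn(z1p, z2) - loss_fn(z1m, z2)) / (2 * eps)
print(z1.grad[0, 0].item(), fd.item())        # the two values agree closely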
I've checked the grad_fn of the loss, c, and z1 variables, and they involve functions such as MmBackward and DivBackward, but I don't really see how these can be done using Jacobian and gradient products. I also tried to find the code for these backward functions in PyTorch's GitHub repository to see if it would help me understand, but I'm not able to find it.
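To be concrete about what I mean: on a plain matrix multiplication, the gradients that backward produces look like simple products of the upstream gradient with the other operand (again, my own toy snippet, not the actual PyTorch internals), and the backward nodes can be inspected through grad_fn.

import torch as th

A = th.randn(3, 4, requires_grad=True)
B = th.randn(4, 5, requires_grad=True)
Y = A @ B
G = th.randn_like(Y)            # plays the role of the gradient arriving from downstream

Y.backward(G)                   # what MmBackward appears to compute
print(th.allclose(A.grad, G @ B.T))   # True: grad_A = G @ B^T
print(th.allclose(B.grad, A.T @ G))   # True: grad_B = A^T @ G

# the chain of backward nodes can be walked via grad_fn
print(Y.grad_fn)                      # e.g. <MmBackward0 ...>
print(Y.grad_fn.next_functions)       # the nodes that produced A and B

But I still don't see how this generalizes to the whole graph above, or where these formulas live in the source.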
I would be really grateful if someone could explain, at a high level, how this works.
Thanks in advance