Custom Loss with detach/re-attach

welahi · March 22, 2022, 3:12pm

Hello,

I need to implement some kind of a “generalized MSE” loss, where I don’t just consider loss_mse(target, model), but loss_mse(O(target), O(model)), where O is a rather complicated function.

The problem is that O detaches the PyTorch tensors and converts them to numpy arrays, calls several third-party libraries to calculate the result, and returns an array again. Obviously, I can convert the resulting arrays back into tensors, but this won’t help, because once they have been detached the gradient can no longer be calculated.

I am fairly new to PyTorch, so please tell me if my post needs further clarification. Does anyone know where to start? I would also be happy with a link to the right documentation page or tutorial, but I had a hard time searching for one myself.

Thanks and best

KFrank · March 22, 2022, 7:02pm

Hi Welahi!

The best approach – if it’s doable – would be to rewrite your custom
loss O using (differentiable) pytorch tensor operations. Then you will
get autograd “for free.”

If this is not practical – maybe the stuff in your third-party libraries is
too complicated – you will have to write a custom autograd Function.

The forward() method of your Function can just be your current
O (returning pytorch tensors, of course). But for the backward()
method, you will have to work out the derivative (gradient) of O (or
at least a good-enough approximation to it) and implement it. Such
an implementation can use numpy and your third-party libraries, if
that helps (but has to be packaged to accept and return pytorch
tensors, of course).
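
As a minimal sketch of what such a Function could look like: the names external_op, external_op_grad and ExternalOp below are made up for illustration, and an elementwise numpy sine stands in for the real third-party computation (its derivative, the cosine, happens to be known in closed form, which will usually not be the case for a complicated O).

```python
import numpy as np
import torch


def external_op(x_np):
    # Hypothetical stand-in for the third-party computation O (elementwise sine).
    return np.sin(x_np)


def external_op_grad(x_np):
    # Hand-derived derivative of the stand-in O. For a real O this is the hard part.
    return np.cos(x_np)


class ExternalOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Keep the input around so that backward() can evaluate the derivative at x.
        ctx.save_for_backward(x)
        y_np = external_op(x.detach().cpu().numpy())
        return torch.from_numpy(y_np).to(device=x.device, dtype=x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        dydx = external_op_grad(x.detach().cpu().numpy())
        dydx = torch.from_numpy(dydx).to(device=x.device, dtype=x.dtype)
        # Chain rule for an elementwise op: dL/dx = dL/dy * dy/dx.
        return grad_output * dydx


# "Generalized MSE": compare O(model output) with O(target).
x = torch.linspace(0.0, 1.0, 5, dtype=torch.float64, requires_grad=True)
target = torch.full((5,), 0.5, dtype=torch.float64)
loss = torch.nn.functional.mse_loss(ExternalOp.apply(x), ExternalOp.apply(target))
loss.backward()  # x.grad is now populated via ExternalOp.backward
```

Running torch.autograd.gradcheck on ExternalOp.apply with double-precision inputs is a convenient way to check a hand-written backward() against finite differences.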

This is all conceptually straightforward, but can become quite difficult
and nuanced, depending on how complicated O and its derivative are.

Good luck.

K. Frank

welahi · March 22, 2022, 11:29pm

Hi K. Frank,

thanks for your answer and the link! I’ve played around with the example and implemented another toy problem, and I think I now grasp how it works.

Unfortunately, it is near-impossible to find any kind of “closed expression” for the derivative in my actual problem, so I’m probably going to perform a numerical differentiation using the forward operation itself (basically (O(x+h)-O(x-h))/(2h) for some small h). It’s far from perfect, but I don’t see any other way.

Best

KFrank · March 23, 2022, 2:56am

Hi Welahi!

Calculating the gradient numerically is a perfectly reasonable thing
to do, although it can become expensive in the multi-dimensional
case.
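
To make the cost concrete, here is a rough sketch of a backward() built on central differences. The names external_op and NumericalOp, the toy function itself, and the step size H are all assumptions chosen for illustration, not your actual O. For an input with n elements, each backward pass calls O 2n times.

```python
import numpy as np
import torch


def external_op(x_np):
    # Hypothetical stand-in for O: maps an array to a vector of length 2.
    return np.array([np.sum(x_np ** 2), np.prod(x_np)])


class NumericalOp(torch.autograd.Function):
    H = 1e-6  # assumed step size; a sensible value depends on the scale of O

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        y_np = external_op(x.detach().cpu().numpy())
        return torch.from_numpy(y_np).to(device=x.device, dtype=x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        x_np = x.detach().cpu().numpy().astype(np.float64)
        g_np = grad_output.detach().cpu().numpy().astype(np.float64)
        h = NumericalOp.H
        grad_x = np.zeros_like(x_np)
        for i in range(x_np.size):
            step = np.zeros_like(x_np)
            step.flat[i] = h
            # Central difference for dO/dx_i: two evaluations of O per element.
            dy_dxi = (external_op(x_np + step) - external_op(x_np - step)) / (2 * h)
            # Contract with the incoming gradient (vector-Jacobian product).
            grad_x.flat[i] = np.sum(g_np * dy_dxi)
        return torch.from_numpy(grad_x).to(device=x.device, dtype=x.dtype)


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
NumericalOp.apply(x).sum().backward()
```

The sketch assumes double precision; with float32 inputs a step this small would mostly produce round-off noise.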

Note that the forward() method of your custom Function gets passed
a context object, ctx, that it can use to stash away useful information
that can be helpful to your custom Function’s backward() method.
Pytorch’s autograd machinery hands ctx back to your backward()
method during the backward pass.

It may be the case (or maybe not) that some (or maybe all) of the
heavy lifting needed to perform the numerical differentiation can be
carried out more cheaply or conveniently during the forward pass.

You should look for opportunities to reuse parts of the computations
performed during the forward pass if they show up again in the
backward pass and use the ctx object as the mechanism to do this
(at some cost in memory).

Best.

K. Frank

welahi · March 25, 2022, 3:09pm

Hi K. Frank!

Ah yes, but unfortunately I don’t see any way to make use of ctx in my case. The one exception would be switching to the forward difference (O(x+h)-O(x))/h instead of the central difference, since I could then re-use O(x), but I’m willing to give up that time gain for better accuracy, at least for now.
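
For reference, the forward-difference variant with re-use of O(x) might look roughly like this. It is only a sketch: external_op, ForwardDiffOp and the step size H are placeholder names and values, and the toy function is not my real O. The output of the forward pass is stored through ctx, so the backward pass needs n instead of 2n evaluations of O, at the price of a first-order (less accurate) difference.

```python
import numpy as np
import torch


def external_op(x_np):
    # Hypothetical stand-in for O: maps an array to a vector of length 2.
    return np.array([np.sum(x_np ** 2), np.prod(x_np)])


class ForwardDiffOp(torch.autograd.Function):
    H = 1e-7  # assumed step size for double-precision inputs

    @staticmethod
    def forward(ctx, x):
        y_np = external_op(x.detach().cpu().numpy())
        y = torch.from_numpy(y_np).to(device=x.device, dtype=x.dtype)
        # Stash both the input and O(x) so backward() does not recompute O(x).
        ctx.save_for_backward(x, y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        x_np = x.detach().cpu().numpy().astype(np.float64)
        y_np = y.detach().cpu().numpy().astype(np.float64)
        g_np = grad_output.detach().cpu().numpy().astype(np.float64)
        h = ForwardDiffOp.H
        grad_x = np.zeros_like(x_np)
        for i in range(x_np.size):
            shifted = x_np.copy()
            shifted.flat[i] += h
            # Forward difference: one new evaluation of O per element, O(x) is re-used.
            dy_dxi = (external_op(shifted) - y_np) / h
            grad_x.flat[i] = np.sum(g_np * dy_dxi)
        return torch.from_numpy(grad_x).to(device=x.device, dtype=x.dtype)


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
ForwardDiffOp.apply(x).sum().backward()
```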

Anyway, it seems to run now, although it was really a pain to figure out whether PyTorch expects a []-, [1]- or [1,1]-shaped tensor under which circumstances…

Best