When to use Functional with Custom Loss Functions

Hi everyone -

TLDR; Is there a list of operations that have (or a list of operations that don’t have) backward defined?

There seem to be 3 options for implementing a custom loss function:

  1. Write a normal function
  2. Extend nn.Module, where the forward method wraps the function you’d implement for 1
  3. Extend torch.autograd.Function, where the forward method is as in 2, but you explicitly implement the backward method yourself.

My understanding is that 1 and 2 are equivalent (though in cases where you pass parameters, 2 seems more natural). A sketch of all three is below.
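For concreteness, here is a minimal sketch of the three options, using a made-up mean-squared-error-style loss purely as an illustration (the names `my_loss`, `MyLoss`, and `MyLossFn` are mine, not from any library):

```python
import torch
import torch.nn as nn

# Option 1: a plain function built from differentiable torch ops
def my_loss(pred, target):
    return ((pred - target) ** 2).mean()

# Option 2: an nn.Module whose forward wraps the same computation
class MyLoss(nn.Module):
    def forward(self, pred, target):
        return ((pred - target) ** 2).mean()

# Option 3: an autograd.Function with an explicitly implemented backward
class MyLossFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, pred, target):
        ctx.save_for_backward(pred, target)
        return ((pred - target) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        pred, target = ctx.saved_tensors
        n = pred.numel()
        # d/d_pred of mean((pred - target)^2) is 2 * (pred - target) / n
        grad_pred = grad_output * 2.0 * (pred - target) / n
        return grad_pred, None  # no gradient needed for target
```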

Questions:

  • It seems that 3 is only useful if you are using operations where PyTorch doesn’t define backward. Is there a list of operations that have (or a list of operations that don’t have) backward defined?
  • Is my description of 1–3 above correct, or have I missed something?

Hi,

Yes, your description is correct.
The main difference between 1 and 2 is that 2 is built with the nn package and handles parameters and buffers nicely.
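To illustrate that point, here is a minimal sketch (the class name and the per-class weighting are hypothetical, chosen just to show a loss that carries state) of why the nn.Module form is convenient:

```python
import torch
import torch.nn as nn

# A loss with a learnable per-class weight. Because it is an nn.Module,
# the weight is registered as a parameter: it moves with .to(device),
# shows up in .parameters(), and is saved/restored via state_dict().
class WeightedMSE(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_classes))

    def forward(self, pred, target):
        return (self.weight * (pred - target) ** 2).mean()

criterion = WeightedMSE(num_classes=10)
optimizer = torch.optim.SGD(criterion.parameters(), lr=0.1)  # the weights are trainable
```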

3 is useful in a few cases:

  • If an op’s backward is not yet implemented. At the moment, though, all PyTorch operations have a backward implemented (except one or two linear algebra operations), so that should not happen.
  • Your operation’s true gradients are not what you want. For example, you could have an operation with non-smooth gradients for which you want a smooth surrogate gradient to help training (see the sketch below).
  • Your operation would be slow and memory-hungry to backward through with auto-diff, so you implement the backward by hand to reduce memory usage (and potentially speed it up, depending on the op). This is particularly true if your forward contains a very large number of small ops.
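As an illustration of the second point, one common pattern is a straight-through-style estimator: the forward is non-smooth (here torch.sign, whose true gradient is zero almost everywhere), and the backward substitutes a smoother surrogate. This is just a sketch of the idea, not anything specific to your loss:

```python
import torch

class SignSTE(torch.autograd.Function):
    """Forward: sign(x). Backward: a clipped pass-through gradient,
    which is more useful for training than the true gradient of sign()
    (zero almost everywhere)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient through where |x| <= 1, zero it elsewhere.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

x = torch.randn(4, requires_grad=True)
SignSTE.apply(x).sum().backward()
print(x.grad)  # non-zero wherever |x| <= 1, unlike the true gradient of sign()
```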

Thanks! That makes a lot of sense. I’m certainly not using them right now, but if you know which one or two linear algebra ops do not have backward implemented, it would be great to know.

Thanks again.