Autograd of scatter_add_

Hi IZugec!

You simply can’t backpropagate through the computation of your indices.

Your indices – inherently integers – are discrete, so you can’t (usefully)
take their derivatives, even with respect to a tensor that has
requires_grad = True.
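
You can see this with scatter_add itself. Here is a minimal sketch (a one-dimensional toy case, not your actual code): gradients flow back to the values being scattered, but the index tensor must be an integer tensor and cannot carry requires_grad at all:

```python
import torch

src = torch.randn (5, requires_grad = True)
index = torch.tensor ([0, 1, 1, 2, 0])            # integer indices -- no grad possible
out = torch.zeros (3).scatter_add (0, index, src)  # gradients flow through src

out.sum().backward()
print (src.grad)                                   # tensor([1., 1., 1., 1., 1.])
# index.requires_grad_()   # would raise -- only floating-point tensors can require grad
```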

Let’s say you have a floating-point tensor, x, and you compute an index
from it, say, compute_index (x) = 17. Let’s say that as you vary x for a while,
compute_index (x) remains 17. For these values of x, the gradient of
the index with respect to x is zero – that is, the index is constant.

Now let’s say that you vary x a little more and the index jumps to 18, after
which it remains 18 for a while. Right at the jump, the gradient is technically
undefined (or perhaps inf, if you prefer). Then after the jump the gradient
is zero again.

So your gradient is zero almost everywhere and is undefined at some number
of isolated points. Although mathematically defined (for most values of x),
such a gradient – being zero (almost) everywhere – isn’t useful for
gradient-descent optimization.
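
As a tiny illustration (using floor() as a stand-in for whatever your index computation is): the piecewise-constant floating-point version gets a gradient of exactly zero, and casting to an actual integer index detaches from the graph altogether:

```python
import torch

x = torch.tensor ([2.25, 2.75], requires_grad = True)
idx_float = torch.floor (10 * x)   # piecewise constant in x: tensor([22., 27.])
idx_float.sum().backward()
print (x.grad)                     # tensor([0., 0.]) -- zero (almost) everywhere

idx = (10 * x).long()              # an actual integer index
print (idx.requires_grad)          # False -- the cast detaches it from the graph
```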

Best.

K. Frank