Dear PyTorch community,
In one of the layers in the forward pass of my model, I use a tensor that requires grad to compute another tensor, which I later use as indices in scatter_add_.
You can already see the problem: for this layer to make sense, I absolutely need the gradient to keep flowing, but on the other hand scatter_add_ expects the indices to be integers. Naturally, if I convert the tensor to an integer dtype, it breaks the gradient flow.
I would really appreciate any ideas/suggestions.
Thanks in advance
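To make the situation concrete, here is a minimal sketch of what is being described (the variable names, shapes, and the `* 4` scaling are hypothetical, not from the original post). The cast to an integer dtype produces a tensor with no `grad_fn`, so the graph back to `x` is cut at that point:

```python
import torch

# Hypothetical setup: x requires grad, and indices are derived from it.
x = torch.rand(5, requires_grad=True)

# Scale into index range and cast to integers.
# The cast detaches the result: integer tensors cannot require grad.
idx = (x * 4).long()

out = torch.zeros(5)
src = torch.ones(5)
out.scatter_add_(0, idx, src)  # works, but no gradient reaches x via idx

print(idx.dtype)          # torch.int64
print(idx.requires_grad)  # False -- the graph is cut at the cast
```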
You simply can’t backpropagate through the computation of your indices. Your indices – inherently integers – are discrete, so you can’t (usefully) take their derivatives, even with respect to a tensor that has requires_grad = True.
Let’s say you have a floating-point tensor, x, and you compute an index, compute_index (x) = 17. Let’s say that as you vary x for a while, compute_index (x) remains 17. For these values of x, the gradient of the index with respect to x is zero – that is, the index is locally constant.

Now let’s say that you vary x a little more and the index jumps to 18, at which it remains for a while. Right at the jump, the gradient is technically undefined (or perhaps inf, if you prefer). Then after the jump the gradient is zero again.

So your gradient is zero almost everywhere and is undefined at some number of isolated points. Although mathematically defined (for most values of x), such a gradient – being zero (almost) everywhere – isn’t useful for training.
I am aware of the problem of undefined derivatives for discrete variables, which integers are. I was hoping this was a usual problem that maybe has a trick, but it seems I will have to think of a workaround.
Thank you very much for the extensive answer! I will use this opportunity to say that I really appreciate you beautiful, knowledgeable people on this forum who take the time to answer us!