In one of the layers in the forward pass of my model, I use a tensor that requires grad to compute another tensor, which I later use as indices in scatter_add_.

You can already see the problem: for this layer to make sense I absolutely need the gradient to keep flowing, but scatter_add_ expects the indices to be integers. Naturally, if I convert the tensor to an integer dtype, it breaks the gradient flow.
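A minimal repro of what I mean (toy shapes and names, just for illustration):

```python
import torch

# Positions are computed from a tensor that requires grad, but casting
# them to an integer dtype for scatter_add_ detaches them from the graph.
src = torch.randn(5, requires_grad=True)
pos = src * 2 + 3                   # float tensor, still on the autograd graph
print(pos.requires_grad)            # True

idx = pos.long().clamp(0, 15)       # scatter_add_ needs integer indices...
print(idx.requires_grad)            # False -- the cast broke the gradient flow

out = torch.zeros(16)
out = out.scatter_add(0, idx, src)  # grad flows via src, but never via idx
```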

I would really appreciate any ideas/suggestions.
Thanks in advance

You simply can’t backpropagate through the computation of your indices.

Your indices – inherently integers – are discrete, so you can’t (usefully)
take their derivatives, even with respect to a tensor that has requires_grad = True.

Let’s say you have a floating-point tensor, x, and you compute an index
from it, compute_index(x) = 17. Let’s say that as you vary x for a while, compute_index(x) remains 17. For these values of x, the gradient of
the index with respect to x is zero – that is, the index is constant.

Now let’s say that you vary x a little more and the index jumps to 18, after
which it remains 18 for a while. Right at the jump, the gradient is technically
undefined (or perhaps inf, if you prefer). Then after the jump the gradient
is zero again.

So your gradient is zero almost everywhere and is undefined at some number
of isolated points. Although mathematically defined (for most values of x),
such a gradient – being zero (almost) everywhere – isn’t useful for
gradient-descent optimization.
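You can see this directly with, say, torch.floor as a stand-in for your index computation (PyTorch defines floor’s derivative as zero wherever it is defined):

```python
import torch

# floor(x) is piecewise constant: it stays 2.0 for any x in [2, 3),
# so the gradient that autograd reports is zero.
x = torch.tensor(2.3, requires_grad=True)
index = torch.floor(x)    # 2.0
index.backward()
print(x.grad)             # tensor(0.)
```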

I am aware of the problem of undefined derivatives for discrete variables, which integers are. I was hoping this is a common problem that maybe has a trick, but it seems I will have to think of a workaround.
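One direction I might try (just a rough sketch, all names and shapes made up): replace the hard scatter with a soft, differentiable assignment of each value over the bins, so the gradient can flow through the float positions instead of through integer indices.

```python
import torch

torch.manual_seed(0)
n_bins = 8
src = torch.randn(5, requires_grad=True)
pos = torch.sigmoid(src) * (n_bins - 1)            # differentiable float "index"

centers = torch.arange(n_bins, dtype=torch.float)  # bin centers 0..7
logits = -(pos.unsqueeze(1) - centers) ** 2 / 0.1  # closeness of each pos to each bin
weights = torch.softmax(logits, dim=1)             # (5, 8), each row sums to 1

out = (weights * src.unsqueeze(1)).sum(dim=0)      # soft stand-in for scatter_add_
loss = (out * centers).sum()
loss.backward()                                    # gradient reaches src through pos
```

Lowering the temperature (the 0.1 here) makes the assignment approach the hard scatter, at the cost of vanishing gradients – so this is a trade-off, not a free lunch.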

Thank you very much for the extensive answer. I will use this opportunity to say that I really appreciate you beautiful, knowledgeable people on this forum who take the time to answer us!