Hi there,

I would like to compute an operation similar to `scatter_add` in TensorFlow. Here is a concrete example of what I want to achieve. Given the following two tensors, each of shape (batch_size, seq_length):

```
ids = [
[1, 0, 0, 0],
[4, 5, 0, 0],
[1, 0, 0, 0]
]
scores = [
[10.0, 0.0, 0.0, 0.0],
[4.977129936218262, 5.0228705406188965, 0.0, 0.0],
[10.0, 0.0, 0.0, 0.0]
]
```

I want to use `ids` to transfer the values in each row of `scores` to the corresponding positions of a bigger tensor of shape (batch_size, total_size). Each element of `ids` ranges from 0 to total_size-1. Assuming that total_size is 10 and batch_size is 3 in this case, the result would be the tensor:

```
transformed_scores = [
[0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 4.977129936218262, 5.0228705406188965, 0.0, 0.0, 0.0, 0.0],
[0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
]
```
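To pin down the semantics I have in mind, here is a plain-Python reference sketch. The function name `scatter_add_rows` is just something I made up for illustration, and the third row of `ids` uses index 1 so that it matches the third row of the expected output. It is not differentiable; it only spells out which value lands where.

```python
def scatter_add_rows(ids, scores, total_size):
    """Reference semantics: out[b][ids[b][j]] += scores[b][j] for every b, j."""
    out = [[0.0] * total_size for _ in ids]
    for b, (row_ids, row_scores) in enumerate(zip(ids, scores)):
        for idx, value in zip(row_ids, row_scores):
            if not 0 <= idx < total_size:
                raise IndexError(f"id {idx} is outside [0, {total_size - 1}]")
            # Repeated ids accumulate; padding entries (id 0, score 0.0) add nothing.
            out[b][idx] += value
    return out


ids = [
    [1, 0, 0, 0],
    [4, 5, 0, 0],
    [1, 0, 0, 0],  # index 1, matching the third row of the expected output
]
scores = [
    [10.0, 0.0, 0.0, 0.0],
    [4.977129936218262, 5.0228705406188965, 0.0, 0.0],
    [10.0, 0.0, 0.0, 0.0],
]

transformed_scores = scatter_add_rows(ids, scores, total_size=10)
```

Note that the padding positions all point at index 0, so any non-zero score on a padding position would accumulate there.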

I would like this operation to be differentiable, so that gradients can be backpropagated to the intermediate representations.
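For reference, this is a minimal sketch of what I imagine the solution could look like, assuming the framework is PyTorch (an assumption on my part) and that its `Tensor.scatter_add(dim, index, src)` behaves as documented. The third row of `ids` again uses index 1 to match the expected output, and the `sum()` at the end is only a stand-in loss to check that gradients reach `scores`.

```python
import torch

total_size = 10

# Integer indices; torch.tensor infers int64, which scatter_add requires.
ids = torch.tensor([
    [1, 0, 0, 0],
    [4, 5, 0, 0],
    [1, 0, 0, 0],  # index 1, matching the expected output
])

# In practice scores would come from earlier layers; here it is a leaf tensor.
scores = torch.tensor([
    [10.0, 0.0, 0.0, 0.0],
    [4.977129936218262, 5.0228705406188965, 0.0, 0.0],
    [10.0, 0.0, 0.0, 0.0],
], requires_grad=True)

# Out-of-place scatter_add along dim 1: out[b, ids[b, j]] += scores[b, j].
base = torch.zeros(ids.size(0), total_size, dtype=scores.dtype)
transformed_scores = base.scatter_add(1, ids, scores)

# Stand-in loss, only to check that gradients flow back to scores.
transformed_scores.sum().backward()
```

If this is right, `scores.grad` should have the same shape as `scores`, since each score contributes to exactly one output position.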

Thanks!