Specialized scatter operations for batched data

Stannislav · June 15, 2020, 12:42pm

Hi,

PyTorch provides the scatter_add operation, which can be very useful, but unfortunately is not deterministic. A deterministic mode for this operation is AFAIK not yet implemented (see this issue).

While the built-in scatter_add is very general, I’m wondering if there are solutions for the special case where the index tensor is sorted. This could be a very relevant usecase where the index is just a segmentation/mask for variable-sized batches that were stacked together.

I’m thinking of something like this:

>>> batch_tensor = torch.tensor([1, 2, 3, 4, 5])
>>> index_tensor = torch.tensor([0, 0, 0, 1, 1])

# either
>>> scatter_add(batch_tensor, index_tensor)
tensor([6, 9])

# or
>>> torch.zeros(2).scatter_add(batch_tensor, index_tensor)
tensor([6, 9])

we have a batch with 2 elements, [1, 2, 3] and [4, 5], and we’re doing a sum global pool.

It seems there already are thoughts about sorted indices out there (e.g. point 3 here), but I don’t have enough insight to assess the current status. I’m wondering:

Is there already something like this in torch?
Is there already something like this in cuda?
If not, how straightforward is it to implement this?

Sure one can implement this directly in python by iterating over the index array, but I guess this would be painfully slow and there must be a better way that leverages the GPU.

I think a solution to this would have an immediate benefit for graph neural networks, but potentially also for other areas like CNNs with variable-sized images.

Stan.