Groupby aggregate product in PyTorch

alvarogutyerrez · December 2, 2022, 3:33pm

I have the same problem as in the question Groupby aggregate mean in pytorch. However, I want to compute the product of my tensors within each group (i.e. per label). Unfortunately, I couldn’t find a native PyTorch function that solves this, such as a hypothetical scatter_prod_ for products (the equivalent of scatter_add_ for sums, which was the function used in @ptrblck’s answer).

Recycling the example code from @elyase’s question, consider the 2D tensor:

samples = torch.Tensor([
    [0.1, 0.1],    #-> group / class 1
    [0.2, 0.2],    #-> group / class 2
    [0.4, 0.4],    #-> group / class 2
    [0.0, 0.0]     #-> group / class 0
])

and the corresponding labels, where len(samples) == len(labels):

labels = torch.LongTensor([1, 2, 2, 0])

So my expected output is:

res == torch.Tensor([
    [0.0, 0.0],
    [0.1, 0.1],
    [0.08, 0.08] # -> PRODUCT of [0.2, 0.2] and [0.4, 0.4]
])

As in @elyase’s question, how can this be done in pure PyTorch (i.e. without NumPy, so that autograd works) and ideally without for loops?

Crossposted in: python - groupby aggregate product in PyTorch - Stack Overflow

ptrblck · December 2, 2022, 7:35pm

scatter_ introduced the reduce argument and reduce='multiply' should work.
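
To make the semantics concrete, here is a minimal loop-based sketch of what a multiplicative scatter along dim 0 computes for the example in the question: each output row starts at 1 and is multiplied by every sample carrying that label. The names `num_groups` and `out` are illustrative only, and the loop is meant as a reference for checking results, not as the recommended implementation.

```python
import torch

samples = torch.tensor([
    [0.1, 0.1],  # group 1
    [0.2, 0.2],  # group 2
    [0.4, 0.4],  # group 2
    [0.0, 0.0],  # group 0
])
labels = torch.tensor([1, 2, 2, 0])

num_groups = 3  # illustrative: labels take the values 0, 1, 2

# Start from ones, the multiplicative identity.
out = torch.ones(num_groups, samples.size(1), dtype=samples.dtype)

# For dim=0 the rule is: out[labels[i], j] *= samples[i, j]
for i in range(samples.size(0)):
    g = int(labels[i])
    for j in range(samples.size(1)):
        out[g, j] *= samples[i, j]

print(out)
```

The vectorized scatter call should agree with this loop up to floating-point rounding.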

alvarogutyerrez · December 8, 2022, 10:44am

You are right. The same solution was also posted on Stack Overflow; I am reposting it here for future searches.

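import torch

samples = torch.tensor([
    [0.1, 0.1],    #-> group / class 1
    [0.2, 0.2],    #-> group / class 2
    [0.4, 0.4],    #-> group / class 2
    [0.0, 0.0]     #-> group / class 0
])

labels = torch.tensor([1, 2, 2, 0])

label_size = 3                 # number of distinct groups
sample_dim = samples.size(1)

# Repeat each label across the feature dimension so that index has the same shape as samples
index = labels.unsqueeze(1).repeat((1, sample_dim))

# Start from ones, the multiplicative identity, then multiply each sample into its group's row
res = torch.ones(label_size, sample_dim, dtype=samples.dtype)
res.scatter_(0, index, samples, reduce='multiply')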
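
As a hedged alternative, and since the original question asked about autograd: the PyTorch documentation marks the `reduce` argument of `scatter_` as deprecated when `src` is a tensor and points to `scatter_reduce` instead. The sketch below assumes a PyTorch version that provides `Tensor.scatter_reduce` with `reduce="prod"` (1.12 or later, as far as I know). The names `num_groups` and `init` are illustrative and not part of the code above.

```python
import torch

samples = torch.tensor([
    [0.1, 0.1],  # group 1
    [0.2, 0.2],  # group 2
    [0.4, 0.4],  # group 2
    [0.0, 0.0],  # group 0
], requires_grad=True)
labels = torch.tensor([1, 2, 2, 0])

num_groups = 3  # illustrative: number of distinct labels

# Broadcast each label across the feature dimension to match samples' shape.
index = labels.unsqueeze(1).expand_as(samples)

# Ones are the multiplicative identity; include_self=True keeps them in the product.
init = torch.ones(num_groups, samples.size(1), dtype=samples.dtype)
res = init.scatter_reduce(0, index, samples, reduce="prod", include_self=True)

# Backpropagate through the grouped product to confirm gradients reach samples.
res.sum().backward()
print(res)
print(samples.grad)
```

The out-of-place `scatter_reduce` is used here so that `init` is left unmodified and the result is part of the autograd graph.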