Hello all, I have a feature matrix of size `128x5` and a corresponding label vector of size `128x1`. For example,

```
f =[f1, f2, f3, f4, f5]
labels =[2, 3, 2, 1, 2]
```

This means each `fi` has size 1x5 and has a corresponding label `j` (i.e. f1, f3, and f5 have label 2, f2 has label 3, and f4 has label 1).

In my task, I want to compute the average feature for each label. For the example above, we have

```
f(1) = f4
f(2) = (f1+f3 + f5)/3
f(3) = f2
```

Currently, my implementation loops over all labels and computes the average for each one, but it is very slow (when the number of labels grows to 5 million). How can I speed it up? Thanks.

```
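import torch

# f: (N, D) features; labels: (N,) integer labels (use labels.view(-1) for an N x 1 tensor)
num_labels = int(labels.max()) + 1
sum_f = f.new_zeros(num_labels, f.size(1))  # per-label sums, kept separate from f
num_freq_label = f.new_zeros(num_labels)    # per-label counts
for ind in range(num_labels):
    ind_label = (labels == ind).nonzero().squeeze(1)  # rows carrying this label
    sum_f[ind] = f[ind_label].sum(dim=0)
    num_freq_label[ind] = ind_label.size(0)
# clamp avoids division by zero for labels that never occur
mean_f = sum_f / num_freq_label.clamp(min=1).view(-1, 1)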
```
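
For reference, one possible way to avoid the Python loop is to accumulate the per-label sums with `index_add_` and get the counts with `torch.bincount`. This is only a sketch under some assumptions: the labels are non-negative integers smaller than `num_labels`, and `label_means` and `num_labels` are names made up here for illustration. The small 5x2 tensor is made-up data that mirrors the label layout of the example above.

```python
import torch

def label_means(f: torch.Tensor, labels: torch.Tensor, num_labels: int) -> torch.Tensor:
    """Mean feature per label; rows for labels that never occur stay zero."""
    labels = labels.view(-1).long()  # accept (N,) or (N, 1) label tensors
    sums = torch.zeros(num_labels, f.size(1), dtype=f.dtype, device=f.device)
    sums.index_add_(0, labels, f)    # sums[labels[i]] += f[i] for every row i
    # clamp(min=1) avoids division by zero for unused labels
    counts = torch.bincount(labels, minlength=num_labels).clamp(min=1)
    return sums / counts.view(-1, 1).to(f.dtype)

# Made-up data: 5 feature rows (D = 2) with labels [2, 3, 2, 1, 2]
f = torch.tensor([[1., 0.], [0., 2.], [3., 0.], [4., 4.], [5., 6.]])
labels = torch.tensor([2, 3, 2, 1, 2])
means = label_means(f, labels, num_labels=4)
```

Here `means[2]` is the average of rows 0, 2, and 4, `means[1]` is row 3, and `means[3]` is row 1. The result has one row per possible label, so memory grows with `num_labels` times the feature dimension, which may matter at several million labels.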