How to speed up computing the average feature per label

Hello all, I have a feature matrix of size 128x5 (128 features, each of size 1x5) and a corresponding label vector of size 128x1. For example, with five features:

f = [f1, f2, f3, f4, f5]
labels = [2, 3, 2, 1, 2]

This means each fi has size 1x5 and its label is the entry at the same position in labels (i.e. f1, f3, and f5 have label 2, f2 has label 3, and f4 has label 1).

In my task, I want to compute the average feature for each label. For the example above, we have

f(1) = f4
f(2) = (f1 + f3 + f5) / 3
f(3) = f2
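
To make the expected result concrete, here is a small sketch with made-up values for f1..f5 (the numbers are an assumption chosen only for illustration). It computes the per-label mean the direct way, one label at a time:

```python
import torch

# Toy data (made-up values): five feature rows f1..f5, each of size 1x5.
f = torch.arange(25, dtype=torch.float32).view(5, 5)
labels = torch.tensor([2, 3, 2, 1, 2])

# For every label that occurs, average the rows carrying that label.
means = {int(l): f[labels == l].mean(dim=0) for l in labels.unique()}

print(means[1])  # same as f4
print(means[2])  # same as (f1 + f3 + f5) / 3
print(means[3])  # same as f2
```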

Currently, my implementation loops over all labels and computes the average for each one, but it is very slow (when the size of the label tensor grows to 5 million). How can I speed it up? Thanks.

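    import torch

    # f: (N, D) features; labels: (N,) or (N, 1) integer labels >= 0
    labels = labels.view(-1)
    num_labels = int(labels.max()) + 1
    mean_f = f.new_zeros(num_labels, f.size(1))   # per-label sums, then means
    num_freq_label = f.new_zeros(num_labels)      # per-label counts
    for ind in range(num_labels):
        mask = labels == ind
        mean_f[ind] = f[mask].sum(dim=0)
        num_freq_label[ind] = mask.sum()
    mean_f = torch.div(mean_f, num_freq_label.view(-1, 1))  # NaN where a label never occurs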
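
The loop above rescans the whole label tensor on every iteration, so its cost grows with both the number of labels and the number of features. One loop-free alternative, shown here only as a sketch, is to accumulate the per-label sums with `Tensor.index_add_` and the counts with `torch.bincount`. The helper name `mean_by_label` and its `num_labels` parameter are hypothetical, and the sketch assumes the labels are non-negative integers. Labels that never occur get a zero row instead of NaN, which is a design choice of this sketch.

```python
import torch

def mean_by_label(f, labels, num_labels=None):
    """Average the rows of f (N, D) that share the same integer label."""
    labels = labels.view(-1).long()          # accept (N,) or (N, 1)
    if num_labels is None:
        num_labels = int(labels.max()) + 1   # assumes labels are >= 0
    # Row i of f is added into row labels[i] of sums.
    sums = f.new_zeros(num_labels, f.size(1))
    sums.index_add_(0, labels, f)
    # Count occurrences of each label; clamp so unused labels divide by 1.
    counts = torch.bincount(labels, minlength=num_labels).clamp(min=1)
    return sums / counts.unsqueeze(1).to(f.dtype)

# Usage on toy data (made-up values).
f = torch.arange(25, dtype=torch.float32).view(5, 5)
labels = torch.tensor([2, 3, 2, 1, 2])
out = mean_by_label(f, labels)   # shape (4, 5); row 0 is all zeros
```

Row `out[k]` then holds the average feature for label `k`, and the whole computation runs as a few tensor operations rather than one Python iteration per label.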