Processing embeddings and similarity scores - a better way to handle this using tensors

alywonder · March 10, 2023, 5:25pm

I have a 100x100 tensor of pairwise similarity scores that I calculated for 100 sentences. I also have a dictionary of 5 categories, where each category maps to a list of indexes showing which rows and columns of the similarity score tensor belong to it.

scores.shape

>> torch.Size([100, 100])

cat_index

{ 'Album': [14, 31, 34, 87],
  'Animal': [21, 85, 99, 10],
  ...
  'Artist': [12, 15, 32, 45, 46, 48, ...]
}

I want to find the average score for each pair of categories. That is, for every pair (category A, category B), I want the mean of the scores whose rows are in A's index list and whose columns are in B's index list. The 100x100 tensor will be reduced to 5x5.
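Written out as a formula, I think this is what I am after. The symbols are my own, just for illustration: S is the score tensor, I_a is the index list of category a, and M is the resulting 5x5 tensor.

```latex
M_{ab} = \frac{1}{|I_a|\,|I_b|} \sum_{i \in I_a} \sum_{j \in I_b} S_{ij},
\qquad a, b \in \{1, \dots, 5\}
```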

I have done this by indexing the rows and columns for each pair of categories individually and taking the mean of each block. But is there a better, more idiomatic way to do this with tensors? Would torch.nn.functional.embedding_bag() help?
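For reference, here is a minimal sketch of the loop-based version I mean. The small 6x6 scores tensor and the three-category cat_index below are made-up stand-ins (my real data is 100x100 with 5 categories), and cat_means is just a name I picked for the result.

```python
import torch

# Hypothetical stand-in data; the real tensor is 100x100 with 5 categories.
torch.manual_seed(0)
scores = torch.rand(6, 6)
cat_index = {"Album": [0, 3], "Animal": [1, 4, 5], "Artist": [2]}

names = list(cat_index)
cat_means = torch.zeros(len(names), len(names))

for a, row_cat in enumerate(names):
    rows = torch.tensor(cat_index[row_cat])
    for b, col_cat in enumerate(names):
        cols = torch.tensor(cat_index[col_cat])
        # rows[:, None] broadcasts against cols, selecting the full
        # (len(rows) x len(cols)) block for this pair of categories.
        cat_means[a, b] = scores[rows[:, None], cols].mean()

print(cat_means.shape)
```

This works, but it runs one indexing operation and one mean per pair of categories, which is the part I would like to replace with something more tensor-native.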

Any help is much appreciated and thanks in advance.