# Performance bottleneck when normalizing groups of elements in a tensor

I have the following bottleneck in my model:
I need to normalize groups of elements in my input tensors. The groups of elements that must be normalized together are given by lists. Those lists are specific to a given layer of the model and do not change from one input tensor to the next.

In the example below, the input tensor `ln_input` is of shape `[16,14]`, and there are 2 groups of elements to normalize using softmax:

1. Elements `[0,0]`, `[0,1]`, `[1,0]`, `[1,1]`
2. Elements `[10,10]`, `[10,11]`

Those 2 groups are each saved as a list of tuples in the dictionary `coord_mapping`.

`coord_mapping` is fixed, while there are many different `ln_input` tensors for which I need to repeat the operation. Therefore any expensive one-time preprocessing of the `coord_mapping` data structure is acceptable.

Here is a minimal working example:

``````python
import torch
import torch.nn as nn
from collections import defaultdict

## normalization function
softmax0 = nn.Softmax(dim=0)

## input tensor
h_in = 16               # input height
w_in = 14               # input width

ln_input = torch.zeros([h_in,w_in])
output = torch.zeros_like(ln_input)

## groups of elements to normalize together
coord_mapping = defaultdict(list)
coord_mapping[1].append((0,0))
coord_mapping[1].append((0,1))
coord_mapping[1].append((1,0))
coord_mapping[1].append((1,1))

coord_mapping[2].append((10,10))
coord_mapping[2].append((10,11))

## inefficient normalization
for value in coord_mapping.values():
    h_list, w_list = zip(*value)
    output[h_list, w_list] = softmax0(ln_input[h_list, w_list])

## each group of elements sums to 1
print(output[0,0] + output[0,1] + output[1,0] + output[1,1])
print(output[10,10] + output[10,11])
``````
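Since `coord_mapping` never changes, the kind of one-time preprocessing I had in mind is flattening all groups into two index tensors (flat element positions and group ids) and replacing the Python loop with a single segment-wise softmax built from scatter ops. A sketch of the idea (it assumes PyTorch ≥ 1.12 for `Tensor.scatter_reduce`; the names `flat_idx`, `group_id`, and `grouped_softmax` are just illustrative):

```python
import torch

h_in, w_in = 16, 14

## done once per layer: flatten the groups into index tensors
groups = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11)],
]
flat_idx = torch.tensor([h * w_in + w for g in groups for h, w in g])
group_id = torch.tensor([i for i, g in enumerate(groups) for _ in g])
n_groups = len(groups)

## done for every input tensor: one vectorized pass, no Python loop
def grouped_softmax(ln_input):
    vals = ln_input.view(-1)[flat_idx]
    # per-group max, for numerical stability
    maxes = torch.full((n_groups,), float("-inf")).scatter_reduce(
        0, group_id, vals, reduce="amax"
    )
    exps = (vals - maxes[group_id]).exp()
    # per-group sum of exponentials
    sums = torch.zeros(n_groups).scatter_add(0, group_id, exps)
    out = torch.zeros_like(ln_input)
    out.view(-1)[flat_idx] = exps / sums[group_id]
    return out

output = grouped_softmax(torch.zeros(h_in, w_in))
```

Each group in `output` should still sum to 1, but I don't know whether scatter ops are actually faster than the loop for group sizes like mine.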

How could I make this more efficient?