Multiple indices of specific values

Hello,
I would like to collect, in a tensor of shape (Nb_of_values, Nb_max_indices), all the indices at which each specific value occurs in a list, without using a for loop.

Below you can find my script. To optimize it on the GPU, I would like to avoid the loop.
Is there a PyTorch function for that?
Does anybody have an idea?

Thank you in advance.

import torch

max_idx = 6
data_idx = torch.tensor([2, 5, 5, 0, 4, 1, 4, 5, 3, 2, 1, 0, 3, 3, 0]).cuda()

max_number_data_idx = data_idx.shape[0]

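# Row i holds the positions where data_idx == i, padded with -1.
# torch.long is used because int8 would overflow for positions above 127.
filled_matrix = torch.full([max_idx, max_number_data_idx], -1, dtype=torch.long, device='cuda')
for i in range(max_idx):
    same_idx = (data_idx == i).nonzero().flatten()
    filled_matrix[i, :same_idx.shape[0]] = same_idx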