Is there a way to do categorical distribution sampling on the GPU?

Hi, I just started looking into PyTorch today to see if I could speed up my research project. I was wondering if there is a way to sample N*M unique categorical distributions in parallel through PyTorch on a GPU. Below is a sample of my current code:

import time
import torch

# initialise
num_sets = 2000
num_dimensions = 800
num_distributions = 50

# create a distribution for every dimension of every set (there will be unique distributions in practice)
# a list comprehension gives each set its own tensor; [tensor] * num_sets would repeat one shared tensor
sets = [torch.full((num_dimensions, num_distributions), 1 / num_distributions,
                   dtype=torch.float) for _ in range(num_sets)]

# to store sample results (one separate tensor per set)
set_sampled = [torch.zeros([num_dimensions], dtype=torch.float) for _ in range(num_sets)]

# %%

# function to sample all dimensions of one set, one distribution at a time
def sample_set(dimensions, sampled_dimensions):
    for i, element in enumerate(dimensions):
        rand_generator = torch.distributions.categorical.Categorical(element)
        sampled_dimensions[i] = rand_generator.sample()


time1 = time.process_time()
# enumerate keeps the index in step with the set, so each result lands in its own slot
for i, element in enumerate(sets):
    sample_set(element, set_sampled[i])

time2 = time.process_time()


This is one of the slowest parts of my code, so it would be great if I could speed it up.

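Yes, the regular Categorical distribution supports sampling multiple dimensions simultaneously. If you pass it a probs tensor whose last axis indexes the categories, every leading axis is treated as a batch of independent distributions. See the .expand() and .sample(sample_shape=...) methods.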
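
As a rough sketch of what that could look like (not a definitive implementation): the snippet below stacks all the distributions into one tensor and samples them in a single call. The names probs, many, base and shared, the random placeholder probabilities, and the small sizes are my own assumptions for illustration, and it falls back to the CPU when no GPU is available.

```python
import torch
from torch.distributions import Categorical

# Assumption: use the GPU when present, otherwise run the same code on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small illustrative sizes; the question's 2000 / 800 / 50 plug in the same way.
num_sets, num_dimensions, num_distributions = 20, 8, 5

# One probability vector per (set, dimension) pair, stored in a single tensor.
# Random placeholder values stand in for the real, unique distributions.
probs = torch.softmax(
    torch.randn(num_sets, num_dimensions, num_distributions, device=device),
    dim=-1,
)

# The last axis is the category axis; the leading axes become the batch shape.
dist = Categorical(probs=probs)

# One draw from every distribution at once: shape (num_sets, num_dimensions).
samples = dist.sample()

# Several draws per distribution: shape (4, num_sets, num_dimensions).
many = dist.sample(sample_shape=torch.Size((4,)))

# If every (set, dimension) pair shares ONE distribution, .expand() broadcasts
# it to the batch shape instead of building the full tensor by hand.
base = Categorical(
    probs=torch.full((num_distributions,), 1.0 / num_distributions, device=device)
)
shared = base.expand(torch.Size((num_sets, num_dimensions))).sample()
```

The samples come back as integer category indices on the same device as probs. With the sizes from the question, probs would hold 2000 * 800 * 50 = 80 million floats (about 320 MB in float32), so it is worth checking that this fits in GPU memory.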