Is there a way to optimize/vectorize this PyTorch operation?

I have two tensors, A and B, where the values at corresponding indices A[i][j] and B[i][j] parametrize a distribution (Kpdf below). For every index I'd like to generate n_samples samples (see samples, which are produced from uniform noise via the transform T_), compute a statistic (kl_), and then sum all of these statistics and return the result.

Is there a good way to do this? Here is the code I have so far:

eps = 1e-12

# Inverse CDF of the Kumaraswamy(a, b) distribution: maps uniform noise to samples
T_ = lambda x, a, b: torch.pow(1 - torch.pow(1-x, 1/b), 1/a)
# Kumaraswamy(a, b) probability density function
Kpdf = lambda x, a, b: a * b * torch.pow(x, a-1) * torch.pow((1 - torch.pow(x, a)), b-1)

kl = torch.tensor([0.0], dtype=torch.float, requires_grad=True).to(A.device)

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        a = A[i][j]
        b = B[i][j]
        noise = torch.FloatTensor(n_samples).uniform_(0, 1).to(A.device)
        samples = T_(noise, a, b)
        kl_ = torch.log(Kpdf(samples, a, b)/(eps + prior(samples)))
        kl = kl + kl_.sum()       

return kl

Yes, broadcasting handles this. Reshape the distribution parameters to (i, j, 1) and draw noise/samples with shape (i, j, n_samples). The parameters then broadcast against the samples axis, so every per-index statistic is computed in one batched operation, and you can reduce (sum) the result as needed.
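A sketch of what that looks like with your definitions (the two lambdas are the inverse CDF and pdf of the Kumaraswamy distribution). The function name kl_sum is mine, and I'm assuming prior is a callable returning a density per sample, as in your loop:

```python
import torch

eps = 1e-12

# Inverse CDF of the Kumaraswamy(a, b) distribution
T_ = lambda x, a, b: torch.pow(1 - torch.pow(1 - x, 1 / b), 1 / a)
# Kumaraswamy(a, b) probability density function
Kpdf = lambda x, a, b: a * b * torch.pow(x, a - 1) * torch.pow(1 - torch.pow(x, a), b - 1)

def kl_sum(A, B, n_samples, prior):
    # (i, j) -> (i, j, 1) so parameters broadcast against the samples axis
    a = A.unsqueeze(-1)
    b = B.unsqueeze(-1)
    # One uniform draw per (index, sample): shape (i, j, n_samples)
    noise = torch.rand(*A.shape, n_samples, device=A.device)
    samples = T_(noise, a, b)  # broadcasts to (i, j, n_samples)
    kl_ = torch.log(Kpdf(samples, a, b) / (eps + prior(samples)))
    return kl_.sum()  # scalar; gradients flow back to A and B
```

This computes the same quantity as the double loop (up to the randomness of the draws, since the noise is sampled in one batch rather than per index), and you no longer need the separate kl accumulator tensor.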