Optimizing loops

Hello,
I wonder if someone can help me find a better way to do the following (i.e. I'm not a Python geek).
Let's say that in the forward method of my model I get a batch of images of size (N, Cin, H, W), and I then apply a transformation that produces a tensor of shape (N, Cin, K, H, W); let's call it 'x',
with K = 1 + J*L + L**2 * J*(J-1)/2 for some values of J and L.
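For instance, with J=2 and L=4 (the values used for adict below) this gives

J, L = 2, 4
K = 1 + J * L + L**2 * J * (J - 1) // 2   # = 1 + 8 + 16 = 25, matching the 25 entries of adict

so x has 25 channels along dim 2.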

Now, I compute some mean values over the two last dims (H, W):

meanCoeffs = torch.mean(x, dim=(3, 4))
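(As an aside, I guess I could also pass keepdim=True, something like

meanCoeffs = torch.mean(x, dim=(3, 4), keepdim=True)   # shape (N, Cin, K, 1, 1)

so the means broadcast directly and the None, None indexing below is not needed, but I kept the explicit version here.)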

Then, I proceed like this:

xnew = torch.zeros_like(x)
# lvl 0: left untouched
xnew[:, :, 0, :, :] = x[:, :, 0, :, :]
# lvl 1: subtract each channel's own mean
for j1 in range(J):
    for t1 in range(L):
        i = adict[(j1, t1)]
        xnew[:, :, i, :, :] = x[:, :, i, :, :] - meanCoeffs[:, :, i, None, None]
# lvl 2: subtract the mean of the corresponding lvl-1 channel (j1, t1)
for j1 in range(J - 1):
    for t1 in range(L):
        i1 = adict[(j1, t1)]
        for j2 in range(j1 + 1, J):
            for t2 in range(L):
                i12 = adict[(j1, t1, j2, t2)]
                xnew[:, :, i12, :, :] = x[:, :, i12, :, :] - meanCoeffs[:, :, i1, None, None]

where, in the case J=2, L=4, adict is the following dictionary:

adict={(-1,): 0,
 (0, 0): 1,
 (0, 1): 2,
 (0, 2): 3,
 (0, 3): 4,
 (1, 0): 5,
 (1, 1): 6,
 (1, 2): 7,
 (1, 3): 8,
 (0, 0, 1, 0): 9,
 (0, 0, 1, 1): 10,
 (0, 0, 1, 2): 11,
 (0, 0, 1, 3): 12,
 (0, 1, 1, 0): 13,
 (0, 1, 1, 1): 14,
 (0, 1, 1, 2): 15,
 (0, 1, 1, 3): 16,
 (0, 2, 1, 0): 17,
 (0, 2, 1, 1): 18,
 (0, 2, 1, 2): 19,
 (0, 2, 1, 3): 20,
 (0, 3, 1, 0): 21,
 (0, 3, 1, 1): 22,
 (0, 3, 1, 2): 23,
 (0, 3, 1, 3): 24}

My question is: is there a better way to compute xnew (e.g. vectorization), especially on a CUDA device (of course)?
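For instance, would something along these lines be the right direction? A rough, untested sketch, where I precompute (once, e.g. in __init__) an index tensor sub_idx giving, for each channel k, the channel whose mean should be subtracted, plus a 0/1 weight that disables the subtraction for channel 0:

# built once from adict, J, L (registered as buffers so they follow the module to the GPU)
sub_idx = torch.zeros(K, dtype=torch.long)
weight = torch.zeros(K)
for j1 in range(J):
    for t1 in range(L):
        i = adict[(j1, t1)]
        sub_idx[i] = i            # lvl-1 channels subtract their own mean
        weight[i] = 1.0
        for j2 in range(j1 + 1, J):
            for t2 in range(L):
                i12 = adict[(j1, t1, j2, t2)]
                sub_idx[i12] = i  # lvl-2 channels subtract the lvl-1 mean of (j1, t1)
                weight[i12] = 1.0

# in forward(), with x of shape (N, Cin, K, H, W):
meanCoeffs = torch.mean(x, dim=(3, 4))            # (N, Cin, K)
sub = meanCoeffs[:, :, sub_idx] * weight          # pick the mean to subtract per channel; channel 0 gets weight 0
xnew = x - sub[:, :, :, None, None]               # broadcast over H, W

That would remove the Python loops from forward entirely, but I'm not sure it is the cleanest (or fastest) way to do it on the GPU.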

Thanks