Trying to apply a custom mask-like convolution to features, but running into problems

I am trying to apply a custom mask-like convolution to a feature map:

I have a feature Tensor of size (Batch * 256 * 24 * 24), i.e. 256 channels with a height and width of 24.
I also have a mask Tensor of size (9 * 9), which has been passed through F.softmax so that it is normalized.

My goal is to convolve this mask over the feature in a sliding-window fashion, just like a normal nn.Conv2d. The difference is that this single mask is shared by all 256 channels, instead of there being a separate (256 * 9 * 9) set of kernels, since it is just a “mask”.

I have tried:

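 # feature_pad: feature zero-padded by 4 on each side (assumed, so the output stays h x w)
 # prop_f: zeros with the same shape as feature
 for batch in range(b):
     for hh in range(h):
         for ww in range(w):
             for i in range(9):
                 for j in range(9):
                     # accumulate mask[i][j] times the feature vector at the matching neighbor
                     prop_f[batch, :, hh, ww] += mask[i, j] * feature_pad[batch, :, hh + i, ww + j]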

After writing this code, I realized there are far too many nested for loops, and they would make it run very slowly.
I believe there is a faster way, e.g. using parallel matrix operations, but I am not sure how to do it :tired_face::sob:
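
For reference, here is a rough sketch of the kind of vectorized version I have in mind. I am not sure it is the right or the fastest approach. The idea is that, because the mask is shared by every channel, the channels can be folded into the batch dimension and a single 1-in, 1-out kernel applied with F.conv2d. The function name shared_mask_conv, the zero padding of 4 (so the output stays 24 * 24), and the softmax over the flattened 81 mask entries are my own assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def shared_mask_conv(feature, mask):
    """Apply one (k x k) mask to every channel of a (B, C, H, W) feature."""
    b, c, h, w = feature.shape
    k = mask.shape[-1]
    # Fold channels into the batch dimension: each channel becomes a 1-channel image.
    x = feature.reshape(b * c, 1, h, w)
    # One kernel with 1 input and 1 output channel, shared by all images.
    weight = mask.reshape(1, 1, k, k).to(feature.dtype)
    # Assumption: zero padding of k // 2 so that H and W are unchanged.
    out = F.conv2d(x, weight, padding=k // 2)
    # Unfold the batch dimension back into (B, C).
    return out.reshape(b, c, h, w)


feature = torch.randn(2, 256, 24, 24)
# Assumption: the mask is normalized over all 81 entries.
mask = F.softmax(torch.randn(81), dim=0).reshape(9, 9)
prop_f = shared_mask_conv(feature, mask)
print(prop_f.shape)  # torch.Size([2, 256, 24, 24])
```

Note that F.conv2d computes a cross-correlation (the kernel is not flipped), which is the same thing the nested loops would compute on a zero-padded input.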

Any help with this would be much appreciated. Thanks a lot! :handshake: