Matrix multiplication along a specific dimension

I want to multiply two tensors with these shapes:

torch.Size([10, 16, 240, 320])
torch.Size([10, 32, 240, 320])

I want the output to have shape [10, 16, 32]: for each batch element and each pair of channels (i, j), multiply the two [240, 320] maps element-wise and sum the result.

The code that generates the two tensors:

import torch

b = 10
h1 = 480
w1 = 640

h2 = 240
w2 = 320

m = 16
n = 32
# task 1: interpolate F1 from [h1, w1] to [h2, w2]
# task 2: multiply channel-wise, sum over h and w, and divide by h * w; output -> [b, m, n]
F1 = torch.rand([b, m, h1, w1])
F2 = torch.rand([b, n, h2, w2])

F1 = torch.nn.functional.interpolate(F1, [h2, w2], mode='bicubic', align_corners=True)

So I tried:

out = torch.matmul(F1.view(-1, h2 * w2), F2.view(-1, h2 * w2).t())
print(out.shape)

And the output shape is:

torch.Size([160, 320])

But it should be [10, 16, 32]
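
The [160, 320] shape follows from `view(-1, h2 * w2)`, which folds the batch and channel axes together (10 * 16 = 160 and 10 * 32 = 320), so every channel of every batch element gets multiplied against every other. A minimal sketch of one possible fix, assuming the batch axis is kept and `torch.bmm` is used instead; the random tensors stand in for the already-interpolated F1 and for F2, and `reshape` is used in place of `view` only so the sketch does not depend on memory layout:

```python
import torch

b, m, n, h2, w2 = 10, 16, 32, 240, 320
F1 = torch.rand(b, m, h2, w2)  # stands in for F1 after interpolation
F2 = torch.rand(b, n, h2, w2)

# Flattening with -1 merges batch and channel: [b*m, h*w] @ [h*w, b*n]
flat = torch.matmul(F1.reshape(-1, h2 * w2), F2.reshape(-1, h2 * w2).t())
print(flat.shape)  # torch.Size([160, 320])

# Keeping the batch axis: [b, m, h*w] @ [b, h*w, n] -> [b, m, n]
out = torch.bmm(F1.reshape(b, m, -1), F2.reshape(b, n, -1).transpose(1, 2))
print(out.shape)  # torch.Size([10, 16, 32])
```

Each entry `out[k, i, j]` is then the sum over h and w of `F1[k, i] * F2[k, j]`; dividing by `h2 * w2` would turn the sum into the mean described in task 2.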

def FSP(layer1, layer2):
    # layer1: [b, m, h1, w1], layer2: [b, n, h2, w2]
    b = layer1.shape[0]

    h2 = layer2.shape[2]
    w2 = layer2.shape[3]

    m = layer1.shape[1]
    n = layer2.shape[1]

    # resize layer1 to the spatial size of layer2
    mid = torch.nn.functional.interpolate(layer1, [h2, w2], mode='bicubic', align_corners=True)
    F3 = torch.zeros(b, m, n)
    for batch in range(b):
        for i in range(m):
            for j in range(n):
                # mean over h and w of the element-wise product of two channel maps
                F3[batch, i, j] = torch.mean(torch.mul(mid[batch, i, :, :], layer2[batch, j, :, :]))

    return F3

The function above is what I am actually trying to implement, but without using loops.
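
For reference, a loop-free version could look roughly like the sketch below. It assumes `torch.einsum` is an acceptable replacement for the three nested loops; the names `fsp_vectorized` and `fsp_loops` are made up for this illustration, and the small tensor sizes are chosen only so the comparison against the loop version runs quickly.

```python
import torch
import torch.nn.functional as F

def fsp_vectorized(layer1, layer2):
    # Hypothetical loop-free variant: layer1 is [b, m, h1, w1], layer2 is [b, n, h2, w2]
    h2, w2 = layer2.shape[2], layer2.shape[3]
    mid = F.interpolate(layer1, size=[h2, w2], mode='bicubic', align_corners=True)
    # Sum over h and w for every (batch, i, j), then divide by h*w to get the mean
    return torch.einsum('bihw,bjhw->bij', mid, layer2) / (h2 * w2)

def fsp_loops(layer1, layer2):
    # Loop-based reference, kept only to check the vectorized result
    b, m = layer1.shape[0], layer1.shape[1]
    n, h2, w2 = layer2.shape[1], layer2.shape[2], layer2.shape[3]
    mid = F.interpolate(layer1, size=[h2, w2], mode='bicubic', align_corners=True)
    out = torch.zeros(b, m, n)
    for k in range(b):
        for i in range(m):
            for j in range(n):
                out[k, i, j] = torch.mean(mid[k, i] * layer2[k, j])
    return out

# Small illustrative sizes (not the ones from the question)
x1 = torch.rand(2, 3, 8, 10)
x2 = torch.rand(2, 4, 4, 5)

fast = fsp_vectorized(x1, x2)
slow = fsp_loops(x1, x2)
print(fast.shape)  # torch.Size([2, 3, 4])
print(torch.allclose(fast, slow, atol=1e-5))
```

The einsum string names the axes explicitly: `b` is shared, `i` and `j` are the two channel axes kept in the output, and `h`, `w` are summed away because they do not appear on the right-hand side.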