Efficient vectorized implementation of a loop

I have piece of code which is implemented using loops, which I was not able to implement in a vectorized manner.

for b in range(B):
    for i in range(M - 1):
        for j in range(i + 1, M):
            interaction = (vectors[b, fields[b, i], fields[b, j], :] * vectors[b, fields[b, j], fields[b, i], :]).sum()
            pairwise[b] += interaction

I tried using gather / scatter, but they support indexing only on one dimension. Is there an efficient way to implement it? fields is a BxM array, vectors is a BxMxMxD array. The result, pairwise, should be an array of size B.

Even without full vectorization, only vectorizing along the B dimension would already be great.