I have two tensors with the following dimensions:

```
A: n x i x o
B: n x b x i
```

and I would like to compute the tensor `C` of dimension `n x b x o`. Here, `n` denotes the number of feature maps, `o` is the output dimension, `i` is the input dimension, and `b` is the batch size.

Think of `A`, `B`, and `C` as stacks of matrices. The operation I'm looking for is essentially a map-wise matrix multiply.
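
In loop form (just to pin down the semantics, not as the intended implementation, and with made-up sizes), it would be:

```
import torch

# Hypothetical sizes, just for illustration.
n, b, i, o = 4, 8, 16, 32
A = torch.randn(n, i, o)
B = torch.randn(n, b, i)

# One matrix multiply per feature map: (b x i) @ (i x o) -> (b x o)
C = torch.stack([B[k] @ A[k] for k in range(n)])  # C: n x b x o
```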

What would be the most GPU-efficient way to express this computation?

Would

```
C = torch.einsum('nio,nbi->nbo', [A, B])
```

do the trick? Is it correct, and is it reasonably efficient on the GPU? If not, what's a better alternative?

Note that I can change the orders of the dimensions to make the computation more efficient if necessary.
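
For what it's worth, if I understand broadcasting correctly, `torch.bmm` (or `torch.matmul`) over the leading `n` dimension should compute the same thing, since `B` is `n x b x i` and `A` is `n x i x o`. A sanity check along these lines (again with made-up sizes) seems to confirm it:

```
import torch

n, b, i, o = 4, 8, 16, 32  # made-up sizes
A = torch.randn(n, i, o)
B = torch.randn(n, b, i)

C_einsum = torch.einsum('nio,nbi->nbo', [A, B])
C_bmm = torch.bmm(B, A)  # (n, b, i) @ (n, i, o) -> (n, b, o), batched over n

print(torch.allclose(C_einsum, C_bmm, atol=1e-5))  # should print True
```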