# Batch element-wise dot-product of matrices and vectors

I asked a similar question about NumPy on Stack Overflow, but having since discovered the power of the GPU, I can't go back there.

So I have a 3D tensor representing a list of matrices, e.g.:

```
In : matrices
Out:

(0 ,.,.) =
1  0  0  0  0
0  1  0  0  0
0  0  1  0  0
0  0  0  1  0
0  0  0  0  1

(1 ,.,.) =
5  0  0  0  0
0  5  0  0  0
0  0  5  0  0
0  0  0  5  0
0  0  0  0  5
[torch.cuda.FloatTensor of size 2x5x5 (GPU 0)]
```

and a 2D tensor representing a list of vectors, e.g.:

```
In : vectors
Out:

1  1
1  1
1  1
1  1
1  1
[torch.cuda.FloatTensor of size 5x2 (GPU 0)]
```

… and I need the element-wise, GPU-powered dot product of these two tensors, that is, one matrix-vector product per batch item.
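For reference, here is a minimal sketch of the computation I want, written as an explicit Python loop (I use small CPU tensors for illustration; the same code runs on the GPU after `.cuda()`):

```python
import torch

# Same data as above: a batch of two 5x5 matrices and two column vectors.
matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])  # shape (2, 5, 5)
vectors = torch.ones(5, 2)                                # shape (5, 2)

# Desired result: result[i] = matrices[i] @ vectors[:, i], one mat-vec per item.
result = torch.stack([matrices[i] @ vectors[:, i] for i in range(len(matrices))])
print(result)  # shape (2, 5): row 0 is all ones, row 1 is all fives
```

The loop is just the specification; the question is how to express it as a single batched call.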

I would expect to be able to use `torch.bmm` here, but I cannot figure out how; in particular, I don't understand why this happens:

```
In : torch.bmm(matrices, vectors.permute(1,0))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-114-e348783370f7> in <module>()
----> 1 torch.bmm(matrices, vectors.permute(1,0))

RuntimeError: out of range at /py/conda-bld/pytorch_1490979338030/work/torch/lib/THC/generic/THCTensor.c:23
```

… when `matrices[i] @ vectors.permute(1,0)[i]` works for any `i < len(matrices)`.
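If I understand the error correctly, `torch.bmm` requires *both* arguments to be 3D, shaped `(batch, n, m)`, while `vectors.permute(1, 0)` is only 2D, so `bmm` fails when it looks for the missing third dimension. A small sketch of the shapes involved:

```python
import torch

matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])  # (2, 5, 5)
vectors = torch.ones(5, 2)                                # (5, 2)

# Only 2D after the permute, which is why bmm rejects it:
print(vectors.permute(1, 0).shape)               # (2, 5)

# Adding a trailing dimension turns it into a batch of 5x1 column matrices:
cols = vectors.permute(1, 0).unsqueeze(2)
print(cols.shape)                                # (2, 5, 1)

# Now bmm performs a batched matrix-vector product:
out = torch.bmm(matrices, cols)                  # (2, 5, 1)
print(out.squeeze(2))                            # (2, 5)
```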

Oh, I've just found something that works: `torch.bmm(matrices, vectors.permute(1,0).unsqueeze(2)).squeeze().permute(1,0)`.
So I have another question: is there any way to avoid these `permute`s and `[un]squeeze`s? Should I organize my tensors differently?
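One option I am considering, assuming a reasonably recent PyTorch: store the vectors batch-first, shape `(2, 5)` instead of `(5, 2)`, which removes the `permute`s entirely. Then either `torch.matmul` (one `unsqueeze`/`squeeze` pair) or `torch.einsum` (no reshaping at all) expresses the batched mat-vec:

```python
import torch

matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])  # (2, 5, 5)

# Vectors stored batch-first: one row per batch item, no permute needed.
vectors_bf = torch.ones(2, 5)

# Option 1: matmul on a batch of column vectors, one unsqueeze/squeeze pair.
out1 = torch.matmul(matrices, vectors_bf.unsqueeze(2)).squeeze(2)  # (2, 5)

# Option 2: einsum states the contraction directly, no reshaping at all.
out2 = torch.einsum('bij,bj->bi', matrices, vectors_bf)            # (2, 5)

print(torch.equal(out1, out2))  # True
```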