# Batch element-wise dot-product of matrices and vectors

I asked a similar question about NumPy on Stack Overflow, but since then I’ve discovered the power of the GPU, and I can’t go back.

So I have a 3D tensor representing a list of matrices, e.g.:

``````In [112]: matrices
Out[112]:

(0 ,.,.) =
1  0  0  0  0
0  1  0  0  0
0  0  1  0  0
0  0  0  1  0
0  0  0  0  1

(1 ,.,.) =
5  0  0  0  0
0  5  0  0  0
0  0  5  0  0
0  0  0  5  0
0  0  0  0  5
[torch.cuda.FloatTensor of size 2x5x5 (GPU 0)]
``````

and a 2D tensor representing a list of vectors, e.g.:

``````In [113]: vectors
Out[113]:

1  1
1  1
1  1
1  1
1  1
[torch.cuda.FloatTensor of size 5x2 (GPU 0)]
``````

… and I need an element-wise, GPU-powered dot product of these two tensors: one matrix–vector product per batch index.
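To make the desired result concrete, here is a minimal loop-based sketch of the operation (on small CPU tensors mirroring the example data; the identity-matrix data is an assumption for illustration):

```python
import torch

# Toy data matching the example: a batch of 2 matrices (2x5x5)
# and one 5-dim vector per batch element, stored column-wise (5x2).
matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])
vectors = torch.ones(5, 2)

# Desired result: result[i] = matrices[i] @ vectors[:, i], one row per batch.
result = torch.stack([torch.mv(matrices[i], vectors[:, i])
                      for i in range(matrices.size(0))])
# result has shape (2, 5): row 0 is all ones, row 1 is all fives.
```

The question is how to get this same result in a single batched call instead of a Python loop.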

I would expect `torch.bmm` to work here, but I cannot figure out how; in particular, I don’t understand why this happens:

``````In [114]: torch.bmm(matrices, vectors.permute(1,0))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-114-e348783370f7> in <module>()
----> 1 torch.bmm(matrices, vectors.permute(1,0))

RuntimeError: out of range at /py/conda-bld/pytorch_1490979338030/work/torch/lib/THC/generic/THCTensor.c:23
``````

… when `matrices[i] @ vectors.permute(1,0)[i]` works for any `i < len(matrices)`.

Thanks for your help…

Oh, I’ve just found something that works: `torch.bmm(matrices, vectors.permute(1,0).unsqueeze(2)).squeeze().permute(1,0)`.

So I have another question: is there any way to avoid these `permute` and `[un]squeeze` calls? Should I organize my tensors differently?

There’s no way to avoid the `permute` calls with that layout, although you can use `.t()` (transpose) instead of `permute(1, 0)`. Typically we keep the left-most dimension as the batch dimension.
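Concretely, if the vectors are stored batch-first as a `(2, 5)` tensor, the call reduces to a single `unsqueeze`/`squeeze` pair. A sketch with CPU-sized stand-ins for the example data:

```python
import torch

matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])  # (batch, 5, 5)
vectors_bf = torch.ones(2, 5)                             # batch-first: (batch, 5)

# bmm needs 3D operands: lift each vector to a (5, 1) column,
# batch-multiply, then drop the trailing singleton dimension.
result = torch.bmm(matrices, vectors_bf.unsqueeze(2)).squeeze(2)  # (batch, 5)

# With the original (5, 2) layout you would write vectors.t() first.
```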

Greg is working on NumPy-style broadcasting, which will make the `unsqueeze` calls unnecessary.
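For reference, in later PyTorch releases `torch.einsum` can express the whole batched product in one call, absorbing the layout difference into the subscripts (a sketch, assuming a version that ships `einsum`):

```python
import torch

matrices = torch.stack([torch.eye(5), 5 * torch.eye(5)])  # (2, 5, 5)
vectors = torch.ones(5, 2)                                # (5, 2), as in the question

# result[b, i] = sum_j matrices[b, i, j] * vectors[j, b];
# the 'jb' subscript handles the column-per-batch vector layout directly.
result = torch.einsum('bij,jb->bi', matrices, vectors)    # (2, 5)
```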