Trying to matmul a (1,4,2) and a (2,4,2) tensor: why is the second expected to have dimensions (2,2,2)?

Hello,

I want to run this to understand how matmul works; I know it should throw an error.

import torch
data = [[[1, 2],[3, 4],[5,6],[7,8]]]
data1 = [[[1, 2],[3, 4],[5,6],[7,8]],[[1, 2],[3, 4],[5,6],[7,8]]]
t0 = torch.tensor(data)
t1 = torch.tensor(data1)
print(f"{t0.shape}  {t1.shape}")
print(f"{t0 @ t1}")  # raises RuntimeError

What I do not understand is why the error message says:

RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [2, 2] but got: [2, 4].

Specifically, why are the first two dimensions expected to be [2, 2]?

Hi Make!

A quick overview of matmul():

When the tensors being multiplied have more than two dimensions, the leading
dimensions (that is, all but the last two) are treated as batch dimensions and need
to be broadcastable with one another. In your case the leading dimensions are
[1] and [2], which are broadcastable.
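
A minimal sketch of the batch-dimension rule, assuming PyTorch 1.8 or later (for torch.broadcast_shapes): only the leading dimensions take part in broadcasting, and a size-1 dimension is stretched to match the other one.

```python
import torch

# Same tensors as in the question: shapes [1, 4, 2] and [2, 4, 2].
t0 = torch.tensor([[[1, 2], [3, 4], [5, 6], [7, 8]]])
t1 = torch.tensor([[[1, 2], [3, 4], [5, 6], [7, 8]]] * 2)

# Everything except the last two dimensions is a batch dimension.
batch0 = t0.shape[:-2]  # torch.Size([1])
batch1 = t1.shape[:-2]  # torch.Size([2])

# broadcast_shapes applies the usual broadcasting rules.
batch = torch.broadcast_shapes(batch0, batch1)
print(batch)  # torch.Size([2])

# Two different sizes that are both != 1 cannot be broadcast together.
try:
    torch.broadcast_shapes((3,), (2,))
except RuntimeError as err:
    print("not broadcastable:", err)
```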

The final two dimensions are the matrix dimensions. In your case, these are [4, 2]
and [4, 2]. However, for matrix multiplication, the second dimension of the first
matrix must match the first dimension of the second matrix. That is, a matrix A of
shape [m, o] and a matrix B of shape [o, n] can be multiplied together to produce
the matrix A @ B of shape [m, n]. This is because A and B share the dimension
o in the appropriate locations.
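
A small sketch of the [m, o] @ [o, n] rule with plain 2-d tensors; the sizes 3, 5 and 4 are arbitrary choices for illustration. The einsum line spells out the sum over the shared index o, which is why the two inner sizes have to agree.

```python
import torch

m, o, n = 3, 5, 4  # arbitrary example sizes

A = torch.randn(m, o)
B = torch.randn(o, n)

# Inner dimensions agree (both o), so the product has shape [m, n].
C = A @ B
print(C.shape)  # torch.Size([3, 4])

# C[i, j] = sum over k of A[i, k] * B[k, j]; k runs over the shared size o.
print(torch.allclose(C, torch.einsum("ik,kj->ij", A, B)))

# The matrix parts of t0 and t1 in the question are both [4, 2]:
# the inner sizes are 2 and 4, which do not agree, so the product fails.
failed = False
try:
    torch.randn(4, 2) @ torch.randn(4, 2)
except RuntimeError as err:
    failed = True
    print("shape mismatch:", err)
```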

The matrix dimensions (the last two dimensions) of your t0 and t1 don’t match up
correctly for matrix multiplication. But if we swap the last two dimensions of t1 they
will.

Consider:

>>> import torch
>>> print (torch.__version__)
2.6.0+cu126
>>> 
>>> data = [[[1, 2],[3, 4],[5,6],[7,8]]]
>>> data1 = [[[1, 2],[3, 4],[5,6],[7,8]],[[1, 2],[3, 4],[5,6],[7,8]]]
>>> t0 = torch.tensor(data)
>>> t1 = torch.tensor(data1)
>>> t1p = t1.permute (0, 2, 1)
>>> 
>>> print(f"{t0.shape}  {t1.shape}  {t1p.shape}")
torch.Size([1, 4, 2])  torch.Size([2, 4, 2])  torch.Size([2, 2, 4])
>>> 
>>> print(f"{(t0 @ t1p).shape}")
torch.Size([2, 4, 4])
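
To see what the broadcasting does here, the batched product can be reproduced with an explicit loop: the single matrix in t0 is reused against each of the two matrices in t1p. This is a sketch; transpose(-2, -1) is used as an alternative to permute(0, 2, 1) that swaps only the last two dimensions, however many batch dimensions there are.

```python
import torch

t0 = torch.tensor([[[1, 2], [3, 4], [5, 6], [7, 8]]])      # [1, 4, 2]
t1 = torch.tensor([[[1, 2], [3, 4], [5, 6], [7, 8]]] * 2)  # [2, 4, 2]

# Swap only the last two dimensions; for a 3-d tensor this equals permute(0, 2, 1).
t1p = t1.transpose(-2, -1)  # [2, 2, 4]
assert torch.equal(t1p, t1.permute(0, 2, 1))

batched = t0 @ t1p  # [2, 4, 4]

# t0 has batch size 1, so its one [4, 2] matrix is broadcast
# against every [2, 4] matrix of t1p.
looped = torch.stack([t0[0] @ t1p[b] for b in range(t1p.shape[0])])

print(batched.shape)                 # torch.Size([2, 4, 4])
print(torch.equal(batched, looped))  # True
```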

Assuming that t0 is as given and that the first dimension of t1 happens to be 2,
then for matrix multiplication to work out, the second dimension of t1 (its first matrix
dimension) must be 2 to match the last dimension of t0, so indeed the first two
dimensions of t1 would be expected to be [2, 2].

I will say that the error message is not as clear as it could be. For example, t1 could
have shape [7, 2, 13], and the broadcasted, batched matrix multiplication would work
fine. So the required condition is that the last dimension of t0 equals the next-to-last
dimension of t1 (and that the batch dimensions be broadcastable).
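
The condition above can be written as a small shape check. expected_matmul_shape is a hypothetical helper (not part of PyTorch), a sketch that only handles tensors with at least two dimensions and raises its own ValueError when the matrix dimensions don't fit; it is compared against what torch actually returns for the [7, 2, 13] example.

```python
import torch

def expected_matmul_shape(shape_a, shape_b):
    """Hypothetical helper: predicted shape of a @ b for tensors with >= 2 dims."""
    if len(shape_a) < 2 or len(shape_b) < 2:
        raise ValueError("this sketch only handles tensors with at least 2 dimensions")
    *batch_a, m, k_a = shape_a
    *batch_b, k_b, n = shape_b
    # Matrix condition: last dim of a must equal next-to-last dim of b.
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    # Batch condition: leading dims must broadcast (torch raises RuntimeError if not).
    batch = torch.broadcast_shapes(tuple(batch_a), tuple(batch_b))
    return (*batch, m, n)

# Shapes from the question: 2 (last dim of t0) != 4 (next-to-last dim of t1).
try:
    expected_matmul_shape((1, 4, 2), (2, 4, 2))
except ValueError as err:
    print(err)  # inner dimensions differ: 2 vs 4

# The [7, 2, 13] example: batch dims [1] and [7] broadcast to [7].
print(expected_matmul_shape((1, 4, 2), (7, 2, 13)))          # (7, 4, 13)
print((torch.zeros(1, 4, 2) @ torch.zeros(7, 2, 13)).shape)  # torch.Size([7, 4, 13])
```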

Best.

K. Frank

Thanks a lot for your efforts and thorough reply!