PyTorch allows incorrect matrix multiplication when using CUDA

matthewleigh · August 30, 2021, 4:15pm

Hey everyone,

I have a minor issue, and I am not sure whether it is a bug or whether I am simply misunderstanding something. In my project I was getting strange results for certain configurations, and error messages were not appearing when I expected them.

To boil it down, PyTorch lets me multiply matrices of incompatible sizes as long as the multiplication takes place on the GPU.

When running the following:

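import torch
import torch.nn as nn

input = torch.randn(5, 10)
network = nn.Linear(3, 3)  # expects inputs whose last dimension is 3

output = network(input)  # This should fail: the input has 10 features, not 3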

I get the expected error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x10 and 3x3)
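For comparison, a layer whose `in_features` matches the input's last dimension works as expected. `nn.Linear(in_features, out_features)` stores its weight with shape (out_features, in_features) and computes input @ weight.T + bias, so a (5, 10) input needs `nn.Linear(10, ...)`. The minimal sketch below (CPU only) makes those shapes explicit; the manual product is only there to show the matrix dimensions.

```python
import torch
import torch.nn as nn

# nn.Linear(in_features, out_features) keeps its weight as (out_features, in_features)
layer = nn.Linear(10, 3)
print(layer.weight.shape)  # torch.Size([3, 10])

x = torch.randn(5, 10)  # 5 samples with 10 features each
out = layer(x)
print(out.shape)  # torch.Size([5, 3])

# The same computation written out explicitly: (5, 10) @ (10, 3) + (3,) -> (5, 3)
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual, atol=1e-6))
```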

However, the following runs without errors:

import torch
import torch.nn as nn

input = torch.randn(5, 10).cuda()
network = nn.Linear(3, 3)
network.cuda()

output = network(input)  # This should also fail, but does not

It gives me an output with shape (5, 3).
In fact, I can use an input of shape (5, 8793289) and it still runs, again producing an output of shape (5, 3).

So what is happening here?
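Part of why the CUDA result looks plausible: the output shape of a linear layer depends only on the input's leading dimensions and on `out_features`, while `in_features` enters only through the compatibility check. Whatever the CUDA path actually computed, the (5, 3) shape alone cannot reveal the problem. Below is a minimal pure-Python sketch of that shape rule; `linear_output_shape` is a hypothetical helper for illustration, not part of PyTorch.

```python
from typing import Sequence, Tuple


def linear_output_shape(
    input_shape: Sequence[int], in_features: int, out_features: int
) -> Tuple[int, ...]:
    """Shape rule for y = x @ W.T + b, where W has shape (out_features, in_features).

    Hypothetical helper: it mirrors the shape check nn.Linear is expected to
    perform, but it is not a PyTorch API.
    """
    if len(input_shape) == 0:
        raise ValueError("input must have at least one dimension")
    # The only constraint: the input's last dimension must equal in_features.
    if input_shape[-1] != in_features:
        raise ValueError(
            f"last input dimension {input_shape[-1]} != in_features {in_features}"
        )
    # Leading (batch) dimensions pass through; the last one becomes out_features.
    return tuple(input_shape[:-1]) + (out_features,)


print(linear_output_shape((5, 10), 10, 3))  # (5, 3)
```

With `in_features=3`, both `(5, 10)` and `(5, 8793289)` are rejected by this rule, which is the behaviour the CPU path shows.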

KFrank · August 30, 2021, 6:42pm

Hi Matthew!

I would say this is a bug (@ptrblck) – in any event it violates the
principle of least surprise.

I can reproduce this on version 1.9.0:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> torch.nn.Linear (3, 3) (torch.randn (5, 10))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch>/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "<path_to_pytorch>/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "<path_to_pytorch>/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x10 and 3x3)
>>> torch.nn.Linear (3, 3).cuda() (torch.randn (5, 10).cuda())
tensor([[-0.2980, -0.2856,  0.2935],
        [-0.3196, -0.7825, -0.0424],
        [-0.0242, -0.7375,  0.1639],
        [-0.9905, -0.4888,  0.2138],
        [-0.8091, -0.5870,  0.1135]], device='cuda:0', grad_fn=<AddmmBackward>)
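For anyone stuck on an affected build (1.9.0 with CUDA, per the reproduction above), one possible workaround is to re-impose the check yourself. The sketch below uses the standard `register_forward_pre_hook` method to validate the input of every `nn.Linear` in a model before its forward pass runs. The helper names `_check_linear_input` and `add_linear_shape_checks` are hypothetical, not part of PyTorch.

```python
import torch
import torch.nn as nn


def _check_linear_input(module: nn.Linear, inputs: tuple) -> None:
    """Forward pre-hook (hypothetical helper): raise if the last dim != in_features."""
    x = inputs[0]
    if x.shape[-1] != module.in_features:
        raise RuntimeError(
            f"{module}: input last dimension {x.shape[-1]} "
            f"does not match in_features={module.in_features}"
        )
    # Returning None leaves the input unchanged.


def add_linear_shape_checks(model: nn.Module) -> list:
    """Register the check on every nn.Linear in `model`, including `model` itself."""
    handles = []
    for m in model.modules():
        if isinstance(m, nn.Linear):
            handles.append(m.register_forward_pre_hook(_check_linear_input))
    return handles  # call h.remove() on each handle to undo


device = "cuda" if torch.cuda.is_available() else "cpu"
network = nn.Linear(3, 3).to(device)
add_linear_shape_checks(network)
try:
    network(torch.randn(5, 10, device=device))
except RuntimeError as err:
    print("caught:", err)
```

Hooks avoid touching every call site, at the cost of a little Python overhead per forward call; the returned handles make it easy to remove them once you are on a release with the fix.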

Best.

K. Frank

ptrblck · August 30, 2021, 6:57pm

Thanks for pinging! Yes, this was a known issue, and it should already be fixed in the nightly builds. Could you install the current nightly binary and rerun your code, please?
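To check whether a given install is affected, a small script along these lines prints the version and tries the mismatched call on the CPU and, when a GPU is available, on CUDA. It is a sketch; `mismatch_raises` is a hypothetical helper name. On a build with the fix, both checks should report `True`.

```python
import torch
import torch.nn as nn


def mismatch_raises(device: str) -> bool:
    """Return True if feeding a (5, 10) input to nn.Linear(3, 3) raises on `device`."""
    layer = nn.Linear(3, 3).to(device)
    x = torch.randn(5, 10, device=device)
    try:
        layer(x)
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print("torch", torch.__version__)
    print("cpu raises:", mismatch_raises("cpu"))
    if torch.cuda.is_available():
        print("cuda raises:", mismatch_raises("cuda"))
```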

KFrank · August 31, 2021, 5:04am

Hi @ptrblck!

I can confirm that the issue has been fixed in the current nightly (that
is, the CUDA version also throws the shape-mismatch error):

>>> torch.__version__
'1.10.0.dev20210830'
>>> torch.nn.Linear (3, 3).cuda() (torch.randn (5, 10).cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch>/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "<path_to_pytorch>/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "<path_to_pytorch>/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x10 and 3x3)

As an aside, this issue had not yet been fixed in the older nightly,
version 1.10.0.dev20210624.

Best.

K. Frank

matthewleigh · August 31, 2021, 1:22pm

I can confirm that the correct error is raised with the new nightly release. Thanks!