Inconsistent results when forwarding a CPU/GPU tensor through a CPU model

Hi, I found that forwarding the same tensor on CPU and on GPU through a model on CPU yields different results.

import torch
fc = torch.nn.Linear(2, 2)
t = torch.tensor([[0.5, 0.3]])
print(fc(t))
print(fc(t.cuda()))

The result of fc(t) is correct, while the result of fc(t.cuda()) is wrong. After a little digging, I found it is torch.addmm() that causes this issue. Though forwarding a GPU tensor through a CPU model generally makes little sense, my expectation is that PyTorch would either throw an error (saying the input device mismatches) or give the correct result.
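In the meantime, an explicit device check before the forward pass avoids the silent mismatch. This is just a sketch; safe_forward is a hypothetical helper name, not a PyTorch API:

```python
import torch

def safe_forward(model, x):
    # Hypothetical guard: verify the input tensor lives on the same device
    # as the model's parameters before calling forward.
    param_device = next(model.parameters()).device
    if x.device != param_device:
        raise RuntimeError(
            f"input is on {x.device} but model parameters are on {param_device}"
        )
    return model(x)

fc = torch.nn.Linear(2, 2)
t = torch.tensor([[0.5, 0.3]])
out = safe_forward(fc, t)  # devices match, so this runs normally
```

With a CUDA input and CPU weights, the helper raises instead of silently computing garbage.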

This is interesting. I would expect it to throw an error as well (and it did when I tried). Can you mention the PyTorch version and the machine info?

@user_123454321 my PyTorch version is 1.6.0 and the OS is Ubuntu 16.04. I tried different GPU models but the issue persists. By the way, may I know your PyTorch version?

Some updates: I found that the output of fc(t.cuda()) equals the bias of the Linear layer. When I set bias=False, calling fc(t.cuda()) raises RuntimeError: copy_if failed to synchronize: an illegal memory access was encountered. I guess it is related to the implementation of torch.addmm(), but I could not figure out why.
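For intuition: Linear computes input @ weight.T + bias, so if the matmul term silently contributes nothing, the result collapses to the bias alone. A CPU-only sketch of that arithmetic (it mimics the symptom, not the buggy CUDA code path itself):

```python
import torch

fc = torch.nn.Linear(2, 2)

# An all-zero input makes the matmul term vanish, so the output is
# exactly the bias -- the same value fc(t.cuda()) returned in the bug.
zero_in = torch.zeros(1, 2)
out = fc(zero_in)  # zero_in @ fc.weight.T + fc.bias == fc.bias
assert torch.allclose(out, fc.bias)
```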

Hmmm… my PyTorch version is 1.7.1, the OS is Ubuntu 18.04, and the GPU is a P100. I still don't get why data on the GPU can be run through a model on the CPU at all.

I managed to try both 1.7.0 and 1.6.0 on Colab, and it seems the issue is with 1.6.0. PyTorch 1.7.0 raises the error as expected.

Yes, it seems the bug is in 1.6.0 (tried on the server). I also tried 1.5.0, which gives the expected results too.

1.6.0

>>> import torch
>>> torch.__version__
'1.6.0'
>>> fc = torch.nn.Linear(2, 2)
>>> t = torch.tensor([[0.5, 0.3]])
>>> print(fc(t))
tensor([[-0.1817,  0.4468]], grad_fn=<AddmmBackward>)
>>> print(fc(t.cuda()))
tensor([[0.1775, 0.5823]], grad_fn=<AddmmBackward>)
>>>

1.5.0

>>> import torch
>>> torch.__version__
'1.5.0'
>>> fc = torch.nn.Linear(2, 2)
>>> t = torch.tensor([[0.5, 0.3]])
>>> print(fc(t))
tensor([[0.0865, 0.0900]], grad_fn=<AddmmBackward>)
>>> print(fc(t.cuda()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/pytorch1.5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch1.5/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda3/envs/pytorch1.5/lib/python3.8/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
>>>