I noticed incorrect behavior in our project, and the following experiment shows CUDA producing surprisingly inconsistent results for an identical function and inputs.
I would appreciate it if anyone knows a workaround.
My system:
PyTorch 0.4.0
CUDA 10.1
Python 3.5.2
I know this is not a recommended combination, but our project is not compatible with newer versions of PyTorch, so we have to stick to an earlier release.
In [1]: import torch
In [2]: from torch import nn
In [3]: net = nn.Linear(2, 1).cuda()
In [4]: data = torch.randn(16, 2).cuda()
In [5]: ou1 = net(data)
In [6]: ou2 = net(data)
In [7]: ou1 - ou2
Out[7]:
tensor([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]], device='cuda:0', grad_fn=<SubBackward0>)
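A quick way to quantify whether repeated forward passes really differ is to compare the maximum absolute difference between two runs. Here is a minimal sketch of that check; it falls back to CPU when CUDA is unavailable, and the seed value is an arbitrary choice for reproducibility:

```python
import torch
from torch import nn

# Use CUDA when available; fall back to CPU so the check runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(0)
net = nn.Linear(2, 1).to(device)
data = torch.randn(16, 2, device=device)

# Two forward passes with identical weights and inputs.
with torch.no_grad():
    out1 = net(data)
    out2 = net(data)

# Bitwise-identical runs give exactly 0.0 here; any nondeterminism
# shows up as a small nonzero value.
max_diff = (out1 - out2).abs().max().item()
print(max_diff)
```

If `max_diff` is nonzero on your machine but zero elsewhere, that points toward a library or hardware issue rather than expected floating-point behavior.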
This could either be a bug that has since been fixed, or an install/hardware issue.
Do you see the same thing when running on a different machine?
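If the run-to-run differences come from nondeterministic kernel selection, it may also be worth pinning cuDNN to deterministic algorithms and seeding everything; these flags already exist in PyTorch 0.4. A sketch, with the caveat that a plain `nn.Linear` goes through cuBLAS rather than cuDNN, so these flags mainly matter if your real model uses cuDNN-backed ops such as convolutions:

```python
import torch

# benchmark=True lets cuDNN autotune and possibly pick a different kernel
# per run, which can change results at the floating-point rounding level.
# Disable it and request deterministic algorithms instead.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Seed everything so weight init and torch.randn are reproducible.
torch.manual_seed(0)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(0)
```

Note that these settings trade some speed for reproducibility, so they are best enabled only while debugging.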