Weird precision problem with linear layer

I’m using a linear layer to reduce the dimension of input tensors.
I have two input tensors converted from numpy arrays. About these two arrays:

  1. both are generated by the same function
  2. ‘a1.npy’ is generated on its own
  3. ‘b1.npy’ is generated together with other data
  4. a1 equals b1 (I tried a lot of methods to compare them)

Then, I convert these two arrays to torch tensors and copy them to cuda.
Finally, I pass them to a linear layer, and the results are different!
The difference does not look like machine-precision error, as it is bigger.

When I try to reproduce this with Google Colab GPU T4, the two results become the same again… But this phenomenon actually happened on my local machine.

Can anyone explain this?

My environment:

  • torch 1.10.2+cu113
  • ubuntu 20.0
  • GPU: rtx3090, rtx4090

To reproduce:
download data

import numpy as np
import torch

a1 = np.load('a1.npy')
b1 = np.load('b1.npy')
t1 = torch.tensor(a1).cuda()
t2 = torch.tensor(b1).cuda()
print('raw data diff: ', (t1 != t2).sum())

mlp = torch.load('linear.ckpt')

r1 = mlp(t1)
r2 = mlp(t2)

print('result diff:', (r1 != r2).sum())

# output
# >>> raw data diff: tensor(0, device='cuda:0')
# >>> result diff: tensor(127583, device='cuda:0')

Hi XFeiF!

Perhaps because a1 and b1 are created somewhat differently, their elements
are stored differently internally even though they are equal.

Consider (in pytorch, not numpy):

>>> import torch
>>> torch.__version__
>>> t1 = torch.tensor ([[1, 2, 3], [4, 5, 6]])
>>> t2 = torch.tensor ([[1, 4], [2, 5], [3, 6]]).T
>>> torch.equal (t1, t2)
True
>>> t1.stride()
(3, 1)
>>> t2.stride()
(1, 2)

The stride() is telling you that t1 is stored in row-major order, while t2 is
in column-major order.

For efficiency, when converting to Tensor (by using from_numpy()), pytorch
wraps the original numpy data in a Tensor, so the underlying (difference in)
storage order will be preserved.

I assume, but haven’t checked, that the .stride() of a cpu tensor is preserved
when you move it to the gpu.

If so, your two cuda tensors, even though equal, will have differing storage
order. This could well cause the gpu to perform operations in the Linear’s
matrix multiplications in a different order, leading to round-off error.
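
As a small illustration of why the order of operations matters (plain Python floats here, not the actual gpu kernel): floating-point addition is not associative, so adding the same numbers in a different order can give a slightly different result.

```python
# Floating-point addition is not associative: the same three numbers,
# grouped differently, round differently.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)      # False
print(abs(left_to_right - right_to_left))  # about 1.1e-16
```

A matrix multiplication adds up many such terms, so small discrepancies of this kind can accumulate.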

Check the .stride() of your gpu tensors. (Also check the .storage_offset().)
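
A minimal sketch of such a check, run on the cpu with small made-up arrays standing in for a1 and b1 (the names a and b and their values are assumptions for illustration). The tensor method that reports the offset into the underlying storage is storage_offset().

```python
import numpy as np
import torch

# Hypothetical stand-ins for a1 and b1: same values, different memory layout.
a = np.arange(6, dtype=np.float32).reshape(2, 3)   # row-major (C order)
b = np.asfortranarray(a)                           # column-major (Fortran order)

t1 = torch.from_numpy(a)   # wraps the numpy memory, so the layout is kept
t2 = torch.from_numpy(b)

print(torch.equal(t1, t2))                         # True
print(t1.stride(), t2.stride())                    # (3, 1) (1, 2)
print(t1.is_contiguous(), t2.is_contiguous())      # True False
print(t1.storage_offset(), t2.storage_offset())    # 0 0
```

On a machine with a gpu, the same calls can be made on t1.cuda() and t2.cuda() to see whether the layout survives the copy.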

Are you sure the difference is too big to be round-off error? How have you
verified that it is not round-off error (which could be rather larger than
machine precision if it accumulated in, say, a big matrix multiplication)?

In what you’ve posted, you’ve only shown that r1 and r2 have lots of
elements that aren’t exactly equal, perhaps differing by only some small
round-off error.

Try looking at torch.allclose (r1, r2, atol = 1.e-3) (or maybe even
atol = 1.e-2) and (r1 - r2).abs().max().
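
A rough sketch of that comparison, run on the cpu. The layer sizes and the random data are made up, and mlp, t1 and t2 here are hypothetical stand-ins for the objects in the original script. Whether any elements actually differ depends on the hardware and backend, so no particular count is claimed.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for mlp, t1 and t2 (sizes are made up).
mlp = torch.nn.Linear(256, 64)
t1 = torch.randn(1000, 256)       # row-major
t2 = t1.T.contiguous().T          # same values, stored column-major

with torch.no_grad():
    r1 = mlp(t1)
    r2 = mlp(t2)

# Exact comparison: counts every element that differs at all.
num_not_equal = (r1 != r2).sum().item()
# Tolerant comparison: is every difference within a small absolute tolerance?
close = torch.allclose(r1, r2, atol=1.e-3)
# Size of the largest single discrepancy.
max_abs_diff = (r1 - r2).abs().max().item()

print('elements not exactly equal:', num_not_equal)
print('allclose:', close)
print('max abs diff:', max_abs_diff)
```

If allclose is True and the maximum difference is tiny compared with the size of the outputs, the mismatch count from != is most likely just round-off.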

If this doesn’t sort things out, could you post a complete, fully-self-contained,
runnable script (presumably using both numpy and pytorch) that reproduces
your issue? Please use randomly-generated or hard-coded data produced
inside the script itself. Please also post the results you get from running the
script.

K. Frank

Thank you so much for your prompt and helpful response.
Your solution to my question worked perfectly and resolved the issue I was facing.

After I checked the stride of t1 (a1) and t2 (b1), I found that they are indeed stored differently internally! One is stored in row-major order and the other is stored in column-major order.

I used the NumPy functions ascontiguousarray and asfortranarray to make the memory layout of the original NumPy arrays match.
Now the outputs of the mlp layer are the same!
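
A minimal sketch of that kind of fix, using small made-up arrays in place of the real a1 and b1 (the values and the name b1_fixed are assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-ins: equal values, different memory layout.
a1 = np.arange(6, dtype=np.float32).reshape(2, 3)  # row-major
b1 = np.asfortranarray(a1)                         # column-major, same values

# Copy b1 into row-major order so both arrays share one layout.
b1_fixed = np.ascontiguousarray(b1)

print(np.array_equal(a1, b1_fixed))                # True
print(a1.strides, b1.strides, b1_fixed.strides)    # (12, 4) (4, 8) (12, 4)
print(b1_fixed.flags['C_CONTIGUOUS'])              # True
```

On the pytorch side, calling .contiguous() on a tensor should have a similar effect, since it returns a row-major copy when the tensor is not already contiguous.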