# Weird precision problem with linear layer

I’m using a linear layer to reduce the dimension of input tensors.
I have two input tensors converted from numpy arrays. These two arrays:

1. are generated by the same function
2. `a1.npy` is generated on its own
3. `b1.npy` is generated together with other data
4. a1 equals b1 (I tried several ways to compare them; see the sketch below)
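
To give an idea, the comparison checks were along these lines (a sketch, not my exact code):

```
import numpy as np

a1 = np.load('a1.npy')
b1 = np.load('b1.npy')

print(np.array_equal(a1, b1))   # element-wise exact equality
print((a1 != b1).sum())         # number of differing elements
print(np.abs(a1 - b1).max())    # largest absolute difference
```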

Then I convert the two arrays to torch tensors and copy them to CUDA.
Finally, I pass them through a linear layer, and the results are different!
The difference does not look like machine-precision error; it is bigger than that.

When I try to reproduce this on a Google Colab T4 GPU, the two results become the same again… but the phenomenon really does happen on my local machine.

Can anyone explain this?

My environment:

• torch 1.10.2+cu113
• Ubuntu 20.04
• GPU: RTX 3090, RTX 4090

To reproduce:

```
import numpy as np
import torch

# a1 and b1 are the two saved arrays described above.
a1 = np.load('a1.npy')
b1 = np.load('b1.npy')

t1 = torch.tensor(a1).cuda()
t2 = torch.tensor(b1).cuda()
print('raw data diff: ', (t1 != t2).sum())

# mlp is the linear layer under test (defined elsewhere in the model).
mlp.eval()

r1 = mlp(t1)
r2 = mlp(t2)

print('result diff:', (r1 != r2).sum())

# output
# >>> raw data diff: tensor(0, device='cuda:0')
# >>> result diff: tensor(127583, device='cuda:0')
```

Hi XFeiF!

Perhaps because `a1` and `b1` are created somewhat differently, their elements
are stored differently internally even though they are equal.

Consider (in pytorch, not numpy):

```
>>> import torch
>>> torch.__version__
'2.1.0'
>>> t1 = torch.tensor ([[1, 2, 3], [4, 5, 6]])
>>> t2 = torch.tensor ([[1, 4], [2, 5], [3, 6]]).T
>>> torch.equal (t1, t2)
True
>>> t1.stride()
(3, 1)
>>> t2.stride()
(1, 2)
```

The `stride()` is telling you that `t1` is stored in row-major order, while `t2` is
in column-major order.

For efficiency, when converting to a `Tensor` with `from_numpy()`, pytorch
wraps the original numpy data in a `Tensor` without copying it, so any
difference in storage order will be preserved.
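
For example (a minimal sketch): a column-major numpy array keeps its layout
through `from_numpy()`:

```
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)  # row-major (C order)
f = np.asfortranarray(a)                          # same values, column-major

ta = torch.from_numpy(a)
tf = torch.from_numpy(f)

print(torch.equal(ta, tf))  # True - identical values
print(ta.stride())          # (3, 1) - row-major
print(tf.stride())          # (1, 2) - column-major, preserved from numpy
print(tf.is_contiguous())   # False
```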

I assume, but haven’t checked, that the `.stride()` of a cpu tensor is preserved
when you move it to the gpu.
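
A quick way to check (assuming a cuda device is available):

```
import numpy as np
import torch

f = np.asfortranarray(np.random.rand(4, 5).astype(np.float32))
t_cpu = torch.from_numpy(f)
t_gpu = t_cpu.cuda()  # defaults to memory_format = torch.preserve_format

print(t_cpu.stride())  # (1, 4)
print(t_gpu.stride())  # expected to match, if the stride is indeed preserved
```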

If so, your two cuda tensors, even though equal, will have differing storage
order. This could well cause the gpu to perform operations in the `Linear`’s
matrix multiplications in a different order, leading to round-off error.

Check the `.stride()` of your gpu tensors. (Also check the `.storage_offset()`.)

Are you sure? How have you verified that the difference is not round-off
error (which could be rather larger than machine precision if it accumulates
in, say, a big matrix multiplication)?

In what you’ve posted, you’ve only shown that `r1` and `r2` have lots of
elements that aren’t exactly equal, perhaps differing by only some small
round-off error.

Try looking at `torch.allclose (r1, r2, atol = 1.e-3)` (or maybe even
`atol = 1.e-2`) and `(r1 - r2).abs().max()`.
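
Concretely, something along these lines (reusing `r1` and `r2` from your snippet):

```
# distinguish a genuine mismatch from accumulated round-off
print(torch.allclose (r1, r2, atol = 1.e-3))  # True if differences are small
print((r1 - r2).abs().max())                  # worst-case difference
print((r1 - r2).abs().mean())                 # typical difference
```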

If this doesn’t sort things out, could you post a complete, fully self-contained,
runnable script (presumably using both numpy and pytorch) that reproduces the
issue, generating the data inside the script itself. Please also post the results
you get from running the script.

Best.

K. Frank


After I checked the `stride()` of `t1` (from `a1`) and `t2` (from `b1`), I found that they are indeed stored differently internally! One is stored in row-major order and the other in column-major order.
I used the NumPy functions `ascontiguousarray` / `asfortranarray` to make the memory layout of the two original NumPy arrays match.
Finally, the outputs of the `mlp` layer are the same!
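
For reference, a minimal sketch of the fix (assuming the arrays come from the `.npy` files mentioned above):

```
import numpy as np
import torch

a1 = np.load('a1.npy')
b1 = np.load('b1.npy')

# Normalize both arrays to row-major (C-contiguous) layout before
# converting, so both cuda tensors end up with identical strides.
t1 = torch.from_numpy(np.ascontiguousarray(a1)).cuda()
t2 = torch.from_numpy(np.ascontiguousarray(b1)).cuda()

assert t1.stride() == t2.stride()
```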