I’m using a linear layer to reduce the dimensionality of my input tensors.

I have two input tensors converted from numpy arrays. The two arrays:

- were generated by the same function
- `a1.npy` was generated on its own
- `b1.npy` was generated together with other data
- a1 is equal to b1 (I tried several different methods to compare them)
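For reference, here is a sketch of the kind of checks I mean by “comparing them” — verifying that two arrays are not just element-wise equal but bit-identical (the arrays here are hypothetical stand-ins for `a1.npy` / `b1.npy`):

```python
import numpy as np

# Hypothetical stand-ins for the two saved arrays.
a1 = np.random.rand(4, 8).astype(np.float32)
b1 = a1.copy()

# Element-wise equality, raw byte-level equality, and max absolute difference.
print(np.array_equal(a1, b1))        # True
print(a1.tobytes() == b1.tobytes())  # True: bit-identical buffers
print(np.abs(a1 - b1).max())         # 0.0
```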

Then I convert both arrays to torch tensors and copy them to CUDA.

Finally, I pass both through the same linear layer, and the results are different!

The difference does not look like machine-precision error; it is much bigger than that.
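To quantify “bigger than machine precision”: float32 machine epsilon is about 1.19e-7, so any relative difference orders of magnitude above that cannot be plain rounding noise. A small helper (my own sketch, `max_rel_diff` is a hypothetical name) for measuring the largest relative discrepancy between two result tensors:

```python
import numpy as np

def max_rel_diff(r1, r2):
    # Largest element-wise relative difference between the two results.
    denom = np.maximum(np.abs(r1), np.abs(r2)).clip(min=1e-12)
    return (np.abs(r1 - r2) / denom).max()

# Identical inputs give zero relative difference.
print(max_rel_diff(np.ones(3), np.ones(3)))  # 0.0

# The scale of pure float32 rounding noise, for comparison.
print(np.finfo(np.float32).eps)  # ~1.19e-7
```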

When I try to reproduce this on a Google Colab T4 GPU, the two results come out identical again… but the phenomenon does happen on my local machine.

Can anyone explain this?

My environment:

- torch 1.10.2+cu113
- Ubuntu 20.04
- GPU: RTX 3090, RTX 4090
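One detail that may or may not be relevant (this is my assumption, not a confirmed explanation): the RTX 3090/4090 are Ampere/Ada cards that support TF32 matmuls, while Colab’s T4 is Turing and does not. A quick diagnostic sketch to check and disable TF32 while debugging:

```python
import torch

# On Ampere/Ada GPUs, PyTorch can run float32 matmuls in TF32,
# which trades matmul precision for speed; the T4 has no TF32 path.
print(torch.backends.cuda.matmul.allow_tf32)

# Force full-precision float32 matmuls as a debugging step.
torch.backends.cuda.matmul.allow_tf32 = False
```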

To reproduce:

Download the data, then run:

```python
import numpy as np
import torch

# Load the two arrays, which compare as identical.
a1 = np.load('a1.npy')
b1 = np.load('b1.npy')

# Convert to tensors and move both to the GPU.
t1 = torch.tensor(a1).cuda()
t2 = torch.tensor(b1).cuda()
print('raw data diff: ', (t1 != t2).sum())

# Load the saved linear layer and run both tensors through it.
mlp = torch.load('linear.ckpt')
mlp.eval()
r1 = mlp(t1)
r2 = mlp(t2)
print('result diff:', (r1 != r2).sum())

# output
# >>> raw data diff: tensor(0, device='cuda:0')
# >>> result diff: tensor(127583, device='cuda:0')
```
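For anyone debugging something similar: as far as I understand, even bit-identical tensors can in principle hit different matmul kernels if their memory layout or pointer alignment differs, since the backend’s kernel selection heuristics look at these properties. A sketch for printing them for both tensors (`describe` is my own hypothetical helper; the tensor here stands in for `t1`/`t2`):

```python
import torch

def describe(t):
    # Properties that can influence which matmul kernel is selected:
    # dtype, contiguity, strides, and pointer alignment in bytes.
    return {
        'dtype': t.dtype,
        'contiguous': t.is_contiguous(),
        'strides': t.stride(),
        'ptr_mod_16': t.data_ptr() % 16,
    }

x = torch.randn(4, 8)  # stand-in for t1 / t2 on the GPU
print(describe(x))
```

If the two dicts differ between `t1` and `t2`, that would be a concrete lead even though the values are equal.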