Consider a single linear layer as follows.

`x` and `w` are vectors of the same size, and will be the input and weight of this layer.

```
import numpy as np
import torch
from torch.nn import Linear

x = torch.rand(1, 63)
w = torch.rand(1, 63)
fc = Linear(63, 1, bias=False)
```

Suppose I evaluate the following block:

```
permute = np.random.permutation(63)
fc.load_state_dict({'weight': w}, strict=False)
val1 = fc(x)[0, 0].item()
fc.load_state_dict({'weight': w[:, permute]}, strict=False)
val2 = fc(x[:, permute])[0, 0].item()
print(val1)
print(val2)
```

Note that `val2` is computed by applying the same permutation to the elements of `x` and `w`. Theoretically, `val1` and `val2` should have the same value, i.e. the dot product of `x` and `w`.
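In exact arithmetic this claim certainly holds, since a dot product is a sum and sums are invariant under reordering. A quick sanity check with exact rationals (a sketch using Python's `fractions`, not part of the original code):

```python
from fractions import Fraction
import random

random.seed(0)
x = [Fraction(random.randint(1, 100), 100) for _ in range(63)]
w = [Fraction(random.randint(1, 100), 100) for _ in range(63)]
p = random.sample(range(63), 63)  # a random permutation of 0..62

# Same products, summed in two different orders, with exact rationals:
exact1 = sum(xi * wi for xi, wi in zip(x, w))
exact2 = sum(x[i] * w[i] for i in p)
print(exact1 == exact2)  # True: exact arithmetic is order-independent
```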

In practice, when I execute the block multiple times in Jupyter with different permutations, `val1` and `val2` occasionally come out slightly different, e.g. 14.554329872131348 vs. 14.554328918457031.
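For what it's worth, a similar last-bit discrepancy can be reproduced without PyTorch at all: floating-point addition is not associative, so accumulating the same products in a different order can round differently. A minimal sketch in plain Python/NumPy (my own reproduction, not from the code above):

```python
import numpy as np

# Floating-point addition is not associative, even in double precision:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# Sequentially accumulate the same float32 products in two orders:
rng = np.random.default_rng(0)
x = rng.random(63, dtype=np.float32)
w = rng.random(63, dtype=np.float32)
p = rng.permutation(63)

s1 = np.float32(0.0)
for xi, wi in zip(x, w):
    s1 += xi * wi
s2 = np.float32(0.0)
for xi, wi in zip(x[p], w[p]):
    s2 += xi * wi
print(s1, s2)  # may differ in the last few digits
```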

(PyTorch version: 1.10.0. The issue persists whether the device is CPU or GPU.)

I would like to know why this situation occurs. Is it due to some sort of numerical stability issue, or are there some randomized factors that dictate how these layers are evaluated?