Consider a single linear layer as follows. `x` and `w` are vectors of the same size and serve as the input and the weight of this layer:

```python
import numpy as np
import torch
from torch.nn import Linear

x = torch.rand(1, 63)
w = torch.rand(1, 63)
fc = Linear(63, 1, bias=False)
```
Suppose I evaluate the following block:

```python
permute = np.random.permutation(63)
fc.load_state_dict({'weight': w}, strict=False)
val1 = fc(x)[0, 0].item()
fc.load_state_dict({'weight': w[:, permute]}, strict=False)
val2 = fc(x[:, permute])[0, 0].item()
print(val1)
print(val2)
```
Note that `val2` is computed by applying the same permutation to the elements of both `x` and `w`. Theoretically, `val1` and `val2` should have the same value, namely the dot product of `x` and `w`.
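To be clear about what "theoretically" means here: in exact arithmetic, applying the same permutation to both vectors cannot change the dot product, since the same products are summed in a different order. A minimal sketch of this using exact rationals (pure Python, no PyTorch; the variable names are my own):

```python
import random
from fractions import Fraction

random.seed(0)

# Two random vectors of 63 exact rational numbers.
x = [Fraction(random.randint(1, 100), random.randint(1, 100)) for _ in range(63)]
w = [Fraction(random.randint(1, 100), random.randint(1, 100)) for _ in range(63)]

# A random permutation of the indices 0..62.
perm = random.sample(range(63), 63)

dot = sum(a * b for a, b in zip(x, w))
dot_perm = sum(x[i] * w[i] for i in perm)

# With exact arithmetic, the two sums are always identical.
print(dot == dot_perm)  # True
```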
In practice, when I execute the block in Jupyter multiple times with different permutations, `val1` and `val2` occasionally come out as slightly different numbers, e.g. 14.554329872131348 vs. 14.554328918457031.
(PyTorch version: 1.10.0. The issue persists regardless of whether the device is CPU or GPU.)
I would like to know why this happens. Is it some sort of numerical-stability issue, or is there a randomized factor in how these layers are evaluated?
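My current suspicion is the former: floating-point addition is not associative, so summing the same 63 products in a different order can round differently. A minimal pure-Python sketch of that effect (no PyTorch involved; the names are my own):

```python
import random

# Grouping alone changes a floating-point sum.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# The same effect for a dot product: summing identical products
# in a permuted order can give a slightly different float.
random.seed(0)
xs = [random.random() for _ in range(63)]
ws = [random.random() for _ in range(63)]
perm = random.sample(range(63), 63)

d1 = sum(x * w for x, w in zip(xs, ws))
d2 = sum(xs[i] * ws[i] for i in perm)
print(abs(d1 - d2))     # tiny, but often nonzero
```

If this is indeed the cause, the discrepancy I see would just be rounding at the level of the last few bits of a float32 result, not a bug or hidden randomness.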