Shuffling the input before the model and unshuffling the output after the model is not consistent on CUDA

nn.BatchNorm2d layers compute their stats over dims [0, 2, 3], and indeed shuffling the samples should not change the applied function. However, you are changing the order of operations (or rather of the samples in this case), which can produce small errors due to the limited floating point precision.
Here is a small example using only the sum operation:

import torch

x = torch.randn(100, 100)
s1 = x.sum()                               # sum over all elements at once
s2 = x.sum(0).sum(0)                       # sum columns first, then reduce
s3 = x[torch.randperm(x.size(0))].sum()    # shuffle rows, then sum
print(s1 - s2)
# tensor(-5.7220e-06)
print(s1 - s3)
# tensor(1.1444e-05)

All three approaches calculate the same desired sum, but due to the change in the operation/sample order the outputs show small errors, and you might see the same effect in your model.
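To make this concrete for your use case, here is a small sketch (my own example, not from your code) that shuffles the input to an nn.BatchNorm2d in training mode and unshuffles the output again. The two results agree only up to floating point precision, for the same reason as the sums above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training mode so the stats are computed from the current batch
bn = nn.BatchNorm2d(3).train()
x = torch.randn(8, 3, 4, 4)

perm = torch.randperm(x.size(0))   # random sample order
inv = torch.argsort(perm)          # inverse permutation to undo it

out = bn(x)                        # plain forward pass
out_shuffled = bn(x[perm])[inv]    # shuffle input, unshuffle output

# Mathematically identical, but the reduction order differs,
# so expect only near-equality, not bitwise equality
print((out - out_shuffled).abs().max())
```

The maximum absolute difference should be tiny (on the order of 1e-7 for float32), and it can be larger on CUDA, where reduction orders are less deterministic.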
