If the model uses dropout (or other layers with random behavior), I would expect a larger difference in the outputs, so I assume you are running into the expected limited floating point precision caused by a different order of operations, as seen e.g. here:
x = torch.randn(100, 100, 100)
y1 = x.sum()
y2 = x.sum(0).sum(0).sum(0)
print(y1 - y2)
> tensor(6.1035e-05)
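Because of this, you would usually compare the outputs with a tolerance (e.g. via torch.allclose) instead of checking for exact equality. A minimal sketch (the seed and tolerances are arbitrary):

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
x = torch.randn(100, 100, 100)

# Same mathematical result, different reduction order
y1 = x.sum()
y2 = x.sum(0).sum(0).sum(0)

# Exact equality may fail due to the different order of operations,
# while a tolerance-based check should pass.
print(torch.equal(y1, y2))
print(torch.allclose(y1, y2, rtol=1e-5, atol=1e-5))
```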