If the model uses dropout (or other layers with random behavior), I would expect a larger difference in the outputs, so I assume you are running into the expected limited floating point precision caused by a different order of operations, as seen e.g. here:
x = torch.randn(100, 100, 100)
y1 = x.sum()
y2 = x.sum(0).sum(0).sum(0)
print(y1 - y2)
> tensor(6.1035e-05)
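Because of this, you would usually compare the outputs with a tolerance (e.g. via torch.allclose) instead of checking for exact equality. A minimal sketch (the seed and tolerances are arbitrary):

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
x = torch.randn(100, 100, 100)

# Same mathematical result, different reduction order
y1 = x.sum()
y2 = x.sum(0).sum(0).sum(0)

# Exact equality may fail due to the different order of operations,
# while a tolerance-based check should pass.
print(torch.equal(y1, y2))
print(torch.allclose(y1, y2, rtol=1e-5, atol=1e-5))
```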