Slightly different results on the same machine and GPU when the data order changes

Hi!

I’ve noticed that when I feed my data to the model in a different order at inference time, the results are slightly different. I’ve called model.eval(), and everything is reproducible as long as the order stays the same. I first observed inconsistent results in the Flask server I’m using, so I shuffled the input with shuf and reproduced the problem. Is this expected?

Example:

cat data | wc -l # 300,000 lines

for i in $(seq 1 100); do
  cat data | inference | md5sum # Always same MD5
done

cat data | shuf | inference | md5sum # Different MD5

comm -3 <(cat data | inference | sort) <(cat data | shuf | inference | sort) | wc -l # 6 lines -> 3 differing predictions
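My current guess is that this comes from floating-point reductions not being associative: shuffling changes which samples end up in the same batch, so the GPU sums things in a different order and the rounding differs in the last bits. A minimal sketch of that non-associativity in plain NumPy (nothing to do with my model, just the effect I mean):

```python
import numpy as np

a = np.float32(1.0)
b = np.float32(2.0 ** -24)  # half an ulp of 1.0 in float32

left = (a + b) + b   # each tiny addend is rounded away individually
right = a + (b + b)  # tiny addends combine first, then survive rounding

print(left, right, left == right)  # 1.0 1.0000001 False
```

If batching works like this, the ~1e-7 relative differences I’m seeing in 3 out of 300,000 predictions would be consistent with float32 rounding rather than a bug, but I’d appreciate confirmation.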

I’ve checked similar posts such as:
https://discuss.pytorch.org/t/slightly-different-results-when-evaluating-same-model-on-different-machines
https://discuss.pytorch.org/t/slightly-different-results-on-k-40-v-s-titan-x

but they don’t describe exactly this situation (same machine, same GPU, only the input order changes).

GPU: GeForce RTX 2080 Ti
NVIDIA driver version: 525.85.12
CUDA: 12.0

Thank you!