Slightly different results on the same machine and GPU with a different input order

Hi!

I’ve noticed that when I provide my data in a different order at inference time, the results are slightly different. I’ve called model.eval(), and everything is fine as long as the order stays the same, but I tried shuf because I was observing different results in the Flask server I was using, and that’s when I noticed this problem. Is this expected?

Example:

cat data | wc -l # 300000 lines

for i in $(seq 1 100); do
  cat data | inference | md5sum # Always same MD5
done

cat data | shuf | inference | md5sum # Different MD5

comm -3 <(cat data | inference | sort) <(cat data | shuf | inference | sort) | wc -l # 6 lines -> 3 predictions differ
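
For reference, this is roughly the same check written directly in PyTorch; the linear layer is just a stand-in for my real model, only meant to show the shuffle-and-restore comparison (a layer this simple may well come out bitwise identical):

import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 2).eval()        # stand-in for the real model
x = torch.randn(8, 16)

with torch.no_grad():
    base = model(x)                          # original order
    perm = torch.randperm(x.size(0))
    shuffled = model(x[perm])                # same rows, shuffled order
    restored = torch.empty_like(shuffled)
    restored[perm] = shuffled                # undo the shuffle row-wise

print(torch.equal(base, restored))           # bitwise comparison
print((base - restored).abs().max().item())  # size of the drift, if any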

I’ve checked out similar posts like:
https://discuss.pytorch.org/t/slightly-different-results-when-evaluating-same-model-on-different-machines
https://discuss.pytorch.org/t/slightly-different-results-on-k-40-v-s-titan-x

but they don’t describe exactly the same situation.

GPU: GeForce RTX 2080 Ti
NVIDIA driver version: 525.85.12
CUDA: 12.0

Thank you!

Which type of model architecture are you using? I have seen a lot of transformers that take a seed as a parameter, which suggests results can vary at inference time. I don’t have a concrete answer though, just an observation.
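
For example, Hugging Face ships a helper that fixes all the relevant seeds in one call; I’m not sure it matters for a plain forward pass, but it’s cheap to rule out:

from transformers import set_seed

set_seed(42)  # seeds Python's random, NumPy and torch (CPU and CUDA)
# ... then run the inference as usual ...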

XLM-RoBERTa from Hugging Face. But when I provide the inference data in the same order multiple times, as shown in the example commands above, the result is always the same; it’s only when the order changes that the results differ slightly. Anyway, do you think this could be related to the architecture?

I do. Some (most?) models are non-deterministic. If you run the inference twice through the loop without instantiating a new model, are the results identical?
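
Something like this (untested sketch, with your own checkpoint in place of the base model name):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "xlm-roberta-base"  # placeholder: put your fine-tuned checkpoint here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

batch = tok(["some example text"], return_tensors="pt")  # stand-in input
# move model and batch to .cuda() / .to("cuda") to test on the GPU

with torch.no_grad():
    out1 = model(**batch).logits
    out2 = model(**batch).logits  # same instance, same input, second pass

print(torch.equal(out1, out2))  # bitwise identical across the two passes?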