Slightly different results on the same machine and GPU with a different input order

Hi!

I’ve noticed that when I provide my data in a different order at inference time, the results are slightly different. I’ve called model.eval(), and everything is fine as long as the order stays the same, but I tried shuf because I was observing different results in the Flask server I was using, and that’s when I noticed this problem. Is this expected?

Example:

cat data | wc -l # 300000 lines

for i in $(seq 1 100); do
  cat data | inference | md5sum # Always same MD5
done

cat data | shuf | inference | md5sum # Different MD5

comm -3 <(cat data | inference | sort) <(cat data | shuf | inference | sort) | wc -l # 6 lines -> 3 predictions differ
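
For reference, this is roughly the same check written directly in PyTorch; the linear layer is just a stand-in for my real model, only meant to show the shuffle-and-restore comparison (a layer this simple may well come out bitwise identical):

import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 2).eval()        # stand-in for the real model
x = torch.randn(8, 16)

with torch.no_grad():
    base = model(x)                          # original order
    perm = torch.randperm(x.size(0))
    shuffled = model(x[perm])                # same rows, shuffled order
    restored = torch.empty_like(shuffled)
    restored[perm] = shuffled                # undo the shuffle row-wise

print(torch.equal(base, restored))           # bitwise comparison
print((base - restored).abs().max().item())  # size of the drift, if any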

I’ve checked out similar posts like:
https://discuss.pytorch.org/t/slightly-different-results-when-evaluating-same-model-on-different-machines
https://discuss.pytorch.org/t/slightly-different-results-on-k-40-v-s-titan-x

but they don’t describe exactly the same situation.

GPU: GeForce RTX 2080 Ti
NVIDIA driver version: 525.85.12
CUDA: 12.0

Thank you!

Which type of model architecture are you using? I have seen a lot of transformers that take a seed as a parameter, which suggests results can vary at inference time. I don’t have a concrete answer though, just an observation.
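
For example, Hugging Face ships a helper that fixes all the relevant seeds in one call; I’m not sure it matters for a plain forward pass, but it’s cheap to rule out:

from transformers import set_seed

set_seed(42)  # seeds Python's random, NumPy and torch (CPU and CUDA)
# ... then run the inference as usual ...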

XLM-RoBERTa from Hugging Face. But when I provide the inference data in the same order multiple times, as shown in the example commands above, the result is always the same; it’s only when the order changes that the results differ slightly. Anyway, do you think this could be related to the architecture?

I do. Some (most?) models are non-deterministic. If you run the inference twice through the loop without instantiating a new model, are the results identical?
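
Something like this (untested sketch, with your own checkpoint in place of the base model name):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "xlm-roberta-base"  # placeholder: put your fine-tuned checkpoint here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

batch = tok(["some example text"], return_tensors="pt")  # stand-in input
# move model and batch to .cuda() / .to("cuda") to test on the GPU

with torch.no_grad():
    out1 = model(**batch).logits
    out2 = model(**batch).logits  # same instance, same input, second pass

print(torch.equal(out1, out2))  # bitwise identical across the two passes?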