Strange behavior of mean on CPU

I was running some tests and came across a strange behavior (possibly a bug?).
Here is a repository to reproduce it: https://github.com/czotti/pytorch_mean_test

I get different results from the mean function on a CPU tensor depending on whether I reduce first and then index, or index first and then reduce.

> features.shape
 torch.Size([5, 15, 64000])

> features[0].mean(dim=-1).numpy()
 [0.45403323 0.08670517 0.02846369 0.02786237 0.02710164 0.02582995
 0.02611523 0.0251685  0.02441121 0.02369287 0.02208564 0.02194764
 0.02004337 0.03743855 0.14910352]

> features[0].mean(dim=-1).numpy()[0]
 0.454033225774765

> features[0, 0].mean(dim=-1).numpy()
 0.4540286958217621

On CPU the last two operations differ by about 4.5e-6 (0.4540332 vs 0.4540287). On GPU they are consistent and give the same output (as you can check in the repository). I expected the same behavior on CPU. Did I miss something?

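Possible explanations: AVX vectorization, Kahan summation, or OpenMP parallelism, each of which can change how the float32 values are accumulated and rounded.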
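To illustrate why the accumulation strategy could matter, here is a minimal pure-Python sketch. It is not PyTorch's actual implementation. It emulates float32 rounding with `struct` and sums the same 64000 values three ways: strictly left to right, with 8 interleaved partial sums (roughly what a SIMD unit might do), and with Kahan compensation. The helper names (`f32`, `sequential_sum`, `lane_sum`, `kahan_sum`), the lane count, and the random data are all assumptions made for illustration; they are not the `features` tensor from the repository.

```python
import math
import random
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum(values):
    """Add left to right, rounding to float32 after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def lane_sum(values, lanes=8):
    """Keep `lanes` interleaved partial sums, then combine them.

    The lane count is a hypothetical stand-in for a SIMD register width.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] = f32(partial[i % lanes] + v)
    return sequential_sum(partial)


def kahan_sum(values):
    """Kahan (compensated) summation with float32 rounding at each step."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(v - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total


# Synthetic data (assumption): 64000 float32 values in [0, 1).
random.seed(0)
values = [f32(random.random()) for _ in range(64000)]
n = len(values)

# math.fsum gives a correctly rounded float64 sum to compare against.
exact_mean = math.fsum(values) / n
seq_mean = f32(sequential_sum(values) / n)
lane_mean = f32(lane_sum(values) / n)
kahan_mean = f32(kahan_sum(values) / n)

print("float64 reference:", exact_mean)
print("sequential float32:", seq_mean)
print("8-lane float32:    ", lane_mean)
print("Kahan float32:     ", kahan_mean)
```

The three float32 results may disagree in the last few digits even though the inputs are identical, which is the same kind of discrepancy as in the output above. If the CPU kernel picks a different reduction path for a [15, 64000] input than for a [64000] input, that alone could explain the mismatch. Accumulating in float64 for comparison is one way to check which value is closer to the true mean.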