Nondeterministic behaviour of grad? (running on CPU)

I’ve heard that running on GPU can give nondeterministic results, but is this expected to happen on CPU?

At the beginning of the code, I’ve called torch.manual_seed(1234). I’ve also seeded the numpy and python random number generators just to be safe. I have a snippet in my code (inside a loop):

# grad here is torch.autograd.grad, computing d(loss)/d(state_vec) and d(loss)/d(theta)
loss = self.loss_function(out, y)
dl_dold_state, dl_dtheta_direct = grad(loss, (state_vec, self.theta), retain_graph=True)
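
For completeness, the seeding at the top of the script is just the standard calls, roughly like this (the exact seed values for numpy/random don't matter much; sketched here with the same 1234):

import random
import numpy as np
import torch

# seed all three RNGs once, before any model/data setup
torch.manual_seed(1234)
np.random.seed(1234)
random.seed(1234)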

I have verified that even when all the input variables (loss, out, y, state_vec, self.theta) exactly match their values from the previous run of the code (at the same loop iteration), dl_dtheta_direct can come out slightly different, with the difference on the order of 1e-9.
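
The comparison itself is essentially the following sketch, where the prev_* tensors are hypothetical names for the values I saved from the previous run at the same iteration:

# inputs are bitwise identical to the previous run...
assert torch.equal(state_vec, prev_state_vec)
assert torch.equal(self.theta, prev_theta)
# ...yet the gradient differs by roughly 1e-9 instead of exactly 0
print((dl_dtheta_direct - prev_dl_dtheta).abs().max())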

I'm running on a laptop without a GPU, so this code is definitely running on the CPU. It may not seem like a big deal, but I'm seeing unexpected behaviour in my code, and if there is any possibility of a bug in PyTorch's grad operation, it opens up the possibility that the bug is not in my code but in PyTorch.

Has anyone had a similar problem, or does anyone know how to resolve it?

If you use the CPU, OpenMP multi-threading introduces non-determinism for some functions.

You can disable this and run with a single thread, which will be fully deterministic (on CPU):

OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python mycode.py
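
If environment variables are awkward in your setup, you should also be able to pin the thread count from inside the script; the env vars above are the more thorough option, since they cover MKL as well:

import torch

# restrict intra-op parallelism to one thread so OpenMP-backed CPU kernels run deterministically
torch.set_num_threads(1)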

Thanks for the reply.

But this still doesn't seem to fix it for me. Are there any other possible sources of randomness in torch that I could try to eliminate before I put together a minimal example that reproduces the error?

Except for manual_seed and threads, I can't think of much else. If you have any representative script that can reproduce the non-determinism, I'll take a look.
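
As a starting point, even something as small as the sketch below would be useful (toy tensors standing in for your actual model; whether it shows the drift will depend on the ops, sizes and thread scheduling involved):

import torch
from torch.autograd import grad

torch.manual_seed(1234)

theta = torch.randn(512, 512, requires_grad=True)
state_vec = torch.randn(512, requires_grad=True)
y = torch.randn(512)

def compute_grads():
    # toy stand-in for the real loss
    out = torch.tanh(theta @ state_vec)
    loss = ((out - y) ** 2).sum()
    return grad(loss, (state_vec, theta))

g1 = compute_grads()
g2 = compute_grads()
# any nonzero difference here indicates non-deterministic kernels
print((g1[1] - g2[1]).abs().max())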