Nondeterministic behaviour of grad? (running on CPU)

I’ve heard that running on GPU can give nondeterministic results, but is this expected to happen on CPU?

At the beginning of the code, I’ve called torch.manual_seed(1234). I’ve also seeded the numpy and python random number generators just to be safe. I have a snippet in my code (inside a loop):

# grad here is torch.autograd.grad, computing d(loss)/d(state_vec) and d(loss)/d(theta)
loss = self.loss_function(out, y)
dl_dold_state, dl_dtheta_direct = grad(loss, (state_vec, self.theta), retain_graph=True)
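
For completeness, the seeding at the top of the script is just the standard calls, roughly like this (the exact seed values for numpy/random don't matter much; sketched here with the same 1234):

import random
import numpy as np
import torch

# seed all three RNGs once, before any model/data setup
torch.manual_seed(1234)
np.random.seed(1234)
random.seed(1234)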

I have verified that even when all the input variables (loss, out, y, state_vec, self.theta) exactly match their values from the previous run of the code (at the same loop iteration), dl_dtheta_direct can come out slightly different, with the difference on the order of 1e-9.
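
The comparison itself is essentially the following sketch, where the prev_* tensors are hypothetical names for the values I saved from the previous run at the same iteration:

# inputs are bitwise identical to the previous run...
assert torch.equal(state_vec, prev_state_vec)
assert torch.equal(self.theta, prev_theta)
# ...yet the gradient differs by roughly 1e-9 instead of exactly 0
print((dl_dtheta_direct - prev_dl_dtheta).abs().max())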

I'm running on a laptop without a GPU, so this code is definitely running on the CPU. It may not seem like a big deal, but I'm seeing unexpected behaviour in my code, and if there is any possibility of a bug in PyTorch's grad operation, it opens up the possibility that the bug is not in my code but in PyTorch.

Has anyone had a similar problem, or does anyone know how to resolve it?

If you use the CPU, OpenMP multi-threading introduces non-determinism for some functions.

You can disable this and run with a single thread, which will be fully deterministic (on CPU):

OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python mycode.py
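
If environment variables are awkward in your setup, you should also be able to pin the thread count from inside the script; the env vars above are the more thorough option, since they cover MKL as well:

import torch

# restrict intra-op parallelism to one thread so OpenMP-backed CPU kernels run deterministically
torch.set_num_threads(1)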

Thanks for the reply.

But this still doesn't seem to fix it for me. Are there any other possible sources of randomness in torch that I could try to eliminate before I put together a minimal example that reproduces the error?

Except for manual_seed and threads, I can't think of much else. If you have any representative script that can reproduce the non-determinism, I'll take a look.
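
As a starting point, even something as small as the sketch below would be useful (toy tensors standing in for your actual model; whether it shows the drift will depend on the ops, sizes and thread scheduling involved):

import torch
from torch.autograd import grad

torch.manual_seed(1234)

theta = torch.randn(512, 512, requires_grad=True)
state_vec = torch.randn(512, requires_grad=True)
y = torch.randn(512)

def compute_grads():
    # toy stand-in for the real loss
    out = torch.tanh(theta @ state_vec)
    loss = ((out - y) ** 2).sum()
    return grad(loss, (state_vec, theta))

g1 = compute_grads()
g2 = compute_grads()
# any nonzero difference here indicates non-deterministic kernels
print((g1[1] - g2[1]).abs().max())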