backward() on the same loss value gives different network weights

Oktai15 · November 7, 2018, 3:46pm

I ran training and saved the loss value and the network weights. Then I ran training again and saved them again. It turned out that I got the same loss value on the first batch iteration, but the weights already differ after that first iteration (i.e. after a single optimization step)… How could this happen? I set the same seed, but I suppose that doesn’t matter here.

ptrblck · November 7, 2018, 3:57pm

Besides setting the random seed, you should also disable non-deterministic cuDNN operations, if you are using the GPU.
Have a look at the docs regarding reproducibility.
Note that you might lose some performance by enabling deterministic behavior.
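
A minimal sketch of the settings I mean, assuming a standard PyTorch setup. The helper name `seed_everything` is just illustrative and not part of PyTorch; the two `torch.backends.cudnn` flags are the ones described in the reproducibility notes.

```python
import random

import torch


def seed_everything(seed):
    """Hypothetical helper: seed the RNGs and request deterministic cuDNN."""
    random.seed(seed)        # Python's built-in RNG
    torch.manual_seed(seed)  # PyTorch RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # explicit seeding of all GPUs
    # Ask cuDNN for deterministic algorithms and turn off the autotuner,
    # which may otherwise pick different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
```

Call it once at the very top of the script, before the model, the data loaders, and any random tensors are created.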

Oktai15 · November 7, 2018, 3:59pm

Yes, I do that, but it didn’t help. Moreover, it shouldn’t matter, because the loss tensors are the same across runs… But the weights are different after the first optimization step…

ptrblck · November 7, 2018, 4:01pm

That sounds quite strange, as I would assume the same loss is generated using the same data and model parameters. So the gradients / weight updates seem to differ somehow?
Could you post a code snippet reproducing this behavior?
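
Something along these lines could help narrow it down. This is only a sketch: the tiny `nn.Linear` model, the random data, and the helper names `run_one_step` and `max_abs_diff` are placeholders I made up, so swap in your own model and batch. It runs a single optimization step twice with the same seed and compares the loss, the gradients, and the updated weights.

```python
import torch
import torch.nn as nn


def run_one_step(seed):
    """Run one optimization step; return loss, gradients and updated weights."""
    torch.manual_seed(seed)
    model = nn.Linear(4, 2)  # placeholder for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    data = torch.randn(8, 4)    # placeholder batch
    target = torch.randn(8, 2)  # placeholder targets

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), target)
    loss.backward()
    # Copy the gradients before the optimizer touches the parameters.
    grads = {n: p.grad.clone() for n, p in model.named_parameters()}
    optimizer.step()
    weights = {n: p.detach().clone() for n, p in model.named_parameters()}
    return loss.item(), grads, weights


def max_abs_diff(a, b):
    """Largest absolute element-wise difference for each named tensor."""
    return {name: (a[name] - b[name]).abs().max().item() for name in a}


loss_a, grads_a, weights_a = run_one_step(0)
loss_b, grads_b, weights_b = run_one_step(0)

print("loss diff:  ", abs(loss_a - loss_b))
print("grad diff:  ", max_abs_diff(grads_a, grads_b))
print("weight diff:", max_abs_diff(weights_a, weights_b))
```

If the gradient difference is already non-zero, the backward pass is the non-deterministic part; if only the weights differ, I would look at the optimizer and its state.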