I ran training and saved the loss value and the network weights. Then I ran training again and saved the loss value and the weights once more. It turned out that I got the same loss value on the first batch iteration, but my weights were already different after that first batch iteration (i.e., after a single optimization step)… How could this happen? I set the same seed, but I suppose that doesn't matter here.

Besides setting the random seed, you should also disable non-deterministic cuDNN operations if you are using the GPU.

Have a look at the docs regarding reproducibility.

Note that you might lose some performance by enabling deterministic behavior.
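A minimal sketch of the usual determinism settings, assuming a recent PyTorch version (1.8 or later, for `torch.use_deterministic_algorithms`). The helper name `seed_everything` is hypothetical; the `CUBLAS_WORKSPACE_CONFIG` value follows the reproducibility notes and only matters on CUDA, where it has to be set before any CUDA work starts.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Hypothetical helper: seed the common RNGs and request deterministic kernels."""
    # Needed by some cuBLAS ops when deterministic algorithms are enforced on CUDA;
    # must be set before the CUDA context is created.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    # Make cuDNN pick deterministic algorithms and disable autotuning,
    # which may select different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Raise an error whenever an op without a deterministic implementation is used.
    torch.use_deterministic_algorithms(True)
```

Call it once at the very start of the script, before building the model, the optimizer, and the data loaders, so that weight initialization and shuffling also start from the same RNG state.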

Yes, I do that, but it didn't help. Moreover, it shouldn't matter here, because the loss tensors are identical across the different runs… But the weights are different after the first optimization step…

That sounds quite strange, as I would assume an identical loss means the same data and model parameters were used. So the gradients or the weight updates seem to differ somehow?

Could you post a code snippet reproducing this behavior?
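As a starting point for such a snippet, here is a rough sketch that runs one optimization step twice from the same seed and checks the loss, the gradients, and the updated weights separately, to narrow down where the two runs diverge. The toy `nn.Linear` model, the random data, and the SGD optimizer are placeholders (assumptions, not the original setup); swap in your own model, batch, and optimizer.

```python
import torch
import torch.nn as nn


def one_step(seed: int):
    """Run a single optimization step from a fixed seed; return loss, grads, weights."""
    torch.manual_seed(seed)
    # Placeholder model, optimizer, and batch; replace with the real ones.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Snapshot the gradients before the update, then the weights after it.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    optimizer.step()
    weights = [p.detach().clone() for p in model.parameters()]
    return loss.item(), grads, weights


def compare_runs(seed: int = 0):
    """Return (same_loss, same_grads, same_weights) for two runs with the same seed."""
    loss_a, grads_a, weights_a = one_step(seed)
    loss_b, grads_b, weights_b = one_step(seed)
    same_loss = loss_a == loss_b
    same_grads = all(torch.equal(a, b) for a, b in zip(grads_a, grads_b))
    same_weights = all(torch.equal(a, b) for a, b in zip(weights_a, weights_b))
    return same_loss, same_grads, same_weights


print(compare_runs(0))
```

If the loss matches but the gradients don't, the backward pass is the non-deterministic part; if the gradients match but the weights don't, look at what differs in the optimizer step (its settings or any state it carries).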