Reproducibility over Different Machines

kko · December 4, 2019, 5:56pm

I have read all the posts on reproducibility, including https://pytorch.org/docs/stable/notes/randomness.html. Here is what I have found so far:

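Calling the code below ensures that I get consistent results on the same machine:
```python
import os
import random
import numpy as np
import torch

seed = 1234
random.seed(seed)
# Only affects Python processes started after this line;
# the current interpreter's hash seed is fixed at startup.
os.environ["PYTHONHASHSEED"] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
# Use deterministic cuDNN algorithms and disable autotuning...
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# ...or bypass cuDNN entirely (which makes the two flags above moot).
torch.backends.cudnn.enabled = False
```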
My goal is to get consistent results across different machines. For this, I made sure that both machines have the same Python, NumPy, PyTorch, torchvision, etc. versions. Still, the results differ considerably between the two machines.
To find which part of the code causes the inconsistency, I removed parts of the code one by one:
i- One cause was dropout. When I switched to in-place dropout, that problem was solved, i.e. dropout now gives consistent results.
ii- The only remaining part that leads to inconsistent results is optimizer.step(). However, I could not find any way to fix this inconsistency.

I am not sure whether loss.backward() or optimizer.step() uses any of the nondeterministic CUDA functions mentioned in https://pytorch.org/docs/stable/notes/randomness.html. If so, can anyone suggest a possible solution for the inconsistency caused by optimizer.step()?

This is very important for me. Even if there is no complete solution to this problem, I would like to try different options.

albanD · December 4, 2019, 6:46pm

Hi,

The problem is that across different machines, differences in hardware (for example, different GPU cards) or in library versions (for example, cuDNN adding or removing algorithms) can give different results. So there is little we can do to ensure identical results.
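Since hardware and library versions are the usual suspects, a practical first step is to record them on each machine and diff the two records. A minimal sketch: environment_fingerprint and diff_fingerprints are hypothetical helpers, and the PyTorch fields rely on torch.__version__, torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.get_device_name(), which are skipped if PyTorch is not installed.

```python
import json
import platform


def environment_fingerprint() -> dict:
    """Collect library and hardware details that can change numerical results."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }
    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = None
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
        if torch.cuda.is_available():
            info["gpus"] = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        else:
            info["gpus"] = []
    except ImportError:
        info["torch"] = None
    return info


def diff_fingerprints(a: dict, b: dict) -> dict:
    """Return {key: (value_a, value_b)} for every entry that differs between two records."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Save this output on each machine, then load both files and call diff_fingerprints.
    print(json.dumps(environment_fingerprint(), indent=2))
```

PyTorch also ships python -m torch.utils.collect_env, which prints a similar (more detailed) report that can be compared by eye.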

kko · December 4, 2019, 6:58pm

Thanks! I am aware of this possibility. But when I remove optimizer.step(), everything is exactly the same, even though I have DataLoader shuffling, dropout, etc. So maybe there is something I can try for optimizer.step() as well?

albanD · December 4, 2019, 7:04pm

You might be able to track it down for this model and this pair of machines.
But then this most likely won’t work if you try other machines or other environments.

Now if you really want to reproduce between these two machines, which optimizer are you using?

kko · December 4, 2019, 7:28pm

Thanks for the response!

I am using Adam optimizer, but I also tried with SGD and it did not help.

albanD · December 4, 2019, 7:37pm

I’m afraid you’ll have to check each op in the optimizer and see which one differs. You might want to use SGD, as it is much simpler.
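One way to check each op is to re-implement the update outside the framework so every intermediate value can be printed on both machines and compared line by line. A minimal sketch, assuming scalar parameters stored in plain dicts; LoggedSGD is a hypothetical name, and the update mirrors the rule documented for torch.optim.SGD with dampening=0 and nesterov=False.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedSGD:
    """Plain-Python SGD with momentum that records every intermediate value.

    Hypothetical helper: follows the torch.optim.SGD update rule
    with dampening=0 and nesterov=False, on scalar parameters.
    """
    lr: float
    momentum: float = 0.0
    weight_decay: float = 0.0
    buffers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def step(self, params: dict, grads: dict) -> dict:
        new_params = {}
        for name, p in params.items():
            # Op 1: fold weight decay into the gradient.
            d_p = grads[name] + self.weight_decay * p
            self.log.append((name, "d_p", d_p))
            # Op 2: momentum buffer, initialised to the first gradient.
            if self.momentum != 0.0:
                if name not in self.buffers:
                    buf = d_p
                else:
                    buf = self.momentum * self.buffers[name] + d_p
                self.buffers[name] = buf
                self.log.append((name, "buf", buf))
                d_p = buf
            # Op 3: parameter update.
            new_p = p - self.lr * d_p
            self.log.append((name, "param", new_p))
            new_params[name] = new_p
        return new_params


opt = LoggedSGD(lr=0.1, momentum=0.9)
params = opt.step({"w": 1.0}, {"w": 0.5})
params = opt.step(params, {"w": 0.5})
for entry in opt.log:
    print(entry)
```

Feeding the same gradients (dumped from each machine) into this helper separates the two possible causes: if the logs match, the optimizer arithmetic itself is consistent and the divergence comes from the gradients computed in loss.backward().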

kko · December 4, 2019, 7:41pm

Thanks. I will try your suggestion once I find time. It has been a big pain to get the same results.

Maybe I should first check the gradient values? The inconsistency may start there.
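To check the gradients, one option is to dump every parameter's gradient after the first backward pass on each machine and locate the first tensor that differs. A minimal sketch, assuming each dump is a dict mapping parameter names to flat lists of floats (for example built with {n: p.grad.flatten().tolist() for n, p in model.named_parameters()} and saved as JSON, where model is your network); first_divergence is a hypothetical helper, not a PyTorch API.

```python
import math


def first_divergence(grads_a: dict, grads_b: dict, rel_tol: float = 0.0, abs_tol: float = 0.0):
    """Compare two {param_name: [float, ...]} dumps in insertion order.

    Returns (name, index, value_a, value_b, max_abs_diff) for the first parameter
    that differs, or None if all values match within the tolerances
    (exact equality by default).
    """
    for name, values_a in grads_a.items():
        values_b = grads_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"size mismatch for {name}: {len(values_a)} vs {len(values_b)}")
        mismatches = [
            i for i, (x, y) in enumerate(zip(values_a, values_b))
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        ]
        if mismatches:
            i = mismatches[0]
            max_diff = max(abs(values_a[j] - values_b[j]) for j in mismatches)
            return name, i, values_a[i], values_b[i], max_diff
    return None
```

With the default tolerances of zero this reports where the bits first differ; passing abs_tol or rel_tol shows whether a difference is only rounding noise or something larger.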

albanD · December 4, 2019, 8:39pm

Oh yes, the gradients are very likely different as well.
“It has been big pain to get the same results”
Yes, unfortunately the floating-point standard makes this very hard: it does not mandate one exact value for a result, only that the value be close enough to it.

kko · December 4, 2019, 8:41pm

Thanks. I was hopeful, as I was getting exactly the same network outputs when commenting out optimizer.step().

If I get different gradients for a simple loss function like cross-entropy, is there any way to solve that problem?

albanD · December 4, 2019, 8:47pm

As I said above, I don’t think there is any way to do that.
Even a.sum() will give you different results depending on your GPU :’(
So it’s going to be very hard in any case.
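A small illustration of why a.sum() can differ between GPUs: a parallel reduction adds elements in a different order than a sequential loop, and floating-point addition is not associative, so the order changes the rounding. A minimal pure-Python sketch; the two strategies are illustrative only, since real GPU kernels use their own hardware-dependent reduction schemes.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree reduction: add adjacent pairs level by level, as parallel hardware often does."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired last element up to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(pairwise_sum(values))    # 0.0
print(math.fsum(values))       # 2.0 (correctly rounded reference)
```

Both functions add the same four numbers, yet neither matches the exact answer and they disagree with each other, because 1e16 + 1.0 rounds back to 1e16 in double precision.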

barakb · April 6, 2020, 8:46am

@albanD Hi, I’m having the same issue with 2 machines: one gives an accuracy of x+1, the other x-0.5, where x is the baseline. I ran 3 experiments on each machine.
The machines run the same commands, with all stochastic elements disabled, in exactly the same environment, except for the GPUs (Titan V vs. Titan XP).
What you said was sharp: “different HW will give different results”.
It sounds like reproducibility is almost unreachable. Is there nothing that can be done about it?

Furthermore, even on the same machine I still get big gaps (although not as large as across different machines).

albanD · April 6, 2020, 2:01pm

It sounds like reproducibility is almost unreachable. Is there nothing that can be done about it?

I’m afraid not. This is a hardware/floating-point limitation. The floating-point standard only specifies how close to the real value the result must be, and the hardware can return any value that is that close. So different hardware can give different results.

sinemuysal · March 31, 2021, 7:14pm

Hi,

I just want to follow up on this issue. Is there still no solution to this problem? I am experiencing the same issue: the same seed on different machines results in different gradients and different results. This is really problematic for model tuning and selection. Any suggestions would be really appreciated.

albanD · April 5, 2021, 1:14pm

As mentioned above, this is not a problem we can fix. It is a hardware/floating point standard issue.
If you need more precision, you will have to use double-precision floats (which won’t be perfect either), at the cost of speed.
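A rough sketch of that trade-off: summing the same numbers in two different orders gives a visible discrepancy in single precision and a much smaller one in double precision, though the double-precision gap is not guaranteed to vanish. Float32 arithmetic is emulated here with the standard struct module, which is an assumption of this sketch; in PyTorch the corresponding switch would be converting the model and inputs with .double().

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_in(values, precision: str, reverse: bool = False) -> float:
    """Sequential sum with every intermediate rounded to the given precision."""
    rnd = to_f32 if precision == "float32" else float
    total = rnd(0.0)
    for v in (reversed(values) if reverse else values):
        total = rnd(total + rnd(v))
    return total


values = [1.0 / (i + 1) for i in range(10_000)]
gap32 = abs(sum_in(values, "float32") - sum_in(values, "float32", reverse=True))
gap64 = abs(sum_in(values, "float64") - sum_in(values, "float64", reverse=True))
print(gap32, gap64)  # the float32 gap is expected to be far larger than the float64 gap
```

Rounding each float32 sum from double is exact enough here, because a binary64 intermediate has more than twice the precision of binary32, so the emulated additions are correctly rounded.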