Not quite deterministic behavior after seeding

CDhere · February 24, 2021, 10:07am

I am using seeds to control the reproducibility of my experiments; specifically, I am using:

np.random.seed(seed)
torch.manual_seed(seed)
random.seed(seed)

However, the results are not exactly the same from run to run: the losses are almost identical (they agree to within about 1e-3) but not quite, and the classification accuracies differ by a few tenths of a percentage point each time. For example:

Epoch 1 Batch 100/782, loss: 4.536677837371826
Epoch 1 Batch 200/782, loss: 4.429199695587158
Epoch 1 Batch 300/782, loss: 4.34957218170166
Epoch 1 Batch 400/782, loss: 4.270653247833252
Epoch 1 Batch 500/782, loss: 4.134552478790283
Epoch 1 Batch 600/782, loss: 4.002155303955078
Epoch 1 Batch 700/782, loss: 3.957610607147217
Start evaluating...
eval step: 0
eval step: 100
Finished running on val
After Epoch 1, top 1 val acc: 37.74
After Epoch 1, top 5 val acc: 68.59

In one run, and:

Epoch 1 Batch 100/782, loss: 4.536672115325928
Epoch 1 Batch 200/782, loss: 4.4291839599609375
Epoch 1 Batch 300/782, loss: 4.3493218421936035
Epoch 1 Batch 400/782, loss: 4.270333290100098
Epoch 1 Batch 500/782, loss: 4.133965969085693
Epoch 1 Batch 600/782, loss: 4.00153112411499
Epoch 1 Batch 700/782, loss: 3.9571311473846436
Start evaluating...
eval step: 0
eval step: 100
Finished running on val
After Epoch 1, top 1 val acc: 37.9
After Epoch 1, top 5 val acc: 68.88

In another. I’m simply fine-tuning a classification network on CIFAR-100, which is very straightforward, so I don’t think I’ve left any other bugs that could cause this.
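To put a number on "almost the same", here is a small sketch in plain Python that uses only the values copied from the two logs above and computes the per-checkpoint gap between the runs:

```python
# Loss values copied from the two runs logged above (batches 100..700 of epoch 1).
run_a = [4.536677837371826, 4.429199695587158, 4.34957218170166,
         4.270653247833252, 4.134552478790283, 4.002155303955078,
         3.957610607147217]
run_b = [4.536672115325928, 4.4291839599609375, 4.3493218421936035,
         4.270333290100098, 4.133965969085693, 4.00153112411499,
         3.9571311473846436]

# Absolute difference at each logged batch.
diffs = [abs(a - b) for a, b in zip(run_a, run_b)]
for batch, d in zip(range(100, 800, 100), diffs):
    print(f"batch {batch}: |diff| = {d:.2e}")

max_diff = max(diffs)
print(f"largest gap: {max_diff:.2e}")

# Top-1 / top-5 validation accuracy (in %) after epoch 1 for each run.
top1_gap = abs(37.74 - 37.9)
top5_gap = abs(68.59 - 68.88)
print(f"top-1 gap: {top1_gap:.2f} pts, top-5 gap: {top5_gap:.2f} pts")
```

The gap is on the order of 1e-5 at the first checkpoints and a few times 1e-4 later, so the two runs drift apart as training proceeds rather than differing by a fixed offset.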

Is it possible to get slightly different results even after setting a fixed seed? Are the seeds I set enough? Thanks!

mmisiur · February 24, 2021, 1:13pm

I strongly advise you to read PyTorch Reproducibility.

In my case, what helped was not only setting the torch and NumPy seeds but also setting cudnn.benchmark to False and calling torch.set_deterministic(True):

torch.manual_seed(seed)
np.random.seed(seed)
torch.backends.cudnn.benchmark = False
torch.set_deterministic(True)

But there is a chance that this will not be enough. In that case, follow PyTorch Reproducibility.

Hope that helps!

CDhere · February 26, 2021, 6:31am

Thank you for the reference! Yes, this is super helpful. Adding these two lines worked for me:

torch.backends.cudnn.benchmark = False
torch.set_deterministic(True)

So, to summarize, there are several sources of randomness when using PyTorch:

Application-side randomness: this happens at the application level and includes parameter initialization, data loader sampling, augmentation, and any other sampling we write ourselves in our own algorithms. This is the part of the randomness we usually control directly.

np.random.seed(seed)
torch.manual_seed(seed)
random.seed(seed)

take care of this aspect.
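A minimal sketch of application-side seeding, assuming a recent PyTorch release. The helper names `seed_all`, `seed_worker` and `draw` are hypothetical; the `seed_worker` function and the seeded `generator=` argument for the DataLoader follow the pattern shown in the PyTorch reproducibility notes. Re-seeding and drawing again should give exactly the same values and the same shuffle order:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_all(seed: int) -> None:
    """Hypothetical helper: seed Python's, NumPy's and PyTorch's global RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNG on all devices, CUDA included


def seed_worker(worker_id: int) -> None:
    # Each DataLoader worker gets its own torch seed derived from the base seed;
    # reuse it for NumPy and `random` so augmentations inside workers repeat too.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def draw(seed: int):
    """Seed everything, then draw one value per RNG plus one shuffled epoch."""
    seed_all(seed)
    g = torch.Generator()
    g.manual_seed(seed)  # dedicated generator for the DataLoader's shuffling
    loader = DataLoader(
        TensorDataset(torch.arange(8)),
        batch_size=4,
        shuffle=True,
        num_workers=0,  # with num_workers > 0, seed_worker runs in each worker
        worker_init_fn=seed_worker,
        generator=g,
    )
    order = [batch[0].tolist() for batch in loader]
    return random.random(), np.random.rand(), torch.rand(1).item(), order


first = draw(1234)
second = draw(1234)
print(first == second)  # True: same seed, same draws, same shuffle order
```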

Note: to go one step further, in DDP the DistributedSampler (which splits the dataset across processes) shuffles with its own generator, seeded with seed + epoch at every epoch (the sampler's seed argument defaults to 0). It is written so that we (developers) usually don’t have to touch the seed, as long as we call sampler.set_epoch(epoch) at the start of every epoch; otherwise every epoch uses the same ordering. So, rest assured: no need to tweak the sampler’s seed!
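A minimal sketch of that behavior, runnable on CPU without a process group. Passing `num_replicas` and `rank` explicitly is an assumption made for the demo (in real DDP code they come from the initialized process group), and `epoch_indices` is a hypothetical helper:

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))


def epoch_indices(rank: int, epoch: int, seed: int = 0) -> list:
    # Explicit num_replicas/rank let us build the sampler without calling
    # torch.distributed.init_process_group (demo-only assumption).
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank,
                                 shuffle=True, seed=seed)
    # The shuffle uses a generator seeded with seed + epoch, so set_epoch must
    # be called every epoch; without it, every epoch reuses the same order.
    sampler.set_epoch(epoch)
    return list(iter(sampler))


for epoch in range(2):
    r0, r1 = epoch_indices(0, epoch), epoch_indices(1, epoch)
    print(f"epoch {epoch}: rank 0 -> {r0}, rank 1 -> {r1}")
```

Because the ordering depends only on `seed + epoch`, every rank computes the same permutation and then takes its own disjoint slice of it, which is why the global seeds do not need to be touched here.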

Benchmarking randomness: this comes from the fact that a cuDNN convolution usually has several implementations (algorithms), and they can produce slightly different values, e.g. because they accumulate floating-point results in a different order. If torch.backends.cudnn.benchmark is True, cuDNN times the candidate algorithms for each new input configuration and picks the fastest one, so the choice can differ across runs depending on the state of the machine at that time, which leads to small discrepancies.

To disable this source of randomness, we need this line:

torch.backends.cudnn.benchmark = False
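Why would two correct implementations return slightly different numbers? A likely reason (an assumption on my part, not something spelled out per algorithm) is that floating-point addition is not associative, so summing the same values in a different order can round differently. Plain Python floats show the effect:

```python
# Floating-point addition is not associative: the same three numbers summed
# in a different order round to different results.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)   # 0.6000000000000001
print(right_to_left)   # 0.6
print(left_to_right == right_to_left)  # False
```

The difference here is a single unit in the last place, which matches the scale of discrepancy in the logs: tiny per operation, but it can compound over many training steps.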

CUDA algorithmic randomness: even once the choice of CUDA implementation is fixed, the implementation itself might be nondeterministic. A common cause is kernels that accumulate results with atomic operations, so floating-point values get summed in a different order from run to run. The effect is similar to application-level randomness, but we cannot control it by seeding. This part includes:
- randomness in cuDNN convolutions, which can be disabled by setting torch.backends.cudnn.deterministic = True; and
- randomness in other operations such as torch.bmm() (full list here), which can be disabled by calling torch.set_deterministic(True). This also covers the cuDNN convolution randomness, so this one line is enough to turn off all CUDA algorithmic randomness; operations that have no deterministic implementation raise a RuntimeError instead.
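A sketch of both switches, assuming PyTorch 1.8 or later, where torch.set_deterministic was renamed to torch.use_deterministic_algorithms (the old name is deprecated). It also sets the CUBLAS_WORKSPACE_CONFIG environment variable, which the PyTorch docs require for deterministic cuBLAS behavior on CUDA 10.2 and later; `:4096:8` is one of the two documented values:

```python
import os

import torch

# Needed for deterministic cuBLAS on CUDA >= 10.2; must be set before the first
# cuBLAS call. ":16:8" is the other documented value (less memory, maybe slower).
# setdefault keeps any value the user already exported.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

# Convolution-only switch: make cuDNN use deterministic convolution algorithms.
torch.backends.cudnn.deterministic = True

# Global switch (PyTorch >= 1.8 name): use deterministic implementations where
# they exist and raise a RuntimeError for operations that have none.
torch.use_deterministic_algorithms(True)

print(torch.backends.cudnn.deterministic)            # True
print(torch.are_deterministic_algorithms_enabled())  # True
```

Both flags are process-wide settings, so they only need to be set once, before the model runs.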

To summarize, we need the following lines to make our experiments completely reproducible (only across our own runs on a fixed machine, not across PyTorch versions or platforms):

import random

import numpy as np
import torch

# Application-side randomness
np.random.seed(seed)
torch.manual_seed(seed)
random.seed(seed)

# Benchmarking randomness
torch.backends.cudnn.benchmark = False

# CUDA algorithmic randomness
torch.set_deterministic(True)
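To check that the settings actually work, one option is to train the same tiny model twice and compare the losses for exact equality. This is a sketch under assumptions: `make_reproducible` and `train_once` are hypothetical helpers, the model and data are made up, it uses the PyTorch 1.8+ name torch.use_deterministic_algorithms, and it runs on CPU, so it mainly exercises the seeding (the cuDNN settings only matter once the model and tensors are on a GPU):

```python
import random

import numpy as np
import torch
from torch import nn


def make_reproducible(seed: int) -> None:
    """Hypothetical helper bundling the three groups of settings above."""
    # Application-side randomness
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Benchmarking randomness
    torch.backends.cudnn.benchmark = False
    # CUDA algorithmic randomness (torch.set_deterministic in PyTorch 1.7)
    torch.use_deterministic_algorithms(True)


def train_once(seed: int, steps: int = 20) -> list:
    """Train a tiny made-up model on random data; return the loss at each step."""
    make_reproducible(seed)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(32, 8)
    y = torch.randint(0, 3, (32,))
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


run_a = train_once(seed=0)
run_b = train_once(seed=0)
print(run_a == run_b)  # compares for exact equality, not just "close"
```

On a GPU machine, moving `model`, `x` and `y` to `"cuda"` turns this into a check of the cuDNN and deterministic-algorithm settings as well.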

I still want to add two notes here in case people get confused:

torch.set_deterministic(True) makes PyTorch use deterministic implementations for every operation that has one (and raise an error for those that don’t), whereas torch.backends.cudnn.deterministic = True only affects cuDNN convolutions. So the former alone is enough.
torch.backends.cudnn.deterministic and torch.backends.cudnn.benchmark mean totally different things:
- ...deterministic = True disables the cuDNN convolution randomness mentioned above, whereas
- ...benchmark = False disables the timing-based algorithm selection.

And, as @mmisiur suggested, the Reproducibility Doc is always a good reference.