Determinism in PyTorch across multiple files

Hi,

Suppose I have the following Python files: main.py, utils.py, and trainer.py. My overall objective is to get deterministic behavior across all files.

My entry point to the code is main.py. main.py calls a function set_random_seed(seed), which is located in utils.py. The function is as follows:

import random

import numpy as np
import torch


def set_random_seed(seed: int) -> None:
    """
    Sets the random seeds of Python, NumPy, and PyTorch to a given value.
    :param seed: the seed value to set
    """
    print("Setting seeds ...... \n")
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch RNGs on all CUDA devices
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels

Now, main.py also calls a function train_code() from trainer.py after the set_random_seed() call.
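For concreteness, a minimal sketch of what this entry point might look like (the seed value 42 is just an example; the body of train_code() is not shown here):

# main.py -- sketch of the call order described above
from utils import set_random_seed
from trainer import train_code

if __name__ == "__main__":
    set_random_seed(42)  # seed everything before any randomness is used
    train_code()         # training then sees the seeded RNG state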

I have a basic question: does setting the random seeds in one of the files guarantee that torch, numpy, random, or any other package uses the same seed everywhere? If yes, how does Python know that the same seed is to be used in all the different files? Is this the job of the linker or some related mechanism?

Thank you!

Your script will be executed in a single main Python process, which loads the libraries, sets the seeds, etc. Python caches every imported module in sys.modules, so main.py, utils.py, and trainer.py all receive the exact same torch, numpy, and random module objects and therefore share the same global RNG state; no linker is involved. As long as this main process calls all methods, the seeds will be kept and the state of the script stays consistent.
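You can see the module caching for yourself with only the standard library; a second import anywhere (e.g. inside utils.py or trainer.py) returns the cached module object, not a fresh copy with its own state:

import sys
import random

random.seed(0)

# Re-importing yields the cached module, not a new RNG.
import random as random_again
assert random_again is random
assert sys.modules["random"] is random

# Both names draw from the same seeded stream.
print(random.random())
print(random_again.random())  # continues where the line above left off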
If you use multiprocessing, you might need to re-seed in each worker process, as the behavior of forking or spawning new processes can differ between libraries. E.g., take a look at this issue, which describes numpy’s behavior; a sketch of the usual workaround follows below.
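One common case is PyTorch’s DataLoader, which spawns worker processes when num_workers > 0. A minimal sketch of the usual re-seeding pattern via worker_init_fn (the TensorDataset here is just a toy placeholder for your own dataset):

import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id: int) -> None:
    # torch seeds each worker's RNG with base_seed + worker_id; derive
    # the numpy/random seeds from it so all libraries stay reproducible.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

generator = torch.Generator()
generator.manual_seed(0)

dataset = TensorDataset(torch.arange(10).float())  # toy dataset for illustration
loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=generator,  # controls the shuffling order
)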
