Does PyTorch change its internal seed during training?

I am trying to make my training code as deterministic and reproducible as possible. When running the same training code multiple times, always re-initialising the model, I get different results, even if I set the seeds manually before the runs start. I found that when I reset the seed on every training run, all runs do end up with the same result. This seems to indicate that torch (or numpy or Python’s random) internally changes its seed.

import os
import random

import numpy as np
import torch

def set_seed():
    torch.manual_seed(3)
    torch.cuda.manual_seed_all(3)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(3)
    random.seed(3)
    os.environ['PYTHONHASHSEED'] = str(3)

for i in range(10):
    set_seed()
    model = init_model()  # init_model(), train() and test() are my own helpers
    model.train()
    model.test()

The code above always produces the same result for the test set, as expected. But when I only set the seed once, i.e. outside the loop, results vary over the iterations. What causes this?


uh… random generation is a sequential process, so every time you generate a new random number the generator’s internal state (the “seed”) changes. Thus, before every training run begins you need to reset the seed, i.e. put set_seed() inside the loop.
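
You can watch the state advancing directly, e.g. with a quick sketch comparing the default generator’s state before and after a draw:

torch.manual_seed(3)
state_before = torch.get_rng_state()           # snapshot of the default CPU generator
torch.randn(2)                                 # drawing numbers advances the generator
state_after = torch.get_rng_state()
print(torch.equal(state_before, state_after))  # False: the state has moved on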


I’m not sure we understand each other. Let’s say that I put set_seed(5) outside the range loop; I would expect all ten runs to have the same result. This is not the case. When I put set_seed(5) inside the range loop, it does work as expected. Does that mean the seed changes over time?

I did find that torch.manual_seed returns a Generator object, which does explain this behaviour.
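
A quick check of this:

g = torch.manual_seed(5)
print(type(g))           # a torch.Generator (the default generator)
print(g.initial_seed())  # 5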

As @ybj14 said, the pseudo-random number generator uses the seed as its initial seed and generates all sequential numbers based on this initial seed.
That doesn’t mean that every “random” number will have the exact same value (which would create a useless random number generator), but that the sequence of random numbers is the same.
Have a look at this example:

torch.manual_seed(2809)
print(torch.randn(2))
print(torch.randn(2))
print(torch.randn(2))

torch.manual_seed(2809)
print(torch.randn(2))
print(torch.randn(2))
print(torch.randn(2))

As you can see, torch.randn yields new random numbers when called sequentially. After resetting the seed, you’ll get the same sequence of random numbers again.

In your case, the model initialization can be seen as a call to the random number generator, which will yield different results. If you want your model to have exactly the same parameters, set the seed before initialization or reload the state_dict of a reference model.
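
For example, a minimal sketch reusing init_model from the question above:

# option 1: seed right before creating the model
torch.manual_seed(3)
model = init_model()

# option 2: keep one reference model and reload its parameters for every run
reference = init_model()
model = init_model()
model.load_state_dict(reference.state_dict())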


Thank you, that does make sense indeed!

Hello everybody!
I did as above, but I still have a problem.

Code:
def seed_torch(seed=0):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

seed_torch()
net_basnet.train()
for i, data in enumerate(train_dataloader):
    seed_torch()

seed_torch()
net_basnet.eval()
seed_torch()
with torch.no_grad():
    for i, data in enumerate(val_dataloader):
        seed_torch()

I get the same training loss across runs, but the validation loss differs, even when putting seed_torch() everywhere in the loops. Please help me solve this problem. Thank you!

Hi @ptrblck ,

“the pseudo-random number generator uses the seed as its initial seed and generates all sequential numbers based on this initial seed.”

Is it possible to print these sequential numbers in Python somehow?

You can print these values directly, e.g. via:

torch.manual_seed(2809)
for _ in range(10):
    print(torch.randn(1))

After re-seeding you would see the same values again.

Actually, my code is pretty big and I can’t run a for loop like this. I just wanted to print the value of RNG at some particular point in my code.

I’m not sure I understand what “value of RNG” is.
The pseudorandom number generator can be seeded and will then output defined pseudorandom values.
If you want to check which value would be sampled in e.g. the 1000th call to torch.randn, you could just re-seed and call this method that many times.
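
Alternatively, a minimal sketch for peeking at the next value at a particular point in your code without disturbing the run (this uses the default CPU generator; torch.cuda.get_rng_state covers the GPU side):

state = torch.get_rng_state()  # remember where the default CPU generator currently is
peek = torch.randn(1)          # the value the next call would have produced
print(peek)
torch.set_rng_state(state)     # restore the state so the rest of the run is unaffected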
