Same loss/MAE across runs in transformer-like model

Hello!

I am running a transformer model for traffic prediction (my own implementation of https://arxiv.org/pdf/2202.03539v1.pdf). The code is at https://github.com/radandreicristian/adn, a PyTorch implementation of the Attention Diffusion Network from "Structured Time Series Prediction without Structural Prior".

Without setting any seed, I am getting the exact same loss values across multiple runs (at epoch level). The model converges to the expected values and works fine, but what worries me is how identical the losses/metrics are. Is this normal in such models? Is it related to the initialization scheme (Xavier uniform) and strong regularisation (dropout, layer norm, etc.), or is there some (hidden) seeding happening that gives my code this incredible reproducibility across runs?

Thanks :smiley:

I would not expect to see bitwise identical results without any seeding. As a quick test, you could re-seed the code with different seeds and see if the values change at all.
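Something along these lines would do it: explicitly pass a different seed per run and log it next to the metrics (seed_everything returns the seed it ends up using). The Experiment calls below are just placeholders for your own driver code.

import torch
from pytorch_lightning import seed_everything

for i in range(3):
    # Explicit, different seed per run; log the returned seed so you can
    # confirm each run really starts from a different RNG state.
    used_seed = seed_everything(1000 + i)
    print(f"run {i}: seed={used_seed}, sample={torch.randn(3)}")
    # experiment = Experiment()
    # experiment.run()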

I tried setting a random seed for each run. My driver code is something like this (see the seed_everything docs for reference):

from pytorch_lightning import seed_everything

for i in range(10):
    seed_everything() # with no params, I should get a random seed at each call. This in turn calls torch.manual_seed(...)
    experiment = Experiment() # construct my driver class, init model, etc.
    experiment.run() # Train + test loops

It’s bitwise identical across 7 runs. Am I missing something? The train data loader uses shuffling, so at least that should give me some variance at epoch-level loss, right?

Yes, it seems as if seeds are being set inside your class somewhere.
As another quick test, just print(torch.randn(10)) inside the loop before running the experiment and check whether you at least get different random values there.
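Also note that with shuffle=True the DataLoader draws its permutation from the global torch RNG (unless you pass your own generator), so if something fixes the seed, the epoch shuffle order will be identical across runs as well. A minimal sketch (the dataset is just a placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; the shuffle order only depends on the RNG state.
dataset = TensorDataset(torch.arange(10).float())

for run in range(2):
    torch.manual_seed(42)  # simulate a dependency fixing the global seed
    loader = DataLoader(dataset, batch_size=5, shuffle=True)
    order = [int(x) for batch in loader for x in batch[0]]
    print(f"run {run}: {order}")  # same order on every run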

Great idea, thanks. I ended up doing something like this:

import torch
from pytorch_lightning import seed_everything

for i in range(10):
    seed_everything()  # with no params, I should get a random seed at each call. This in turn calls torch.manual_seed(...)
    print("Before experiment", torch.randn((2, 2)))
    experiment = Experiment()  # construct my driver class, init model, etc.
    experiment.run()  # Train + test loops
    print("After experiment", torch.randn((2, 2)))

The second print always outputs the same values. Indeed, it seems like something is setting a seed inside my experiment.

It turns out that one of my dependencies (ClearML) was overriding the seed with a fixed value without me realising it.

Good to hear you’ve narrowed it down, and a bit scary to hear that other packages are re-seeding the code for you without your knowledge.

I figured out how they do it and where the setting is; I just wasn’t expecting the default behaviour to be a fixed seed and fully reproducible runs.
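For reference, this is roughly the knob I mean. Depending on your clearml version, Task.set_random_seed (called before Task.init) changes the seed ClearML applies at task initialisation, and newer versions accept None to disable the re-seeding entirely; the project/task names below are just placeholders.

from clearml import Task
from pytorch_lightning import seed_everything

# ClearML applies a fixed random seed (1337 by default) when Task.init runs.
# Task.set_random_seed changes that seed; newer clearml versions also accept
# None to skip the re-seeding. It has to be called *before* Task.init.
Task.set_random_seed(None)

task = Task.init(project_name="traffic-prediction", task_name="adn-run")

# Now a fresh random seed per run actually sticks.
seed_everything()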