Just initializing (and not using) an auxiliary model varying the loss of primary model

Yashika · May 6, 2023, 7:06pm

I’m noticing that initializing any random model, and not using that random model anywhere further in the code, is causing variations in the loss and AUC as compared to not initializing that random model.

For instance, in the below code snippet, just initializing the unused auxiliary model model_aux is causing variations in the loss and AUC of model.

model = Model(n_heads, num_layers, device)
model_aux = Model_aux(n_heads_aux, num_layers_aux, device)

model = model.to(device)
# model_aux = model_aux.to(device)

params = list(model.parameters()) 
# params_aux = list(model_aux.parameters())

optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)  
# optimizer_aux = torch.optim.Adam(params_aux, lr=lr, weight_decay=weight_decay)

Has anyone else experienced this issue, and if so, do you have any insights as to why this might be happening?

srishti-git1110 · May 7, 2023, 11:33am

Hey,
This seems strange.
Did you set the seed correctly to ensure a fair comparison?

I would also think that the data loading order might also create such differences. That to say, in case you used a random sampler for your dataloader with shuffle=True rather than a custom sampler class, the samples included in the batches for each run shall be different and might be a potential reason for this difference.

Did you checked this difference multiple times and if yes, is this difference seen every time?

Yashika · May 7, 2023, 12:32pm

@srishti-git1110, Thanks for the reply!
Indeed, I had set the seed using the following function-

def seed_everything(seed: int):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True

I tried setting shuffle=False, yet a significant difference persists.
Yes, I checked this multiple times and the difference is seen every time.
Also, the difference varies with altering the architecture of the auxiliary model.

srishti-git1110 · May 7, 2023, 2:43pm

I see.
Unfortunately, I cannot seem to understand why this must be happening and whether the behaviour is expected. Couldn’t find anything similar in GitHub issues as well.

We can maybe wait for someone from the community to provide some insight.
cc: @ptrblck

KFrank · May 7, 2023, 10:39pm

Hi Yashika!

Yashika:

just initializing the unused auxiliary model model_aux is causing variations in the loss and AUC of model.
model = Model(n_heads, num_layers, device)
model_aux = Model_aux(n_heads_aux, num_layers_aux, device)

Almost certainly you are using some sort of randomization in your training
and initializing model_aux is consuming some random numbers, putting
your training at a different point in the (pseudo)random-number stream.

I know you said that you’ve set shuffle = False, but there could be other
sources of randomization such as Dropout layers or random transformations
such as RandomCrop if you are using augmentation.

Try this experiment:

# to make runs repeatable
torch.manual_seed (12345)
model = Model(n_heads, num_layers, device)

# either do or don't instantiate Model_aux
model_aux = Model_aux(n_heads_aux, num_layers_aux, device)

# set random-number stream to the same well-defined state
# regardless of whether Model_aux was instantiated
torch.manual_seed (98765)

# run some training ...

First verify that your runs are repeatable when you perform multiple runs
without Model_aux.

Assuming this test passes, try running with Model_aux (making sure to
include the second call to torch.manual_seed (98765)) and see if you
continue to get the same results.

Best.

K. Frank

Yashika · May 8, 2023, 8:47pm

@KFrank, Thank you very much for the help!
Setting the random seed again after instantiating Model_aux has resolved the issue.
The loss and AUC score now remain the same regardless of Model_aux 's initialization.