2 models (with the same hyperparameters) loaded from the same checkpoint give different results during training

COLAB LINK

I trained a model (LeNet-5) for 10 epochs and saved it as follows:

state = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict()
}
th.save(state, '/Documents/saved_model.pt')

I then loaded it into 2 models, 'new_model' and 'new_model2', as below:

new_model = model2().cuda()
new_model2 = model2().cuda()
checkpoint = th.load('/home/hareesh/Documents/sar/information_theory/lenet/saved_model.pt')
checkpoint2 = th.load('/home/hareesh/Documents/sar/information_theory/lenet/saved_model.pt')

new_optimizer = th.optim.SGD(new_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
new_scheduler = MultiStepLR(new_optimizer, milestones=[10, 20], gamma=0.1)

new_optimizer2 = th.optim.SGD(new_model2.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
new_scheduler2 = MultiStepLR(new_optimizer2, milestones=[10, 20], gamma=0.1)

new_model.load_state_dict(checkpoint['model'])
new_model2.load_state_dict(checkpoint['model'])

new_optimizer.load_state_dict(checkpoint['optimizer'])
new_optimizer2.load_state_dict(checkpoint['optimizer'])

new_scheduler.load_state_dict(checkpoint['scheduler'])
new_scheduler2.load_state_dict(checkpoint['scheduler'])
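As a sanity check, it can help to confirm that two models really do start from bitwise-identical weights after loading. A minimal sketch, using small stand-in models in place of `new_model` / `new_model2` above:

```python
import torch as th
import torch.nn as nn

# Stand-in models (in place of new_model / new_model2); the second one
# loads the first one's state_dict, mirroring loading a shared checkpoint.
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)
model_b.load_state_dict(model_a.state_dict())

# Every parameter tensor should match exactly (bitwise), not just approximately.
identical = all(
    th.equal(t_a, t_b)
    for t_a, t_b in zip(model_a.state_dict().values(), model_b.state_dict().values())
)
print(identical)  # True when loading succeeded
```

If this prints True but training still diverges, the difference comes from something after loading (data order, nondeterministic ops), not from the checkpoint itself.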

I trained the new models for 5 epochs, but the results are different.
Also, when I continue training the original model for 5 more epochs, the results differ from the training results of the 2 new models.

Is it possible for the test and train accuracies of the original model (15 epochs) and the 2 new models (5 epochs after loading from the checkpoint) to be the same?

(From the loaded checkpoint I'm getting the same test accuracy for all 3 models, if I don't train any of them.)

I didn't go through the code, but loading a checkpoint into two different models doesn't necessarily mean both end up with identical results, since the batches they're trained on may differ.

Thank you for the reply @Bilal.
Yes it seems true.

But for reproducibility I have used the code snippet below:

"seed =1234
random.seed(seed)
os.environ[‘PYTHONHASHSEED’] = str(seed)
th.manual_seed(seed)
th.cuda.manual_seed(seed)
th.cuda.manual_seed_all(seed)
th.backends.cudnn.deterministic = True "
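For full determinism on GPU it is usually also necessary to disable cuDNN autotuning, since benchmark mode can pick different kernels between runs. A minimal sketch of a combined seeding helper (the `seed_everything` name is mine, and I added a NumPy seed in case any transform uses it):

```python
import random
import numpy as np
import torch as th

def seed_everything(seed: int = 1234) -> None:
    """Seed all RNGs that can affect training (sketch)."""
    random.seed(seed)
    np.random.seed(seed)
    th.manual_seed(seed)              # seeds the CPU generator
    th.cuda.manual_seed_all(seed)     # seeds all CUDA devices
    th.backends.cudnn.deterministic = True
    th.backends.cudnn.benchmark = False  # avoid non-deterministically autotuned kernels

# Same seed before each run -> identical random draws.
seed_everything(1234)
a = th.randn(3)
seed_everything(1234)
b = th.randn(3)
print(th.equal(a, b))  # True
```

Note that `DataLoader` shuffling also draws from the seeded generator, so batch order is only reproducible if the seed is set at the same point before each training run.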

which gives me the same epoch-wise test and train accuracy for all 3 models every time I run the experiment.

So, I think this would happen only if the training batches are loaded in the same way every time for the original model and the 2 new models.

Am I correct?
If so, the same batches should be loaded for each of the models every time, right?

Hi @sarvani ,

You might be right, but it all depends on how the random.seed() call interacts with data loading. Maybe because you are training the models individually, in separate runs, they receive the same set of batches but in a different order throughout training.

@Bilal I'm training each of them only once.
Just to check, I tested saving, loading, and training the 2 new models in the same file, so that even random.seed() would be common to all models.
Still, the 2 new models and the continued training of the old model give different accuracies.
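One way to localize the divergence is to check that a single optimizer step on the *same* batch keeps two identically-loaded models identical; if it does, the difference must come from data order or nondeterministic ops rather than from the models or optimizers. A hedged sketch with stand-in models:

```python
import torch as th
import torch.nn as nn

th.manual_seed(0)
model_a = nn.Linear(8, 1)
model_b = nn.Linear(8, 1)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

# Fresh SGD optimizers with identical hyperparameters (and identical, empty state).
opt_a = th.optim.SGD(model_a.parameters(), lr=0.1, momentum=0.9)
opt_b = th.optim.SGD(model_b.parameters(), lr=0.1, momentum=0.9)

x = th.randn(4, 8)   # the *same* batch fed to both models
y = th.randn(4, 1)

for model, opt in ((model_a, opt_a), (model_b, opt_b)):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# Identical weights + identical batch + identical optimizer state -> identical step.
same = all(th.equal(p_a, p_b) for p_a, p_b in zip(model_a.parameters(), model_b.parameters()))
print(same)
```

Running the equivalent check on the real models after their first training batch would show whether the two runs diverge immediately (suggesting different batches or RNG consumption) or only later (suggesting a nondeterministic op).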