Initialising the optimizer a second time before loading its state dict gives better accuracy

I am trying to run code for the lottery ticket hypothesis. First I have a model named model_pre, which I train for 30 epochs with a linear warmup, and then I save the model state dict and the optimizer state dict.

In the next step I start pruning the model, first with 100% of the weights and so on. I initialise the same model as model_pre under the name model, load the model_pre weights, and then train model with 100% of the weights, using the same optimizer from after the warmup, before pruning. This gives an accuracy of around 87%, but it is expected to give around 93%.

When I do exactly the same steps, but with the extra step of initialising a new optimizer for the model after the warmup (model_pre) and then loading into it the optimizer state saved after the warmup, I get an accuracy of around 93%. As far as I can tell I am doing the same thing both ways, so I don't know why this happens. Can anyone point me in the right direction?

    opt_class, opt_kwargs = load.optimizer(args.optimizer)
    optimizer = opt_class(generator.parameters(model_pre), lr=0.1, weight_decay=1e-4,momentum = 0.9, **opt_kwargs)
    #scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20,40], gamma=10)
    #scheduler= torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max = args.pre_epochs)
    #optimizer = optim.SGD(model_pre.parameters(), lr=0.1, weight_decay = 1e-4, momentum=0.9)
    full_train(model_pre, loss, optimizer,train_loader, test_loader, device, args.warmepochs, args.verbose),"{}/".format(args.result_dir)),"{}/".format(args.result_dir)),"{}/".format(args.result_dir))

    model = load.model(args.model, args.model_class)(input_shape, 
    #optimizer = opt_class(generator.parameters(model), lr=0.1, weight_decay=5e-4,momentum = 0.9, **opt_kwargs)
    #scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20,40], gamma=10)

    model.load_state_dict(torch.load("{}/".format(args.result_dir), map_location=device))
    optimizer = opt_class(generator.parameters(model), lr=0.1, weight_decay=1e-4,momentum = 0.9, **opt_kwargs) 
    #optimizer    = optimizer  
    optimizer.load_state_dict(torch.load("{}/".format(args.result_dir), map_location=device))

In the second code snippet, the second line, where I initialise the optimizer again, works well. If I comment that line out and run the model, the result is worse.
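For reference, the variant that works can be reduced to a small round trip. This is only a minimal sketch under stated assumptions: `nn.Linear` stands in for the real model, plain `torch.optim.SGD` stands in for `opt_class`, and in-memory buffers replace the checkpoint files, whose paths are not shown in the post.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for model_pre; the real architecture is not shown in the post.
model_pre = nn.Linear(4, 2)
opt_pre = torch.optim.SGD(model_pre.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# One "warmup" step so the optimizer has momentum buffers worth saving.
x, y = torch.randn(8, 4), torch.randn(8, 2)
nn.functional.mse_loss(model_pre(x), y).backward()
opt_pre.step()

# Save both state dicts (in-memory buffers here instead of files on disk).
model_buf, opt_buf = io.BytesIO(), io.BytesIO()
torch.save(model_pre.state_dict(), model_buf)
torch.save(opt_pre.state_dict(), opt_buf)
model_buf.seek(0)
opt_buf.seek(0)

# Fresh model instance, then a fresh optimizer bound to *its* parameters,
# and only then the saved optimizer state is loaded into that new optimizer.
model = nn.Linear(4, 2)
model.load_state_dict(torch.load(model_buf))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
optimizer.load_state_dict(torch.load(opt_buf))
```

After this, `optimizer` holds the parameters of `model` and carries over the momentum buffers from the warmup, which is the combination the 93% run uses.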

Can’t be too sure, but I suspect that when you load the model from the saved checkpoint, it somehow overwrites your parameters, so the existing optimizer instance ends up holding a different set of parameters. When you backprop and then step with that older optimizer instance, the updates go to parameters that don’t belong to the loaded model instance. When you re-initialize the optimizer, you’re re-initializing it with the current set of model parameters, and learning then has its desired effect.
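If that suspicion is right, the effect can be reproduced in isolation. The sketch below is an assumption-laden toy, not the poster's code: `nn.Linear` replaces the real model and plain SGD replaces `opt_class`. It steps an optimizer that was built on `model_pre` after a backward pass through a separate `model` instance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two model instances in the post.
model_pre = nn.Linear(4, 2)
stale_opt = torch.optim.SGD(model_pre.parameters(), lr=0.1, momentum=0.9)

model = nn.Linear(4, 2)                        # new instance, new Parameter objects
model.load_state_dict(model_pre.state_dict())  # same values, different tensors

before_model = model.weight.detach().clone()
before_pre = model_pre.weight.detach().clone()

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), y)
stale_opt.zero_grad()
loss.backward()   # gradients are written to model's parameters
stale_opt.step()  # but this optimizer only knows model_pre's parameters

model_unchanged = torch.equal(model.weight, before_model)
model_pre_unchanged = torch.equal(model_pre.weight, before_pre)
print(model_unchanged, model_pre_unchanged)
```

In this toy setup neither model moves: `model` is never updated because the optimizer does not hold its parameters, and `model_pre` is skipped because its parameters have no gradients. Training would then silently stay at the warmup checkpoint.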

Please could you check what optimizer.param_groups holds, and whether those are the same tensors/parameters as the ones in the loaded model?
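One way to run that check is to compare object identity rather than printed values, since two models loaded from the same checkpoint will print identical numbers. A minimal sketch follows; the helper name `optimizer_tracks_model` is hypothetical, and `nn.Linear` with plain SGD again stand in for the real model and optimizer.

```python
import torch
import torch.nn as nn


def optimizer_tracks_model(optimizer, model):
    """True if every trainable parameter of `model` is the very same
    tensor object as one of the parameters held by `optimizer`."""
    opt_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in opt_ids for p in model.parameters() if p.requires_grad)


# Hypothetical stand-ins for the two model instances in the post.
model_pre = nn.Linear(4, 2)
model = nn.Linear(4, 2)
model.load_state_dict(model_pre.state_dict())

old_opt = torch.optim.SGD(model_pre.parameters(), lr=0.1, momentum=0.9)
new_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# The values agree, so printing the tensors would show no difference...
values_match = all(
    torch.equal(a, b) for a, b in zip(model_pre.parameters(), model.parameters())
)
# ...but only the optimizer built on `model` actually holds its parameters.
print(values_match)
print(optimizer_tracks_model(old_opt, model))
print(optimizer_tracks_model(new_opt, model))
```

An identity check like this distinguishes "same numbers" from "same tensors", which a visual comparison of `param_groups` cannot do.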

Hello, I checked optimizer.param_groups both before and after, and they both seem to have the same tensors. I also checked the momentum buffer in the optimizer before and after, and those were the same too.