Loaded model doesn't train anymore

Damiano_Zappia · May 5, 2021, 8:29am

Hi,
I’m working on code that fine-tunes a Mask R-CNN model on a dataset, saves it, and then loads it to integrate it into a GAN framework as the generator, so the loaded Mask R-CNN model has to be trained further.
However, when I load the model and launch training, it doesn’t update, as if the parameters were frozen. How is that possible?
Here is the code, to make the problem clearer:

# save
PATH = "./saved_models/generator"+str(epoch)
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            }, PATH)

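# load
epoch = 0  # last epoch for which the model was saved

model = get_instance_segmentation_model(num_classes)
# assumed here: params holds the model's trainable parameters
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)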

checkpoint = torch.load(PATH, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']


model.train()

Am I missing something? If I train the model without saving it and go straight to the GAN training phase, the model updates and I can see the results change, whereas if I do the same with a loaded model, the results don’t change.

JuanFMontesinos · May 5, 2021, 9:14pm

Can you try defining the optimizer after loading the state dict?
I’m not sure the optimizer is pointing to the proper tensors.

Damiano_Zappia · May 8, 2021, 4:09pm

Hi @JuanFMontesinos and thanks for your answer.
Do you mean loading the state dict first and then defining the optimizer? I just tried that, but nothing changed.
The loaded model has zero gradient flow during training; basically it doesn’t train at all.
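
A quick way to look at gradient flow is to run one backward pass and list, for each parameter, whether it requires gradients and what its gradient norm is. The following is only a minimal sketch: `grad_report` is a hypothetical helper, and the small `nn.Sequential` network stands in for the real Mask R-CNN model.

```python
import torch
from torch import nn


def grad_report(model: nn.Module) -> dict:
    """Map each parameter name to (requires_grad, gradient norm or None)."""
    report = {}
    for name, p in model.named_parameters():
        norm = None if p.grad is None else p.grad.norm().item()
        report[name] = (p.requires_grad, norm)
    return report


# Hypothetical stand-in for the real network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model[0].weight.requires_grad_(False)  # simulate one frozen tensor

# One forward/backward pass on random data
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()

for name, (trainable, norm) in grad_report(model).items():
    print(f"{name:10s} requires_grad={trainable} grad_norm={norm}")
```

A parameter that shows `requires_grad=False`, or a gradient of `None` after `backward()`, is not receiving gradients, which narrows down where to look.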

JuanFMontesinos · May 9, 2021, 7:13pm

My guess was that loading the state dict after defining the optimizer was making the optimizer update “other” weights (the ones belonging to the model before they were replaced by the loaded ones).

If that’s not the case, you probably have a bug somewhere else.
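
The hypothesis above can be checked directly by comparing the tensors the optimizer holds with the model's current parameters. This is a rough sketch on a toy `nn.Linear`; `optimizer_tracks_model` is a hypothetical helper name, not a PyTorch function.

```python
import torch
from torch import nn


def optimizer_tracks_model(model: nn.Module, optimizer: torch.optim.Optimizer) -> bool:
    """True if every tensor the optimizer updates is a parameter of `model`."""
    model_ids = {id(p) for p in model.parameters()}
    opt_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return bool(opt_ids) and opt_ids <= model_ids


model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

# Case 1: load a state dict into the existing model after creating the optimizer
model.load_state_dict(nn.Linear(4, 2).state_dict())
tracks_after_load = optimizer_tracks_model(model, optimizer)

# Case 2: rebuild the model object but keep the old optimizer
model = nn.Linear(4, 2)
tracks_after_rebuild = optimizer_tracks_model(model, optimizer)

print(tracks_after_load, tracks_after_rebuild)
```

In this toy setup the first check passes, because `load_state_dict` (with its default settings) copies values into the existing parameter tensors, while the second fails because the optimizer still holds the parameters of the discarded model. If the check fails in the real code, the optimizer needs to be recreated from the current model.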

Damiano_Zappia · May 9, 2021, 9:05pm

Yeah, maybe you are right and the error is somewhere else in the code. If I’m not wrong, loading the state dict before defining the optimizer would not be possible, because calling .load_state_dict() on the optimizer requires the optimizer to be defined already.

JuanFMontesinos · May 9, 2021, 10:28pm

Well, I meant loading the model’s state dict before passing its parameters to the optimizer.
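
As a rough illustration of that order (model state first, then the optimizer built from the loaded model, then the optimizer state), here is a small round trip through an in-memory buffer. The tiny `nn.Linear` model and the `make_model` helper are placeholders for illustration, not the code from this thread.

```python
import io

import torch
from torch import nn


def make_model() -> nn.Module:
    # Hypothetical stand-in for the real network
    return nn.Linear(3, 1)


# Train for one step so the optimizer has momentum state, then save
model = make_model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.ones(2, 3)).sum().backward()
optimizer.step()

buffer = io.BytesIO()  # in-memory file instead of a path on disk
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, buffer)
buffer.seek(0)

# Load: model weights first, then the optimizer on the loaded model
checkpoint = torch.load(buffer, map_location="cpu")
restored = make_model()
restored.load_state_dict(checkpoint["model_state_dict"])
restored_opt = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
restored_opt.load_state_dict(checkpoint["optimizer_state_dict"])
restored.train()

# One more training step; the weights should move
before = restored.weight.detach().clone()
restored_opt.zero_grad()
restored(torch.ones(2, 3)).sum().backward()
restored_opt.step()
changed = not torch.equal(before, restored.weight)
print("weights changed after resume:", changed)
```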

Yyote · June 22, 2023, 10:16pm

This worked for me, thanks! My loading code goes like this:

# Load the network
model = NeuralNetwork().to(device)
# model.load_state_dict(torch.load(save_name))
checkpoint = torch.load(save_name)
model.load_state_dict(checkpoint['state_dict'])
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer.load_state_dict(checkpoint['optimizer'])
print("Model loaded!")

Arafat_Al_Zaman · July 1, 2024, 8:27pm

Hello @Yyote,

I am doing the following:

model = self.model
model.load_state_dict(checkpoint['model_state_dict'])
optimizer = get_optimizer(model, lr=self.lr, lr_scale=self.lr_scale)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=self.total_steps)
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])

self.model = model
self.optimizer = optimizer
self.scheduler = scheduler

But it still doesn’t work. Could you help me, please?
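
One thing that may be worth ruling out here (an assumption, nothing in this thread confirms it) is the learning rate after loading. `optimizer.load_state_dict` restores the saved hyperparameters of each parameter group, including `lr`, so a checkpoint written at the end of a decaying schedule could resume with a learning rate at or near zero. A minimal sketch with a toy model; `current_lrs` is a hypothetical helper:

```python
import torch
from torch import nn


def current_lrs(optimizer: torch.optim.Optimizer) -> list:
    """Learning rate of each parameter group."""
    return [group["lr"] for group in optimizer.param_groups]


model = nn.Linear(2, 1)

# Hypothetical checkpoint: optimizer saved after lr had decayed to zero
finished = torch.optim.SGD(model.parameters(), lr=0.0)
saved_state = finished.state_dict()

# Resuming: the lr passed to the constructor is overwritten on load
resumed = torch.optim.SGD(model.parameters(), lr=1e-3)
lrs_before = current_lrs(resumed)
resumed.load_state_dict(saved_state)
lrs_after = current_lrs(resumed)

print("before load:", lrs_before, "after load:", lrs_after)
```

Printing the learning rates right after loading the optimizer and scheduler states shows whether the updates are simply being scaled to nothing.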