Resuming training with a lower learning rate

Hi, I’m training a ResNet50 model to do regression. It’s worked really well and I’m nearly at the validation loss I need for the technique. However, the loss is starting to flatten out at a learning rate of 1e-4 and I’d like to reduce it. The learning rate is set in the code like this:

import lightning as L
import torch
from torch import nn
from torchvision import models


class ResNetRegression(L.LightningModule):
    def __init__(self, config, max_ions: int):
        super().__init__()
        self.save_hyperparameters()  # saves all the arguments passed into __init__ into the checkpoint, so rather critical!
        self.max_ions = max_ions
        self.learning_rate = config['learning_rate']
        self.resnet_model = models.resnet50(weights=None)  # weights=None replaces the deprecated pretrained=False
        linear_size = list(self.resnet_model.children())[-1].in_features
        # replace the final layer for fine tuning
        self.resnet_model.fc = nn.Linear(linear_size, 1)
        # accept single-channel input instead of RGB
        self.resnet_model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

    def forward(self, x):
        return self.resnet_model(x)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer

I’m training like this:

data_module = ResNetDataModule(config, num_workers, rotation_degrees, data_dir, training_url, validation_url)
model = ResNetRegression(config, max_ions)

val_every_n_epochs = 1
max_epochs = 100

checkpoint_callback = L.pytorch.callbacks.ModelCheckpoint(
    filename="fa_classifier_{epoch:02d}",
    every_n_epochs=val_every_n_epochs,
    save_top_k=-1,  # keep every checkpoint, not just the best (this is important!)
)

trainer = L.Trainer(
    devices=1,
    accelerator="gpu",
    callbacks=[checkpoint_callback],
    check_val_every_n_epoch=val_every_n_epochs,
    max_epochs=max_epochs,
    default_root_dir="Checkpoints",
    enable_progress_bar=True,
)
trainer.fit(model, datamodule=data_module)

Can I just replace the model instantiation with:

model = ResNetRegression.load_from_checkpoint("path to checkpoint file")
model.learning_rate = 1e-5

and then restart training using trainer.fit(…)?

I know this is probably a really obvious question, but a definitive answer would be incredibly helpful.

Thanks

Model weights are updated during training, so if you save the model's state dictionary and load it back in later, you have your "pre-trained" model.
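
For example (a sketch; the filename is just illustrative):

# save the trained weights
torch.save(model.state_dict(), "resnet_regression.pt")

# later: rebuild the model and load the weights back in
model = ResNetRegression(config, max_ions)
model.load_state_dict(torch.load("resnet_regression.pt"))

With Lightning you get this for free: the .ckpt files written by ModelCheckpoint already contain the state dict, plus the hyperparameters you saved.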

There is no issue in then continuing training with a different learning rate (or optimiser or loss function for that matter!).
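
In your case, yes, your two-line plan does exactly that. Because you called save_hyperparameters(), load_from_checkpoint() can rebuild the model without being passed config and max_ions again, and because configure_optimizers() reads self.learning_rate, the fresh optimizer created when fit() starts picks up the new value. A minimal sketch (the checkpoint path is a placeholder):

model = ResNetRegression.load_from_checkpoint("path/to/checkpoint.ckpt")
model.learning_rate = 1e-5  # read by configure_optimizers() when fit() starts

trainer = L.Trainer(devices=1, accelerator="gpu", max_epochs=100)
trainer.fit(model, datamodule=data_module)

One thing to be aware of: this restores the weights but creates a brand-new Adam optimizer, so its running moment estimates start from scratch. That is usually what you want when deliberately dropping the learning rate. If you wanted to resume the optimizer state as well, you would pass ckpt_path to trainer.fit(), but then the old learning rate comes back with it.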

You may want to have a look at learning rate scheduling though.
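
ReduceLROnPlateau, for instance, drops the learning rate automatically once a monitored metric stops improving, so you don't have to restart by hand every time the loss flattens out. A sketch of how configure_optimizers() could return one in Lightning's dictionary format (this assumes your validation step logs a metric called "val_loss"; the factor and patience values are just examples):

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",  # the metric the scheduler watches
        },
    }

CosineAnnealingLR or StepLR slot in the same way if you would rather decay on a fixed schedule.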