Saving for training later

(Paulo Mann) #1

I see that in the tutorials on the PyTorch website, there is a way to save the entire model along with the loss and optimizer parameters. That said, I don’t understand what the “loss” is in the code below (from the PyTorch website - https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-a-general-checkpoint-for-inference-and-or-resuming-training):

model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()   # for inference
# - or -
model.train()  # to resume training

Is it the final value of the loss before stopping training? Or is it the entire loss history (i.e., a list of losses per epoch)?

I also have a StepLR scheduler; can its state be saved like the others? I need to recover the lr decay.

Thanks!

(Alex Veuthey) #2

The loss in the tutorial is the loss value at the moment the checkpoint is saved. It’s not mandatory to save it when you want to resume training, but it can be useful to have (to check consistency, for example).
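To make that concrete, here is a minimal sketch of the saving side that produces a checkpoint in the tutorial’s format; `TinyNet`, the training data, and the epoch number are all made up for illustration, and an in-memory buffer stands in for the file path.

```python
import io
import torch
import torch.nn as nn

# a throwaway model, just so there is something to checkpoint
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# one training step, so there is a loss value at save time
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

buffer = io.BytesIO()  # stands in for PATH
torch.save({
    'epoch': 5,                                      # illustrative epoch count
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss.item(),                             # one scalar, not a history
}, buffer)

buffer.seek(0)
checkpoint = torch.load(buffer)
```

Since only `loss.item()` is stored, what comes back out of `checkpoint['loss']` is a single float, not a per-epoch list; if you want the whole history you would have to save a list yourself.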

With StepLR, you can pass the last epoch as an argument (`last_epoch`) when constructing it. This will resume the schedule accordingly, but you need to save the last step with the checkpoint.

(Paulo Mann) #3

I see! Thanks @alex.veuthey