Continue training after saving the model

How can I save the best model weights so that I can continue training the model after stopping because of limited GPU resources?

The ImageNet example would be a good reference for resuming the training.
E.g. here a checkpoint is loaded and the training is resumed, while here the checkpoint yielding the best validation accuracy is stored.
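In outline, the pattern from that example is: store a dict with the model and optimizer state plus the best metric, and restore it before continuing. A rough sketch of the idea (model, optimizer, val_acc, and best_acc are placeholder names, not the ImageNet example's actual variables):

# Save whenever the validation metric improves
if val_acc > best_acc:
    best_acc = val_acc
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'best_acc': best_acc,
    }, 'checkpoint_best.pth')

# Later: restore the states and continue from the stored epoch
checkpoint = torch.load('checkpoint_best.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1
best_acc = checkpoint['best_acc']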

I am working with a GAN model, so I don't calculate the accuracy. Should I save the best validation G_loss and D_loss instead?
I am using the code below during training. Is it correct?

if min_valid_loss_g > valid_loss_g:
    print(f'G_Val_Loss_Decreased({min_valid_loss_g:.6f}--->{valid_loss_g:.6f})\t Saving The Model')
    min_valid_loss_g = valid_loss_g
    # Save the generator, its optimizer state, and the best validation loss
    torch.save({
        'epoch': epoch,
        'G_state_dict': G.state_dict(),
        'G_optimizer_state_dict': optimizer_G.state_dict(),
        'G_loss': valid_loss_g,
    }, f"./generator-epoch-{epoch}.pth")
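Since a GAN has two networks, you would probably want to checkpoint the discriminator in the same way; a minimal sketch, assuming D, optimizer_D, valid_loss_d, and min_valid_loss_d exist in your training loop:

if min_valid_loss_d > valid_loss_d:
    min_valid_loss_d = valid_loss_d
    # Same pattern as for the generator, applied to the discriminator
    torch.save({
        'epoch': epoch,
        'D_state_dict': D.state_dict(),
        'D_optimizer_state_dict': optimizer_D.state_dict(),
        'D_loss': valid_loss_d,
    }, f"./discriminator-epoch-{epoch}.pth")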

Yes, your approach sounds reasonable assuming the validation loss properly represents the training progress of your model.

How can I tell whether the validation loss properly represents the training progress of my model?

The code below is how I am loading the saved model to continue training after restarting the kernel. Is it correct?

G = Generator().to(device)

# Restore the generator and its optimizer from the saved checkpoint,
# using the same keys that were used when saving
checkpoint = torch.load('generator.pth')
G.load_state_dict(checkpoint['G_state_dict'])
optimizer_G.load_state_dict(checkpoint['G_optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['G_loss']

G.train()  # switch back to training mode, since training will continue
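After loading, you would continue the loop from the stored epoch; a minimal sketch, where num_epochs and train_one_epoch are placeholders for your own training loop:

start_epoch = epoch + 1      # resume after the saved epoch
min_valid_loss_g = loss      # restore the best-loss tracker

for epoch in range(start_epoch, num_epochs):
    train_one_epoch(G, optimizer_G, epoch)  # hypothetical helper wrapping your training step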

Should I continue training on the same training dataset, or should I modify it?

The choice of the dataset depends on your use case. If you want to “continue” the training, then using the same dataset would work; if you want to fine-tune the model on another dataset then you would need to change it.

And how do I load the saved models?

Using this technique, I got the same loss for different epochs during training.
Where is the problem?

In your code snippet you are already loading the state_dicts in:

G.load_state_dict(checkpoint['G_state_dict'])
optimizer_G.load_state_dict(checkpoint['G_optimizer_state_dict'])

Check that gradients are calculated for each used parameter after the first backward pass. If some .grad attributes are set to None, your computation graph is detached. If that’s not the case, try to overfit a small dataset by playing around with hyperparameters.
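For the overfitting check, a small fixed subset is usually enough; a minimal sketch using torch.utils.data.Subset, assuming train_dataset is your dataset object:

from torch.utils.data import DataLoader, Subset

# Train on e.g. 10 samples only and check that the model can memorize them
small_dataset = Subset(train_dataset, list(range(10)))
small_loader = DataLoader(small_dataset, batch_size=2, shuffle=True)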

How can I check those gradients?

You could iterate over the parameters and print their .grad attributes:

loss.backward()
# Parameters whose .grad is still None did not receive a gradient,
# which indicates the computation graph is detached somewhere
for name, param in model.named_parameters():
    print('{}, {}'.format(name, param.grad))