How can I save best model weights to continue training the model after stopping because of limited GPU resources?
The ImageNet example would be a good reference for resuming the training.
E.g. there a checkpoint is loaded so that the training can be resumed, and the checkpoint giving the best validation accuracy is stored.
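In short, the pattern looks like this (a minimal sketch; train_one_epoch and validate are hypothetical helpers, and the model/optimizer/loader names are assumed, not taken from the example):

best_acc = 0.0
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, train_loader)  # hypothetical training helper
    acc = validate(model, val_loader)                # hypothetical validation helper
    if acc > best_acc:                               # keep the best-performing checkpoint
        best_acc = acc
        torch.save({'epoch': epoch,
                    'state_dict': model.state_dict(),
                    'optimizer': optimizer.state_dict(),
                    'best_acc': best_acc}, 'model_best.pth')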
I am working with a GAN model, so I don’t calculate accuracy. Should I save the best validation G_loss and D_loss instead?
I am using the code below during training; is it correct?
if min_valid_loss_g > valid_loss_g:
    print(f'G_Val_Loss_Decreased({min_valid_loss_g:.6f}--->{valid_loss_g:.6f})\t Saving The Model')
    min_valid_loss_g = valid_loss_g
    torch.save({
        'epoch': epoch,
        'G_state_dict': G.state_dict(),
        'G_optimizer_state_dict': optimizer_G.state_dict(),
        'G_loss': valid_loss_g
    }, f"./generator-epoch-{epoch}.pth")
Yes, your approach sounds reasonable assuming the validation loss properly represents the training progress of your model.
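If you also want to resume the adversarial training itself, it is common to store both networks and both optimizers in a single checkpoint. A hedged sketch, assuming your discriminator, its optimizer, and its validation loss are named D, optimizer_D, and valid_loss_d:

torch.save({
    'epoch': epoch,
    'G_state_dict': G.state_dict(),
    'D_state_dict': D.state_dict(),                      # assumed discriminator
    'G_optimizer_state_dict': optimizer_G.state_dict(),
    'D_optimizer_state_dict': optimizer_D.state_dict(),  # assumed discriminator optimizer
    'G_loss': valid_loss_g,
    'D_loss': valid_loss_d                               # assumed discriminator validation loss
}, f"./gan-epoch-{epoch}.pth")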
How can I check whether the validation loss properly represents the training progress of my model?
The code below shows how I am loading the saved models to continue training after restarting the kernel. Is it correct?
G = Generator().to(device)
optimizer_G = torch.optim.Adam(G.parameters())  # the optimizer must exist before loading its state (Adam assumed)

checkpoint = torch.load('generator.pth')
G.load_state_dict(checkpoint['G_state_dict'])
optimizer_G.load_state_dict(checkpoint['G_optimizer_state_dict'])  # key must match the one used in torch.save
epoch = checkpoint['epoch']
loss = checkpoint['G_loss']  # key must match the one used in torch.save
G.train()  # use train() to continue training; eval() is for inference
Should I continue training on the same training dataset, or should I modify it?
The choice of the dataset depends on your use case. If you want to “continue” the training, then using the same dataset would work; if you want to fine-tune the model on another dataset then you would need to change it.
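If you do continue on the same data, remember to resume the loop from the epoch after the one stored in the checkpoint, so no epoch is repeated. A small sketch, assuming a num_epochs variable from your training setup:

start_epoch = checkpoint['epoch'] + 1  # continue right after the saved epoch
for epoch in range(start_epoch, num_epochs):  # num_epochs assumed
    # run the usual training and validation steps here
    ...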
And how do I load the saved models?
Using this technique, I got the same loss for different epochs during training.
Where is the problem?
In your code snippet you are already loading the state_dicts in:
G.load_state_dict(checkpoint['G_state_dict'])
optimizer_G.load_state_dict(checkpoint['G_optimizer_state_dict'])
Check that gradients are calculated for each used parameter after the first backward pass. If some .grad attributes are set to None, your computation graph is detached. If that’s not the case, try to overfit a small dataset by playing around with hyperparameters.
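For the overfitting check, a hedged sketch assuming your training set is called train_dataset:

# a healthy setup should drive the loss toward zero on a tiny subset
small_set = torch.utils.data.Subset(train_dataset, range(16))  # train_dataset assumed
small_loader = torch.utils.data.DataLoader(small_set, batch_size=4, shuffle=True)
# then run your usual training loop on small_loader and watch the loss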
How can I check those gradients?
You could iterate the parameters and print their .grad attribute:
loss.backward()
for name, param in model.named_parameters():
    print('{}, {}'.format(name, param.grad))
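Printing full gradient tensors can get noisy for larger models; a variant of the same check that only flags parameters whose gradients are missing:

loss.backward()
for name, param in model.named_parameters():
    if param.requires_grad and param.grad is None:
        print(f'{name} has no gradient -> check for a detached graph')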