Hello,
I'm trying to save my model during training so that I can resume it later, but the resumed run always ends up with a higher loss than an uninterrupted run. Why would that be?
I'm following this thread to save my models: I save the encoder and decoder models, and I also save my Adam optimizer.
import os
import numpy as np
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# model_path, device, encoder, decoder, criterion, optimizer,
# data_loader and total_step are defined earlier in the notebook
def save_checkpoint(state, epoch, i):
    torch.save(state, os.path.join(model_path, 'checkpoint-{}-{}.pth'.format(epoch + 1, i + 1)))

for epoch in range(300):
    for i, (images, captions, lengths) in enumerate(data_loader):
        # Set mini-batch dataset
        images = images.to(device)
        captions = captions.to(device)
        targets = pack_padded_sequence(captions, lengths, batch_first=True)[0]

        # Forward, backward and optimize
        features = encoder(images)
        outputs = decoder(features, captions, lengths)
        loss = criterion(outputs, targets)
        encoder.zero_grad()
        decoder.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 50 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Perplexity: {:5.4f}'
                  .format(epoch + 1, 300, i + 1, total_step, loss.item(), np.exp(loss.item())))

        # Save the model checkpoints
        if (epoch + 1) % 100 == 0 and (i + 1) % total_step == 0:
            save_checkpoint({
                'epoch': epoch + 1,
                'encoder': encoder.state_dict(),
                'decoder': decoder.state_dict(),
                'optimizer': optimizer.state_dict(),
            }, epoch, i)
You can check the whole code here; the notebook also has the training data available for download: https://github.com/laptopmutia/pix2codepytorch/blob/master/Pix2Code_Pytorch_SAVE_LOAD.ipynb
And here is how I load the model. I run this code right before the training code block:
# Load model
checkpoint = torch.load('drive/My Drive/model/checkpoint-200-350.pth')
start_epoch = checkpoint['epoch']
encoder.load_state_dict(checkpoint['encoder'])
decoder.load_state_dict(checkpoint['decoder'])
optimizer.load_state_dict(checkpoint['optimizer'])
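For clarity, here is a minimal sketch of the resume pattern I'm aiming for, reusing the encoder, decoder, optimizer, and device defined above. The map_location argument, the explicit .train() calls, and starting the loop at start_epoch are my additions for illustration, not part of the original notebook:

import torch

# Load the checkpoint onto the current device
checkpoint = torch.load('drive/My Drive/model/checkpoint-200-350.pth',
                        map_location=device)
start_epoch = checkpoint['epoch']
encoder.load_state_dict(checkpoint['encoder'])
decoder.load_state_dict(checkpoint['decoder'])
optimizer.load_state_dict(checkpoint['optimizer'])

# Make sure both models are in training mode before resuming
encoder.train()
decoder.train()

# Resume the loop from the saved epoch instead of epoch 0
for epoch in range(start_epoch, 300):
    ...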
Edit 1:
Hello all, after investigating further I think the problem is in my optimizer state_dict: every time I reset my environment and reload the checkpoint, the optimizer ends up with a different state_dict.
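To illustrate what I mean, this is the kind of check I run (my own diagnostic, not from the notebook). It compares the optimizer state stored in the checkpoint against the state reported after load_state_dict, using the checkpoint and optimizer variables from the load code above:

import torch

# Compare saved vs. loaded optimizer state entry by entry
saved_state = checkpoint['optimizer']['state']
loaded_state = optimizer.state_dict()['state']

print(saved_state.keys() == loaded_state.keys())
for k in saved_state:
    for name, v in saved_state[k].items():
        w = loaded_state[k][name]
        if torch.is_tensor(v):
            # Move to CPU so the comparison works regardless of device
            same = torch.equal(v.cpu(), w.cpu())
        else:
            same = (v == w)
        print(k, name, same)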