Hello,
I'm trying to save my model during training so that I can resume it later, but the resumed run always ends up with a higher loss than an uninterrupted run. Why would that be?
I'm following this thread to save my models: I save the encoder and decoder models, and I also save my Adam optimizer.
import os
import numpy as np
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# model_path, device, encoder, decoder, criterion, optimizer,
# data_loader and total_step are defined earlier in the notebook
def save_checkpoint(state, epoch, i):
    torch.save(state, os.path.join(model_path, 'checkpoint-{}-{}.pth'.format(epoch + 1, i + 1)))

for epoch in range(300):
    for i, (images, captions, lengths) in enumerate(data_loader):
        # Set mini-batch dataset
        images = images.to(device)
        captions = captions.to(device)
        targets = pack_padded_sequence(captions, lengths, batch_first=True)[0]

        # Forward, backward and optimize
        features = encoder(images)
        outputs = decoder(features, captions, lengths)
        loss = criterion(outputs, targets)
        encoder.zero_grad()
        decoder.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 50 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Perplexity: {:5.4f}'
                  .format(epoch + 1, 300, i + 1, total_step, loss.item(), np.exp(loss.item())))

        # Save the model checkpoints
        if (epoch + 1) % 100 == 0 and (i + 1) % total_step == 0:
            save_checkpoint({
                'epoch': epoch + 1,
                'encoder': encoder.state_dict(),
                'decoder': decoder.state_dict(),
                'optimizer': optimizer.state_dict(),
            }, epoch, i)
You can check the whole code here; the notebook also has the training data available for download: https://github.com/laptopmutia/pix2codepytorch/blob/master/Pix2Code_Pytorch_SAVE_LOAD.ipynb
And here is how I load the model. I run this code right before the training code block:
# Load model
checkpoint = torch.load('drive/My Drive/model/checkpoint-200-350.pth')
start_epoch = checkpoint['epoch']
encoder.load_state_dict(checkpoint['encoder'])
decoder.load_state_dict(checkpoint['decoder'])
optimizer.load_state_dict(checkpoint['optimizer'])
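For clarity, here is a minimal sketch of the resume pattern I'm aiming for, reusing the encoder, decoder, optimizer, and device defined above. The map_location argument, the explicit .train() calls, and starting the loop at start_epoch are my additions for illustration, not part of the original notebook:

import torch

# Load the checkpoint onto the current device
checkpoint = torch.load('drive/My Drive/model/checkpoint-200-350.pth',
                        map_location=device)
start_epoch = checkpoint['epoch']
encoder.load_state_dict(checkpoint['encoder'])
decoder.load_state_dict(checkpoint['decoder'])
optimizer.load_state_dict(checkpoint['optimizer'])

# Make sure both models are in training mode before resuming
encoder.train()
decoder.train()

# Resume the loop from the saved epoch instead of epoch 0
for epoch in range(start_epoch, 300):
    ...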
Edit 1:
Hello all, after investigating further I think the problem is in my optimizer state_dict: every time I reset my environment and reload the checkpoint, the optimizer ends up with a different state_dict.
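To illustrate what I mean, this is the kind of check I run (my own diagnostic, not from the notebook). It compares the optimizer state stored in the checkpoint against the state reported after load_state_dict, using the checkpoint and optimizer variables from the load code above:

import torch

# Compare saved vs. loaded optimizer state entry by entry
saved_state = checkpoint['optimizer']['state']
loaded_state = optimizer.state_dict()['state']

print(saved_state.keys() == loaded_state.keys())
for k in saved_state:
    for name, v in saved_state[k].items():
        w = loaded_state[k][name]
        if torch.is_tensor(v):
            # Move to CPU so the comparison works regardless of device
            same = torch.equal(v.cpu(), w.cpu())
        else:
            same = (v == w)
        print(k, name, same)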