Inconsistent results when reloading the model

Hey Guys,

I’m facing an issue with trained models. During training on my own dataset I get a CIDEr score of around 0.03 on the validation set, but when I save the best model, reload it, and evaluate it, the score is about zero… What makes it even stranger is that the same code works perfectly with the MS COCO dataset.

Here is my training script:
train.py

I save the model as follows:

    def save_checkpoint(model, infos, optimizer, histories=None, append=''):
        if len(append) > 0:
            append = '-' + append
        # create the checkpoint directory if it doesn't exist
        if not os.path.isdir(opt.checkpoint_path):
            os.makedirs(opt.checkpoint_path)

        checkpoint_path = os.path.join(opt.checkpoint_path, 'model%s.pth' %(append))
        torch.save(model.state_dict(), checkpoint_path)
        print("model saved to {}".format(checkpoint_path))

        optimizer_path = os.path.join(opt.checkpoint_path, 'optimizer%s.pth' %(append))
        torch.save(optimizer.state_dict(), optimizer_path)
        with open(os.path.join(opt.checkpoint_path, 'infos_'+opt.id+'%s.pkl' %(append)), 'wb') as f:
            utils.pickle_dump(infos, f)

        if histories:
            with open(os.path.join(opt.checkpoint_path, 'histories_'+opt.id+'%s.pkl' %(append)), 'wb') as f:
                utils.pickle_dump(histories, f)

And I reload it:
evaluation script

    model.load_state_dict(torch.load(opt.model, map_location='cpu'))
    model.cuda()
    model.eval()
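
For context, `model` here is constructed beforehand so that its options and vocabulary match training; the infos pickle written by save_checkpoint carries exactly that information. A rough sketch of such a setup (`utils.pickle_load`, `models.setup`, and `opt.infos_path` are assumptions on my part, mirroring the save code above):

    import torch

    # Rebuild the model with the exact options/vocab saved at training time,
    # so the architecture matches the checkpoint before load_state_dict.
    with open(opt.infos_path, 'rb') as f:  # the infos_*.pkl written above
        infos = utils.pickle_load(f)       # assumed counterpart of pickle_dump
    model = models.setup(infos['opt'])     # hypothetical factory call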

Do you have any idea about what may be happening?

Sorry if it sounds like a stupid question, but do you set it to eval mode during validation (when you get a score of around 0.03)?

Yes, I call model.eval(), compute the score, and then call model.train() again to continue training.
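
Roughly this pattern (a sketch; `evaluate` and `val_loader` stand in for my actual validation code):

    import torch

    # Switch to eval mode, score without tracking gradients, then switch
    # back to train mode to continue training.
    model.eval()
    with torch.no_grad():
        cider = evaluate(model, val_loader)  # hypothetical helper
    model.train()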