Hi.
I save checkpoints using state_dict as shown below:
import os
import torch

def save_checkpoint(states, is_best, output_dir,
                    filename='checkpoint.pth.tar'):
    torch.save(states, os.path.join(output_dir, filename))
    if is_best and 'state_dict' in states:
        torch.save(states['state_dict'],
                   os.path.join(output_dir, 'model_best.pth.tar'))
save_checkpoint({
    'epoch': epoch + 1,
    'model': get_model_name(config),
    'state_dict': model.state_dict(),
    'perf': perf_indicator,
    'optimizer': optimizer.state_dict(),
}, best_model, final_output_dir)
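For comparison, here is a minimal, self-contained resume sketch matching that checkpoint layout (the nn.Linear model, SGD optimizer, and temp directory are stand-ins for illustration, not my actual training code). The key point is that the optimizer state and the starting epoch need to be restored along with the weights:

```python
import os
import tempfile

import torch
import torch.nn as nn

def save_checkpoint(states, is_best, output_dir,
                    filename='checkpoint.pth.tar'):
    torch.save(states, os.path.join(output_dir, filename))
    if is_best and 'state_dict' in states:
        torch.save(states['state_dict'],
                   os.path.join(output_dir, 'model_best.pth.tar'))

output_dir = tempfile.mkdtemp()
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Pretend we just finished epoch 150 and save the checkpoint.
save_checkpoint({
    'epoch': 151,  # next epoch to run
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}, True, output_dir)

# --- Resume: restore BOTH the model and the optimizer state ---
checkpoint = torch.load(os.path.join(output_dir, 'checkpoint.pth.tar'))
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])  # momentum buffers etc.
begin_epoch = checkpoint['epoch']  # continue the loop from here, not from 0
```

If only `state_dict` is restored at resume time, the optimizer's momentum buffers and any learning-rate schedule start over from scratch, which (as far as I understand) can cause exactly this kind of accuracy drop.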
Here is an example of my situation:
If I stop training at epoch 150 and resume from epoch 151:
- accuracy at epoch 150 is 90% (the final accuracy before resuming)
- accuracy at epoch 151 is 80%
There is a big difference in accuracy before and after resuming training.
Why is this happening, and what should I check?