Gradient flow using Adam optimizer

I used the Adam optimizer for training. I trained my model as below:
(1) From scratch for 50 epochs
(2) From scratch for 55 epochs
(3) Loading the saved model from 50 epochs and running for 5 more epochs (50+5=55)

For epoch 51, cases (2) and (3) give the same results, but from epoch 52 onward the results differ because the first and second moment running averages are initialized differently: in case (2) they carry over from the previous epochs, while in case (3) they are reset to zero.

Is this an issue, or will the model stabilize as training goes on?


When (1) is done, you should save the state_dict of the model AND the optimizer. Then, when you start (3), you load both the pre-trained model and the optimizer state so that Adam's moment estimates are restored, as advised here: Saving & Loading a General Checkpoint for Inference and/or Resuming Training.
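
A minimal sketch of what that looks like, using a hypothetical tiny model and a made-up checkpoint path ("checkpoint.pt") just for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for your real one
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One dummy training step so Adam accumulates its moment estimates
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# End of stage (1): save BOTH state_dicts, plus the epoch counter
torch.save({
    "epoch": 50,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# Stage (3): rebuild the model/optimizer, then restore their states
model2 = nn.Linear(4, 2)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)
checkpoint = torch.load("checkpoint.pt")
model2.load_state_dict(checkpoint["model_state_dict"])
optimizer2.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"]  # resume training from epoch 51
```

With the optimizer state restored this way, the exp_avg and exp_avg_sq buffers (Adam's first and second moments) continue from where stage (1) left off instead of being re-initialized to zero, so case (3) should match case (2).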

Best regards,