I had a question about the impact that resuming fine-tuning/training from a checkpoint might have.
Is anything reset, such as the optimizer state or some other parameter, when training is resumed from a checkpoint rather than the model being fine-tuned continuously without interruptions? I want to understand the impact of resuming training vs. running it uninterrupted from start to finish: is it equivalent, or does it affect the model updates in some way?
Resuming from a checkpoint should not break the training run, and you should see the same model convergence as long as all states are restored. Besides restoring the state_dicts of the model and optimizer, you might also need to seed the code carefully so that the data loading state is restored as well.
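Here is a minimal sketch of what I mean, assuming a standard PyTorch model/optimizer setup (the toy `nn.Linear` model, the `epoch` value, and the file name are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Toy model/optimizer just to make the sketch self-contained.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epoch = 5  # epoch at which we checkpoint

# --- saving ---
checkpoint = {
    "epoch": epoch,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    # CPU RNG state, so shuffling/augmentation can continue deterministically
    "torch_rng_state": torch.get_rng_state(),
}
torch.save(checkpoint, "checkpoint.pt")

# --- resuming ---
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
torch.set_rng_state(ckpt["torch_rng_state"])
start_epoch = ckpt["epoch"] + 1
```

If you are also using CUDA ops that consume random numbers, you could additionally store `torch.cuda.get_rng_state_all()` and restore it with `torch.cuda.set_rng_state_all()`.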
Is this related to loading the data when continuing the training/fine-tuning task? Does it only apply when the data is shuffled? Or am I misinterpreting you here?
Yes, if you are shuffling the data and do not restore the seed before resuming training, the sampled indices will not be the same as in a full training run without interruptions. I'm not sure whether you would also need to restore the shuffling state itself, but I wanted to mention it just in case.
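One way to make the shuffle reproducible across restarts is to seed the DataLoader's generator per epoch, so rebuilding the loader for the same epoch after a restart yields the same permutation. A minimal sketch, assuming a map-style dataset with the default RandomSampler (the `base_seed + epoch` scheme is just an illustration, not a canonical recipe):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset; in a real run this would be your training set.
dataset = TensorDataset(torch.arange(8))

def make_loader(epoch, base_seed=0):
    # Seed the sampler's generator from the epoch so that, after a restart,
    # rebuilding the loader for the same epoch reproduces the same shuffle.
    g = torch.Generator()
    g.manual_seed(base_seed + epoch)
    return DataLoader(dataset, batch_size=4, shuffle=True, generator=g)

# Same epoch -> same sampled order, even across process restarts.
print([batch[0].tolist() for batch in make_loader(epoch=3)])
print([batch[0].tolist() for batch in make_loader(epoch=3)])
```

Note that this restarts the interrupted epoch from its beginning; resuming mid-epoch at the exact batch would additionally require tracking how many batches were already consumed.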