Well, restoring the optimizer state and continuing is equivalent to doing a single longer training run.
What happens is that during the first few steps, the statistics gathered by the optimizer are still “rubbish”, so you take steps whose size is not really controlled (this could be put more precisely, I guess). At the beginning of training, this has bothered some people enough to run a few steps with a learning rate of 0 or to re-initialize after a few steps (i.e. just update the statistics), even if I cannot find the reference at the moment.
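To make the difference concrete, here is a minimal sketch in plain Python, not the actual `torch.optim.Adam` code. It assumes the textbook bias-corrected Adam update on a single scalar, and the toy loss, learning rate and step counts are made up for illustration. It compares one 20-step run, a 10+10 run that restores the state, and a 10+10 run that starts with fresh state, and it also measures the very first step taken from fresh state.

```python
import math

def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected Adam update on a scalar (textbook form, assumed here)."""
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * state["v"] + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, {"t": t, "m": m, "v": v}

def fresh_state():
    return {"t": 0, "m": 0.0, "v": 0.0}

def grad(x):
    # Gradient of the toy loss f(x) = 2.5 * x**2 (hypothetical, for illustration only).
    return 5.0 * x

def run(x, state, steps):
    for _ in range(steps):
        x, state = adam_step(x, grad(x), state)
    return x, state

# One uninterrupted run of 20 steps.
x_full, _ = run(3.0, fresh_state(), 20)

# 10 steps, then "checkpoint" the parameter and the optimizer state.
x_mid, state_mid = run(3.0, fresh_state(), 10)

# Continue with the restored state vs. with a re-initialized optimizer.
x_restored, _ = run(x_mid, dict(state_mid), 10)
x_reset, _ = run(x_mid, fresh_state(), 10)

# Size of the very first step taken from fresh state.
x_after_one, _ = adam_step(x_mid, grad(x_mid), fresh_state())
first_step = abs(x_after_one - x_mid)

print("full run:        ", x_full)
print("restored state:  ", x_restored)
print("fresh state:     ", x_reset)
print("first fresh step:", first_step)
```

In this sketch the restored run reproduces the uninterrupted one, while the re-initialized run ends up somewhere else. The first step from fresh state has magnitude of about `lr`, whatever the gradient's scale, because after bias correction the update is roughly `lr * grad / |grad|`. That is one way to read the uncontrolled step size: it is set by the learning rate alone rather than by anything the optimizer has learned about the gradients.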
So in a way, your method of re-starting Adam is “take a step in a lucky direction”. You might try to get a similar effect more systematically by varying the learning rate upwards or somesuch.
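One way to try the "vary the learning rate upwards" idea could look like the sketch below. The schedule and every name and number in it (`bumped_lr`, `period`, `bump_steps`, `factor`) are hypothetical, not something from a reference: it keeps a base learning rate and raises it for a few steps at the start of every period after the first.

```python
def bumped_lr(step, base_lr=1e-3, period=1000, bump_steps=10, factor=5.0):
    """Hypothetical schedule: base_lr, multiplied by `factor` for the first
    `bump_steps` steps of every period except the very first one."""
    in_bump = step >= period and step % period < bump_steps
    return base_lr * factor if in_bump else base_lr

# Inspect the schedule around the first bump.
for step in (0, 999, 1000, 1009, 1010):
    print(step, bumped_lr(step))
```

With a PyTorch optimizer, the value could be applied each iteration by writing it into `group["lr"]` for every `group` in `optimizer.param_groups`. Whether a bump like this actually helps as much as a lucky restart is something you would have to check on your own problem.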