I wouldn’t expect to see a difference in performance while using a new GradScaler.
If you are recreating a new GradScaler, the first iteration(s) might be skipped due to the high initial scaling factor, so your runs might differ a bit. You could thus check whether resuming the training in FP32 (i.e. without the GradScaler) and resuming with the GradScaler while skipping the same number of update steps would also yield a different performance.
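Here is a minimal sketch (the model, data, and `"checkpoint.pt"` path are placeholders) showing how you could count the update steps a freshly created GradScaler skips, via the scale decrease after a skipped step, and how to restore the scaler state when resuming to avoid the re-warmup altogether:

```python
import torch

device = "cuda"
model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # fresh scaler: starts at init_scale=65536

skipped = 0
for _ in range(100):
    data = torch.randn(8, 10, device=device)
    target = torch.randint(0, 2, (8,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(data), target)
    scaler.scale(loss).backward()
    scale_before = scaler.get_scale()
    scaler.step(optimizer)   # skipped internally if inf/NaN grads were found
    scaler.update()          # lowers the scale after a skipped step
    if scaler.get_scale() < scale_before:
        skipped += 1
print(f"update steps skipped: {skipped}")

# To avoid the skipped steps when resuming, store and reload the scaler state
# alongside the model and optimizer:
# torch.save({"scaler": scaler.state_dict(), ...}, "checkpoint.pt")
# scaler.load_state_dict(torch.load("checkpoint.pt")["scaler"])
```

Loading the scaler's `state_dict` restores the last scale factor, so the resumed run continues where it left off instead of decaying from the default `init_scale` again.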