Hello, I restore my optimizer from a checkpoint like this:
self.optimizer.load_state_dict(checkpoint[OPTIMIZER_STATE_DICT])
self.optimizer.param_groups = checkpoint[OPTIMIZER_PARAM_GROUP]
and I keep getting this error on the next training step:
[rank0]: self.grad_scaler.step(self.optimizer)
[rank0]: File "/data/py_envs/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 451, in step
[rank0]: len(optimizer_state["found_inf_per_device"]) > 0
[rank0]: AssertionError: No inf checks were recorded for this optimizer.
I have tried every solution I could find by googling, but I still can’t fix this problem.
Please help me fix this.
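
One possible cause, offered as a guess rather than a confirmed diagnosis: this assertion is raised when the grad scaler finds no gradients on any parameter in optimizer.param_groups. If checkpoint[OPTIMIZER_PARAM_GROUP] was saved from optimizer.param_groups, the loaded groups hold freshly deserialized tensors, not the model's live parameters, so overwriting param_groups would leave the optimizer pointing at tensors that never receive gradients. The sketch below illustrates that on CPU. The tiny Linear model, the SGD optimizer, and the string keys "optimizer_state_dict" and "optimizer_param_group" are stand-ins invented for the example, not the real training code.

```python
import io

import torch

# Hypothetical stand-ins for the real model and optimizer.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer has momentum state worth saving.
model(torch.randn(3, 4)).sum().backward()
optimizer.step()
optimizer.zero_grad()

# Save the state dict and, as assumed from the post, the raw param_groups.
buffer = io.BytesIO()
torch.save(
    {
        "optimizer_state_dict": optimizer.state_dict(),
        "optimizer_param_group": optimizer.param_groups,
    },
    buffer,
)
buffer.seek(0)
checkpoint = torch.load(buffer, weights_only=False)

# Rebuild the model and optimizer, as a resumed run would.
restored_model = torch.nn.Linear(4, 2)
restored_opt = torch.optim.SGD(restored_model.parameters(), lr=0.1, momentum=0.9)


def tracks_model_params(opt, mdl):
    """Return True if every tensor in opt.param_groups is a live parameter of mdl."""
    live = {id(p) for p in mdl.parameters()}
    return all(id(p) in live for group in opt.param_groups for p in group["params"])


# load_state_dict restores state and hyperparameters but keeps the live params.
restored_opt.load_state_dict(checkpoint["optimizer_state_dict"])
ok_after_load = tracks_model_params(restored_opt, restored_model)

# Overwriting param_groups swaps in deserialized tensors the model does not own.
restored_opt.param_groups = checkpoint["optimizer_param_group"]
ok_after_overwrite = tracks_model_params(restored_opt, restored_model)

print(ok_after_load, ok_after_overwrite)
```

If this matches the situation, dropping the self.optimizer.param_groups = ... line and relying on load_state_dict alone may be enough, since load_state_dict already restores per-group settings such as the learning rate.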