KeyError: 'radam_buffer'

I'm trying to copy and run the code below. I made a dedicated Anaconda environment for all of the packages. It's this piece of code that is giving me problems.

# find optimal learning rate
res = trainer.tuner.lr_find(
    net,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    min_lr=1e-5,
    max_lr=1e01,
    early_stop_threshold=100)

It keeps giving me this error:

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py:234, in CheckpointConnector.restore(self, checkpoint_path)
    231 self.restore_callbacks()
    233 # restore training state
--> 234 self.restore_training_state()
    235 self.resume_end()

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py:286, in CheckpointConnector.restore_training_state(self)
    283 assert self.trainer.state.fn is not None
    284 if self.trainer.state.fn == TrainerFn.FITTING:
    285     # restore optimizers and schedulers state
--> 286     self.restore_optimizers_and_schedulers()

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py:382, in CheckpointConnector.restore_optimizers_and_schedulers(self)
    377     if "optimizer_states" not in self._loaded_checkpoint:
    378         raise KeyError(
    379             "Trying to restore optimizer state but checkpoint contains only the model."
    380             " This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`."
    381         )
--> 382     self.restore_optimizers()
    384 if "lr_schedulers" not in self._loaded_checkpoint:
    385     raise KeyError(
    386         "Trying to restore learning rate scheduler state but checkpoint contains only the model."
    387         " This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`."
    388     )

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py:397, in CheckpointConnector.restore_optimizers(self)
    394     return
    396 # restore the optimizers
--> 397 self.trainer.strategy.load_optimizer_state_dict(self._loaded_checkpoint)

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_lightning\strategies\strategy.py:368, in Strategy.load_optimizer_state_dict(self, checkpoint)
    366 optimizer_states = checkpoint["optimizer_states"]
    367 for optimizer, opt_state in zip(self.optimizers, optimizer_states):
--> 368     optimizer.load_state_dict(opt_state)
    369     _optimizer_to_device(optimizer, self.root_device)

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\torch\optim\optimizer.py:244, in Optimizer.load_state_dict(self, state_dict)
    241     return new_group
    242 param_groups = [
    243     update_group(g, ng) for g, ng in zip(groups, saved_groups)]
--> 244 self.__setstate__({'state': state, 'param_groups': param_groups})

File ~\anaconda3\envs\pytorch_forecasting\lib\site-packages\pytorch_forecasting\optim.py:133, in Ranger.__setstate__(self, state)
    131 def __setstate__(self, state: dict) -> None:
    132     super().__setstate__(state)
--> 133     self.radam_buffer = state["radam_buffer"]
    134     self.alpha = state["alpha"]
    135     self.k = state["k"]

KeyError: 'radam_buffer'

Based on the error message it seems the pytorch_forecasting optimizer expects to load a radam_buffer from the state_dict, while the state_dict you are loading does not contain this key.
I don't know exactly what lr_find does, but do you know if it's trying to store and load state_dicts internally, which might cause the error?
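
If I read the traceback right, the round trip itself is enough to trigger it: Optimizer.load_state_dict passes only 'state' and 'param_groups' to __setstate__, while the Ranger implementation in pytorch_forecasting also expects 'radam_buffer', 'alpha' and 'k'. A minimal sketch (assuming Ranger from pytorch_forecasting.optim behaves as shown in the traceback) that should reproduce the same KeyError without lr_find:

import torch
from pytorch_forecasting.optim import Ranger

# build a Ranger optimizer over a single dummy parameter
param = torch.nn.Parameter(torch.zeros(1))
opt = Ranger([param])

# a plain optimizer state_dict only contains 'state' and 'param_groups'
state = opt.state_dict()

# load_state_dict forwards exactly those two keys to Ranger.__setstate__,
# which then fails looking up 'radam_buffer'
opt.load_state_dict(state)  # raises KeyError: 'radam_buffer'

So the error isn't specific to lr_find; anything that saves the optimizer state and loads it back (as the checkpoint restore in the traceback does) will hit the same path.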

I’m having the same problem but with this tutorial here:
https://pytorch-forecasting.readthedocs.io/en/stable/tutorials/stallion.html

I set up a brand new environment and installed the packages with conda:

conda install pytorch-forecasting pytorch>=1.7 -c pytorch -c conda-forge

and I get the exact same error when running:

res = trainer.tuner.lr_find(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    max_lr=10.0,
    min_lr=1e-6,
)

Edit: I finally solved this problem. Just pass optimizer="adam" when creating your TFT and the problem goes away.
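
For anyone following the Stallion tutorial, here is a sketch of where that argument goes (the dataset variable training and the hyperparameter values below are just the tutorial's illustrative ones, not part of the fix itself):

from pytorch_forecasting import TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# training is the TimeSeriesDataSet built earlier in the Stallion tutorial
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=8,
    loss=QuantileLoss(),
    optimizer="adam",  # use plain Adam instead of the default Ranger optimizer
)

With Adam, the checkpoint restore inside lr_find no longer goes through Ranger's custom __setstate__, so the radam_buffer lookup never happens.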


This fixed it for me 🙂