Gradient or Activation information from Saved Optimizer's State Dict

I have saved the model's weights and the optimizer's state dict following https://pytorch.org/tutorials/beginner/saving_loading_models.html.
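
For reference, this is roughly the saving/loading pattern I followed (a minimal sketch with a toy model; the file name is just a placeholder):

```python
import torch
import torch.nn as nn

# Toy model and optimizer, only to illustrate the checkpoint format from the tutorial
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Later, to restore both
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```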

Can I get gradient or activation information from the optimizer’s state dict? What other information can I get about the network behaviour from the optimizer’s state dict?

I do have the weights from the model’s state dict, but I am trying to get other information such as activations or gradients.

I understand that the optimizer’s state dict has a state key, which contains per-parameter optimization state, and that its contents differ for each optimizer.

I am trying to do this across the optimizers Adam, Adagrad, SGD, SGD with momentum, and RMSprop. Do I simply read the state info and use that to compare optimizers, e.g. take state['exp_avg'] (from Adam) and compare it to state['sum'] (from Adagrad)?
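
To make the question concrete, this is the kind of inspection I have in mind (just a sketch; the key names come from the respective optimizer implementations):

```python
import torch
import torch.nn as nn

def make_opt(opt_cls, **kwargs):
    model = nn.Linear(4, 2)
    opt = opt_cls(model.parameters(), **kwargs)
    # one dummy step so the per-parameter state gets populated
    model(torch.randn(8, 4)).sum().backward()
    opt.step()
    return opt

adam = make_opt(torch.optim.Adam, lr=1e-3)
adagrad = make_opt(torch.optim.Adagrad, lr=1e-2)

# Per-parameter state lives under the 'state' key, indexed by parameter id
print({k: list(v.keys()) for k, v in adam.state_dict()['state'].items()})
# e.g. {0: ['step', 'exp_avg', 'exp_avg_sq'], 1: [...]}
print({k: list(v.keys()) for k, v in adagrad.state_dict()['state'].items()})
# e.g. {0: ['step', 'sum'], 1: [...]}

adam_exp_avg = adam.state_dict()['state'][0]['exp_avg']   # Adam's first-moment estimate
adagrad_sum = adagrad.state_dict()['state'][0]['sum']     # Adagrad's accumulated squared gradients
```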

This will not work. The two states carry different, optimizer-specific meanings: exp_avg is Adam’s exponential moving average of the gradients, while sum is Adagrad’s accumulated sum of squared gradients, so they are not directly comparable. These values also depend heavily on the training run itself.

No. Gradients are not part of the optimizer’s state dict; they live on the model’s parameters, as weight.grad. Activations are not stored anywhere by default, so you would have to capture them yourself.
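
For example, after a backward pass you can read gradients straight off the parameters, and activations can be captured with forward hooks (a minimal sketch):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Capture activations with a forward hook, keyed by module name
activations = {}
def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(save_activation(name))

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Gradients live on the parameters, not in either state dict
for name, param in model.named_parameters():
    print(name, param.grad.shape)

print(activations.keys())
```

Neither state dict contains these, so if you need gradients at checkpoint time you have to save them yourself, e.g. {n: p.grad for n, p in model.named_parameters()}.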
