Clarification on re-initializing optimizer in every epoch

shubhvachher · June 26, 2019, 9:57am

You shouldn’t be doing that, right? I’m asking for my own information as well as for my issue here.

ptrblck · June 26, 2019, 10:22am

It’s usually not wanted or necessary.
However, if your optimizer does not use any internal state, reinitializing won’t change anything.
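To illustrate the stateless case, here is a minimal sketch, assuming plain torch.optim.SGD without momentum and a toy linear model on random data (the model, data, and the `train` helper are made up for illustration). It trains two copies of the same model, one with a single optimizer and one that re-creates the optimizer every epoch, and compares the resulting parameters.

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(32, 4)  # toy inputs (assumption, not from the thread)
y = torch.randn(32, 1)  # toy targets

def train(model, reinit_every_epoch, epochs=3):
    # Plain SGD with no momentum: the update depends only on the current gradient.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(epochs):
        if reinit_every_epoch:
            # Re-create the optimizer at the start of every epoch.
            opt = torch.optim.SGD(model.parameters(), lr=0.1)
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return [p.detach().clone() for p in model.parameters()]

base = torch.nn.Linear(4, 1)
kept = train(copy.deepcopy(base), reinit_every_epoch=False)
fresh = train(copy.deepcopy(base), reinit_every_epoch=True)

# Both runs should end with the same parameters.
same = all(torch.equal(a, b) for a, b in zip(kept, fresh))
print(same)
```

With momentum enabled, or with an optimizer such as Adam, the two runs would generally diverge, because the re-created optimizer starts from fresh buffers.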

shubhvachher · June 26, 2019, 10:42am

In what situation would an optimizer have internal state? I see an example of the state_dicts here, but in that example the “state” key of the optimizer’s state_dict seems to be empty.

ptrblck · June 26, 2019, 10:45am

torch.optim.Adam, for example, has internal state, as it keeps running averages of the gradient and its square.
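A small sketch of how to see this state, assuming a toy linear model and a made-up loss. Adam creates its per-parameter state lazily on the first call to step(), which is one reason a freshly created optimizer can show an empty “state” entry in its state_dict.

```python
import torch

model = torch.nn.Linear(4, 1)  # toy model (assumption)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Before any step, the "state" entry of the state_dict is empty.
state_before = len(opt.state_dict()["state"])

loss = model(torch.randn(8, 4)).pow(2).mean()  # made-up loss for illustration
loss.backward()
opt.step()

# After one step, each parameter has a step counter and two running averages.
state_after = opt.state_dict()["state"]
keys = sorted(state_after[0].keys())
print(state_before, keys)

# Re-creating the optimizer discards those running averages.
new_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
state_new = len(new_opt.state_dict()["state"])
print(state_new)
```

Here exp_avg is the running average of the gradient and exp_avg_sq the running average of its square.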

If you don’t need to reset the optimizer (there might be use cases I’m not aware of), I would recommend initializing it once and just using it inside the training loop.
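A minimal sketch of that pattern, assuming a toy model and random data (both made up for illustration): the optimizer is created once, before the epoch loop, so its running averages carry over from one epoch to the next.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                   # toy model (assumption)
x, y = torch.randn(64, 4), torch.randn(64, 1)  # toy data (assumption)

# Create the optimizer once, outside the epoch loop.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The step counter in Adam's state has accumulated across all epochs.
steps = int(optimizer.state_dict()["state"][0]["step"])
print(steps, losses)
```

If the optimizer were created inside the loop instead, the step counter and both running averages would restart at every epoch.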

shubhvachher · June 26, 2019, 11:08am

Oh. I was using Adam! Thanks for clarifying.

A not-so-thorough reading of old learning rate update queries like this, and their lack of resolution, might be behind such confusion.

Maybe the optimizers where the internal state matters should be tagged on the torch.optim page? Possibly also on the page about saving and loading models for inference. Thanks again for the prompt response.

Peter_Featherstone · April 9, 2020, 10:25am

“there might be use cases I’m not aware of”

What if the exponential moving averages (EMA) of the gradients and their squares are hurting the training? I guess that’s where you might want to switch to something like SGD.
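A rough sketch of what such a switch could look like, assuming a toy model, random data, and a hypothetical `switch_epoch`. Whether switching actually helps depends on the problem. This only shows that creating a new optimizer over the same parameters is enough to drop Adam’s running averages.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                   # toy model (assumption)
x, y = torch.randn(64, 4), torch.randn(64, 1)  # toy data (assumption)

switch_epoch = 3  # hypothetical epoch at which to leave Adam
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(6):
    if epoch == switch_epoch:
        # New optimizer over the same parameters: the model weights are kept,
        # but Adam's running averages are discarded.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

print(type(optimizer).__name__)
```

The learning rate usually needs retuning after such a switch, since Adam and SGD scale their updates differently.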