You shouldn’t be doing that, right? I’m asking for my own information as well as for my issue here.
It’s usually not wanted or necessary.
However, if your optimizer does not use any internal state, reinitializing won’t change anything.
In what situation would an optimizer have an internal state? I see an example of the state_dicts here, but it seems that in that example the “state” key of the optimizer’s state_dict is empty.
torch.optim.Adam, for example, has internal state, as it computes running averages of the gradient and its square.
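To make the point about Adam's state concrete, here is a minimal sketch (the tiny Linear model and random input are made up for illustration). It assumes Adam only allocates its per-parameter state on the first call to step(), which would also explain why the “state” key can look empty in an example that never takes a step.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)  # toy model, only for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Before the first step, no per-parameter state has been allocated.
state_before = len(optimizer.state_dict()["state"])

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

# After one step, each parameter (weight and bias) has its own state entry
# holding the running averages of the gradient and of its square.
state_after = optimizer.state_dict()["state"]
state_keys = sorted(state_after[0].keys())
print(state_before, len(state_after), state_keys)
```

The entries named exp_avg and exp_avg_sq are the running averages; re-creating the optimizer throws them away.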
If you don’t need to reset the optimizer (there might be use cases I’m not aware of), I would recommend initializing it once, before the training loop, and just using it inside the loop.
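A rough way to check this yourself is sketched below. The train helper, the toy data, and the learning rates are all hypothetical; the only point is to compare creating the optimizer once against re-creating it on every iteration, for plain SGD (no momentum, so no state) and for Adam.

```python
import torch

def train(make_optimizer, reinit_every_step, steps=5):
    torch.manual_seed(0)  # same initial weights and data for every run
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(16, 4), torch.randn(16, 1)
    optimizer = make_optimizer(model.parameters())  # created once, before the loop
    for _ in range(steps):
        if reinit_every_step:
            # Re-creating the optimizer discards any state it has accumulated.
            optimizer = make_optimizer(model.parameters())
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    return model.weight.detach().clone()

sgd = lambda params: torch.optim.SGD(params, lr=0.1)    # no momentum, so no state
adam = lambda params: torch.optim.Adam(params, lr=0.1)  # keeps running averages

sgd_same = torch.allclose(train(sgd, False), train(sgd, True))
adam_same = torch.allclose(train(adam, False), train(adam, True))
print(sgd_same, adam_same)
```

I would expect the two SGD runs to end with the same weights and the two Adam runs to diverge, since every re-created Adam starts its running averages from zero again.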
Oh. I was using Adam! Thanks for clarifying.
A cursory reading of old questions about updating the learning rate, like this one, and the lack of resolution in them, might be behind such confusion.
Maybe the optimizers where the internal state matters should be flagged on the torch optim page? And possibly on the page about saving and loading models for inference as well. Thanks again for the prompt response.
there might be use cases I’m not aware of
What if the exponential moving averages (EMA) of the gradients and their squares are hurting the training? I guess that’s where you might want to switch to something like SGD.
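For what it's worth, a switch like that could be sketched as follows. This is only an illustration: the model, the data, the learning rates, and the switch_at step are all made up, and whether switching actually helps would depend on the problem.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)  # toy model, only for illustration
x, y = torch.randn(16, 4), torch.randn(16, 1)

switch_at = 3  # hypothetical step at which to change optimizers
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(6):
    if step == switch_at:
        # A new optimizer on the same parameters: Adam's running averages are dropped.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

final_optimizer = type(optimizer).__name__
print(final_optimizer)
```

Here re-creating the optimizer is deliberate, so this would be one of the use cases where resetting the state is what you want.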