You shouldn’t be doing that, right? Asking for my own information, as this is my issue here as well.
It’s usually not wanted or necessary.
However, if your optimizer does not use any internal state, reinitializing won’t change anything.
In what situation would an optimizer have internal state? I see an example of the state_dicts here, but it seems that, in that example, the “state” key of the optimizer’s state_dict is empty.
torch.optim.Adam, for example, has internal state, as it computes running averages of the gradient and of its square.
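To make that concrete, here is a minimal pure-Python sketch of the per-parameter state Adam carries between steps. It is a deliberate simplification (scalar parameter, no weight decay or AMSGrad), but the `exp_avg` / `exp_avg_sq` keys mirror the entries you would find under the “state” key of `torch.optim.Adam`’s state_dict once `step()` has been called at least once:

```python
# Simplified sketch of Adam's internal state (not the real torch implementation).
def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    state["step"] += 1
    # Running averages of the gradient and its square -- this is the
    # internal state that would be lost if the optimizer were re-created.
    state["exp_avg"] = beta1 * state["exp_avg"] + (1 - beta1) * grad
    state["exp_avg_sq"] = beta2 * state["exp_avg_sq"] + (1 - beta2) * grad ** 2
    # Bias-corrected estimates used for the actual update.
    m_hat = state["exp_avg"] / (1 - beta1 ** state["step"])
    v_hat = state["exp_avg_sq"] / (1 - beta2 ** state["step"])
    return param - lr * m_hat / (v_hat ** 0.5 + eps)

state = {"step": 0, "exp_avg": 0.0, "exp_avg_sq": 0.0}
p = 1.0
for _ in range(3):
    p = adam_step(p, grad=2.0, state=state)
# state["exp_avg"] and state["exp_avg_sq"] now carry history from all three
# steps; re-initializing the optimizer would zero them and change every
# subsequent update.
```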
If you don’t need to reset the optimizer (there might be use cases I’m not aware of), I would recommend initializing it once and just using it inside the training loop.
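The recommended pattern can be sketched as below, using a hypothetical stand-in class so the example is self-contained (in real code this would be `torch.optim.Adam(model.parameters(), lr=...)`):

```python
# Hypothetical stand-in for a stateful optimizer such as torch.optim.Adam.
class StatefulOptimizer:
    def __init__(self):
        self.step_count = 0  # in Adam this would be exp_avg / exp_avg_sq

    def step(self):
        self.step_count += 1

optimizer = StatefulOptimizer()  # initialized once, OUTSIDE the loop
for epoch in range(5):
    # forward pass, loss.backward(), etc. would go here
    optimizer.step()             # state accumulates across iterations

# Had StatefulOptimizer() been re-created inside the loop, step_count
# (and Adam's running averages) would restart from zero every epoch.
```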
Oh. I was using Adam! Thanks for clarifying.
> there might be use cases I’m not aware of
What if the EMA averaging of the gradients and their squares is killing the training? I guess that’s where you might want to swap to something like SGD.
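Plain SGD keeps no running averages, which is why swapping to it effectively discards the EMA history. A toy sketch of that statelessness, using a hypothetical `sgd_step` on a scalar toy objective (in PyTorch the swap would just be constructing `torch.optim.SGD(model.parameters(), lr=...)` at that point):

```python
# Stateless SGD update: nothing persists between steps except the parameter.
def sgd_step(param, grad, lr=0.01):
    return param - lr * grad

p = 1.0
for _ in range(10):
    grad = 2 * p  # gradient of the toy objective p**2
    p = sgd_step(p, grad)
# p moves toward the minimum at 0 with no optimizer state to reset or carry.
```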