It seems like the optimizer states are created lazily on the first call to step().
I have written multi-threaded code that shares an optimizer, and on the first call to step() some threads jump ahead while the state dictionary of a parameter is still being created, try to use keys that don't exist yet, and raise a KeyError.
Currently I have worked around this by creating the threads with larger time delays between them, so they don't catch up with the thread that is creating the states.
How can I manually force the states to be created for an optimizer like RMSprop?
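Here is a minimal single-threaded sketch of what I mean (not my actual multi-threaded code; it just shows the lazy state and the kind of lookup that fails):

```python
import torch

param = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.RMSprop([param], lr=0.01)

print(len(opt.state))  # 0 -- no per-parameter state exists yet

# Reading a state key before the first real step() fails,
# which is what the faster threads run into:
opt.state[param]["square_avg"]  # KeyError
```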
I've already tried running step() once, but it seems step() doesn't do anything if there are no gradients.
Everything seems to be based on laziness in PyTorch.
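For reference, this is what I tried; step() just skips every parameter whose .grad is None, so no state gets created:

```python
import torch

param = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.RMSprop([param], lr=0.01)

opt.step()             # every .grad is None, so each parameter is skipped
print(len(opt.state))  # still 0 -- nothing was initialized
```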
Yeah, I meant running one iteration, sorry for not being clear. Alternatively, you could set the .grad of all parameters to zeros and then call step() once. Let me know if that works.
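Roughly like this (a sketch; it assumes weight_decay=0, since with weight_decay > 0 even a zero-gradient step would nudge the weights slightly):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.RMSprop(model.parameters(), lr=0.01)

# Give every parameter an all-zero gradient, then step once so the
# optimizer eagerly creates its per-parameter state. With zero grads
# (and weight_decay=0) the RMSprop update is lr * 0 / (sqrt(0) + eps),
# so the weights stay unchanged.
for p in model.parameters():
    p.grad = torch.zeros_like(p)
opt.step()
opt.zero_grad()

print(len(opt.state))  # one entry per parameter -- safe to start the threads
```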