How to manually initialize optimizer state?

Hi.

It seems like the optimizer state is created lazily on the first call to step().
I have written multi-threaded code in which the threads share an optimizer. On the first call to step(), some threads jump ahead while the state dictionary of a parameter is still being created, try to use keys that don't exist yet, and raise a KeyError.
For now I have worked around this by creating the threads with larger time delays between them, so they don't catch up with the thread that is creating the state.

How can I manually force the state to be created for an optimizer like RMSprop?
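
For reference, here is a minimal repro of the lazy behaviour I mean (assuming a recent PyTorch without the Variable wrapper; the parameter is just a toy example):

import torch

# A toy parameter and an RMSprop optimizer over it
param = torch.nn.Parameter(torch.randn(4))
optimizer = torch.optim.RMSprop([param], lr=0.01)

print(len(optimizer.state))  # 0 -- no per-parameter state exists yet

loss = (param ** 2).sum()
loss.backward()
optimizer.step()

print(len(optimizer.state))  # 1 -- 'square_avg', 'step', ... created on this first step()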

Thanks.


Run step once and then fork threads?

I've already tried running step() once, but it seems like step() doesn't do anything if there are no gradients.
Everything seems to be lazy in PyTorch :smiley:

Yeah, I meant running one full iteration. Sorry for not being clear. Alternatively, you can set the .grad of all parameters to zeros and then call step() once. :slight_smile: Let me know if that works.

How can I set the grads to zero? Simply param.grad = 0, or maybe param.grad = Variable(0)? Could you provide a short code snippet?

param.grad = Variable(param.data.new(param.size()).zero_())
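
On newer PyTorch versions, where Variable has been merged into Tensor (just an assumption about your setup), the equivalent is simply:

param.grad = torch.zeros_like(param)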


Thanks @SimonW.
The following snippet worked for me:

for group in optimizer.param_groups:
    for p in group['params']:
        # Dummy zero gradient so step() has something to work with
        p.grad = p.data.new(p.size()).zero_()
        p.grad.requires_grad_(False)
# Running step() once now forces the lazy per-parameter state to be created
optimizer.step()
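
One caveat, from my reading of the RMSprop code (double-check this for your version): if weight_decay is non-zero, this dummy step will still change the parameters slightly, because the weight-decay term is added to the (zero) gradient inside step(). A sketch that wraps the idea up and clears the dummy grads afterwards (assuming a recent PyTorch with torch.zeros_like; the helper name is made up):

import torch

def init_optimizer_state(optimizer):
    # Hypothetical helper: give every parameter a zero gradient, ...
    for group in optimizer.param_groups:
        for p in group['params']:
            p.grad = torch.zeros_like(p)
    # ... run step() once to force the lazy per-parameter state to be created ...
    optimizer.step()
    # ... and drop the dummy gradients so they don't leak into real training.
    optimizer.zero_grad()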