Optimizer Python loops

I’ll admit I’m a bit confused about how the optimization code works under the hood. The code for SGD uses Python for-loops to iterate over each parameter:

            for p in group['params']:
                if p.grad is not None:

                    state = self.state[p]
                    if 'momentum_buffer' not in state:

Wouldn’t this be incredibly slow? Is there some sort of just-in-time compilation going on to speed things up? If I were to implement my own optimizer, would I have access to the same performance?

The per-parameter for-loop approach keeps the optimizers “hackable”, so that researchers can easily modify an optimizer and run new experiments.
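To illustrate what “hackable” means here, a custom optimizer can be written in the same per-parameter-loop style by subclassing `torch.optim.Optimizer`. This is a minimal sketch (the class name and update rule are my own, not PyTorch’s); the loop body is the part a researcher would swap out:

```python
import torch

# A minimal custom optimizer in the same per-parameter-loop style as
# torch.optim.SGD. Hypothetical name, for illustration only.
class PlainSGD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.1):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    # The "hackable" part: any update rule can go here.
                    p.add_(p.grad, alpha=-group['lr'])

w = torch.nn.Parameter(torch.ones(3))
opt = PlainSGD([w], lr=0.5)
w.sum().backward()   # grad of sum() w.r.t. w is all ones
opt.step()           # in-place: w <- w - 0.5 * grad
```

The Python-level loop overhead is usually small next to the actual CUDA kernel launches, but for models with many small parameter tensors it does add up, which motivates the batched approach below.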
@crcrpar is working on speeding up the step by using the internal foreach methods (previously available in apex as multi_tensor_apply).
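The idea behind the foreach path can be sketched with the private `torch._foreach_*` ops, which apply one operation across a whole list of tensors instead of looping in Python (note these are internal APIs and may change between releases):

```python
import torch

# Two parameters of different shapes, each with a gradient of 2.0.
params = [torch.ones(3), torch.ones(2)]
grads = [torch.full((3,), 2.0), torch.full((2,), 2.0)]
lr = 0.5

# Loop version (what the current optimizer code does):
#   for p, g in zip(params, grads):
#       p.add_(g, alpha=-lr)
#
# foreach version: one call over the tensor lists, updating in place.
torch._foreach_add_(params, grads, alpha=-lr)
```

After the call, each parameter holds `1 - 0.5 * 2 = 0`, the same result as the loop version, but with far fewer Python-level dispatches.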
