Optimizer Python loops

I’ll admit I’m a bit confused about how the optimization code works under the hood. The code for SGD uses plain Python for loops to iterate over each parameter:

            for p in group['params']:
                if p.grad is not None:
                    # Collect each parameter and its gradient, one at a time.
                    params_with_grad.append(p)
                    d_p_list.append(p.grad)

                    # Per-parameter optimizer state (e.g. the momentum buffer).
                    state = self.state[p]
                    if 'momentum_buffer' not in state:
                        momentum_buffer_list.append(None)
                    else:
                        momentum_buffer_list.append(state['momentum_buffer'])

Wouldn’t this be incredibly slow? Is there some sort of just-in-time compilation going on to speed things up? If I were to implement my own optimizer, would I have access to the same performance?
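For concreteness, here is a minimal sketch of the kind of custom optimizer I mean, written the same way with a per-parameter Python loop (MySGD is just a name I made up, not anything from torch.optim):

    import torch
    from torch.optim import Optimizer

    class MySGD(Optimizer):
        """Plain SGD without momentum, using the same per-parameter loop."""

        def __init__(self, params, lr=0.01):
            super().__init__(params, dict(lr=lr))

        @torch.no_grad()
        def step(self):
            for group in self.param_groups:
                for p in group['params']:
                    if p.grad is not None:
                        # One separate op / kernel launch per parameter tensor.
                        p.add_(p.grad, alpha=-group['lr'])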

The slow for-loop approach keeps the optimizers “hackable”, so that researchers can easily modify them and run new experiments.
@crcrpar is working on speeding up the step by using the internal foreach methods (torch._foreach_*), the same idea previously available in apex as multi_tensor_apply.
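For illustration, a horizontally fused version of a plain SGD step could look something like the sketch below. Note that the torch._foreach_* functions are internal and not a stable API, so this only shows the idea, not the actual implementation:

    import torch

    params = [torch.randn(10, requires_grad=True) for _ in range(5)]
    for p in params:  # pretend a backward pass has filled the grads
        p.grad = torch.randn_like(p)

    lr = 0.01
    with torch.no_grad():
        grads = [p.grad for p in params]
        # A single fused op over the whole list of tensors,
        # instead of one small op per parameter as in the loop above.
        torch._foreach_add_(params, grads, alpha=-lr)

The speedup comes from launching one kernel over the whole list rather than one per tensor, which matters most when a model has many small parameters.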
