ASGD Optimizer Has A Bug?

Hi, everyone,

I am referring to the ASGD optimizer source code:
https://pytorch.org/docs/stable/_modules/torch/optim/asgd.html#ASGD

and I am confused by the following segment:

            if state['mu'] != 1:
                state['ax'].add_(p.sub(state['ax']).mul(state['mu']))
            else:
                state['ax'].copy_(p)

It seems that state['ax'] is not used anywhere else in the code. The optimizer just computes this variable, but it is never used to update the parameters.

Is it a bug?

It seems ax was never used in this or any previous implementation (it was returned at one point, but I don't know if and how it was used further).
CC @vincentqb do you know, if and where this state is used?
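
In the meantime, if ax is indeed meant to be the running average of the parameters (Polyak-Ruppert averaging, which is what the name "averaged SGD" suggests), you could read it out of the optimizer state manually after training. A minimal sketch, assuming the current torch.optim.ASGD interface and a placeholder loss:

    import torch

    param = torch.randn(5, requires_grad=True)
    optimizer = torch.optim.ASGD([param], lr=0.1)

    for _ in range(3):
        optimizer.zero_grad()
        loss = (param ** 2).sum()  # placeholder loss
        loss.backward()
        optimizer.step()

    # 'ax' lives in the per-parameter state; step() updates it,
    # but never writes it back into the parameter itself.
    # (With the default t0=1e6, mu stays 1, so ax simply mirrors
    # the parameter until step > t0.)
    averaged = optimizer.state[param]['ax']

    # to evaluate with the averaged weights, you would have to
    # copy them in yourself:
    with torch.no_grad():
        param.copy_(averaged)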

Thanks @ptrblck

From the ASGD implementation, it has the same effect as SGD, and state['ax'] is not used for updating the parameters. I would suggest a modification in the next version.
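
As a quick sanity check of the "same effect as SGD" part, here is a toy comparison for the special case lambd=0 and weight_decay=0, where ASGD's decay terms drop out and eta stays equal to lr (just a sketch, not a claim about the general defaults):

    import torch

    torch.manual_seed(0)
    p_sgd = torch.randn(5, requires_grad=True)
    p_asgd = p_sgd.detach().clone().requires_grad_(True)

    opt_sgd = torch.optim.SGD([p_sgd], lr=0.1)
    # lambd=0 and weight_decay=0 remove the decay terms, so eta stays at lr
    opt_asgd = torch.optim.ASGD([p_asgd], lr=0.1, lambd=0.0, weight_decay=0)

    for _ in range(5):
        for p, opt in ((p_sgd, opt_sgd), (p_asgd, opt_asgd)):
            opt.zero_grad()
            loss = (p ** 2).sum()
            loss.backward()
            opt.step()

    # the parameter trajectories match; 'ax' never feeds back into p
    print(torch.allclose(p_sgd, p_asgd))  # True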

I don’t quite understand the explanation. Where should ax be used then? Do you mean it doesn’t need to be calculated and is just there for “debugging” purposes?

I am not sure. I cannot access the corresponding paper, so I have no idea how ASGD is supposed to update the parameters. Generally, I would expect state['ax'] to be used to modify p.grad (the gradient), the way Adagrad or Adam use their state.
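
To make the contrast concrete, here is a simplified, hand-written sketch (not the actual torch.optim code) of the two patterns: an Adagrad-style update where the state is read back to scale the step, versus the ASGD-style update above where ax is only written, never read:

    import torch

    lr, eps = 0.1, 1e-10

    # Adagrad-style: the state ('sum') is read back to scale the step
    p1 = torch.randn(5, requires_grad=True)
    (p1 ** 2).sum().backward()
    state_sum = torch.zeros_like(p1)
    state_sum.add_(p1.grad * p1.grad)
    with torch.no_grad():
        p1.add_(-lr * p1.grad / (state_sum.sqrt() + eps))

    # ASGD-style: the state ('ax') is only written, never read back
    p2 = torch.randn(5, requires_grad=True)
    (p2 ** 2).sum().backward()
    ax, mu = torch.zeros_like(p2), 1.0
    with torch.no_grad():
        p2.add_(p2.grad, alpha=-lr)        # parameter update ignores ax
        if mu != 1:
            ax.add_(p2.sub(ax).mul(mu))    # ax tracks a running average of p2
        else:
            ax.copy_(p2)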