Hi, I’m looking into the ASGD implementation here, and it seems the averaged weight is saved in state['ax']. As far as I can see, there should be a line of code copying state['ax'] to p.data. However, I don’t see state['ax'] used anywhere to update the parameters. Am I missing something?
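I don’t think you’re supposed to use the average as the parameters while optimizing.
The average is just the final solution: the gradient steps keep running on the ordinary iterate, and the average is only read out at the end.
Or did I misread the paper?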
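To make the distinction concrete, here is a minimal pure-Python sketch of averaged SGD on a toy 1-D problem. It is not the PyTorch implementation: `averaged_sgd`, `noisy_grad`, the quadratic objective, and the `1 / max(1, step - t0)` averaging weight are all assumptions chosen for illustration. The point is only that `ax` is tracked alongside `w` and never fed back into the update.

```python
import random


def averaged_sgd(grad_fn, w0, lr=0.1, steps=1000, t0=0, seed=0):
    """Toy 1-D averaged SGD. Returns (last iterate, running average)."""
    rng = random.Random(seed)
    w = w0   # the parameter that the gradient steps actually update
    ax = w0  # running average of the iterates; never written back into w
    for step in range(1, steps + 1):
        g = grad_fn(w, rng)            # gradient is taken at w, not at ax
        w = w - lr * g
        # Assumed averaging weight: stays at 1 (ax just tracks w) until
        # step > t0, then decays so ax becomes a mean of later iterates.
        mu = 1.0 / max(1, step - t0)
        ax = ax + mu * (w - ax)
    return w, ax


def noisy_grad(w, rng):
    # Hypothetical objective: 0.5 * (w - 3)^2, with Gaussian gradient noise.
    return (w - 3.0) + rng.gauss(0.0, 1.0)


w_last, w_avg = averaged_sgd(noisy_grad, w0=0.0, lr=0.1, steps=5000, t0=100)
print("last iterate:", w_last)
print("averaged solution:", w_avg)
```

In this sketch the caller decides what to do with the average once training is over. If the same reading applies to the PyTorch optimizer, copying state['ax'] into p.data would be a step the user performs after training rather than something done inside the optimizer step, but that is my assumption, not something confirmed by the code.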