Averaged SGD implementation

ramsama1624 · October 10, 2018, 12:03pm

Hi, I’m looking into the ASGD implementation here, and it seems the averaged weights are saved in state['ax']. As far as I can see, there should be a line of code copying state['ax'] to p.data, but I don’t see state['ax'] being used anywhere to update the parameters. Am I missing something?

albanD · October 10, 2018, 1:46pm

I don’t think you’re supposed to use the average as the parameters during training.
The average is just the final solution.
Or did I misread the paper?
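
To illustrate the point, here is a minimal pure-Python sketch of averaged SGD on a 1-D quadratic. It is not the PyTorch implementation; the names `asgd`, `grad`, `x0`, `lr`, and `steps` are made up for this example, and the plain running mean is an assumption (the real optimizer may weight or delay the average differently). What it shows is that the update step only ever reads the iterate `x`, while the average `ax` is accumulated on the side and returned at the end.

```python
def asgd(grad, x0, lr=0.1, steps=200):
    """Plain gradient steps on x, with a running average ax kept on the side.

    Hypothetical helper for illustration only, not the PyTorch optimizer.
    """
    x = x0
    ax = x0
    for t in range(1, steps + 1):
        # The update uses only the current iterate x.
        x = x - lr * grad(x)
        # Running mean of the iterates x_1..x_t; never fed back into x.
        ax = ax + (x - ax) / t
    return x, ax


# Example objective: f(x) = (x - 3)^2, so grad f(x) = 2 * (x - 3).
x_last, x_avg = asgd(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_last, x_avg)
```

If the averaged value is what you want to evaluate with, it has to be read out (or copied into the parameters) explicitly once training is done; the loop itself never does that.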