How does the PyTorch SGD implementation perform sample selection for optimization?

Hi, I am not able to find or understand how the PyTorch implementation of SGD randomly selects samples for optimization.

I would really appreciate it if anyone could point me to the line or function that does the random selection, or correct my understanding.
Thanks

@ptrblck sorry for tagging you. It's just that you and only a few others are familiar with autograd.

Hi @nile649,

The SGD optimizer doesn’t pick any samples at all; the ‘stochastic’ part of the algorithm comes from the data you feed into the forward pass.

Sampling mini-batches is what creates the stochasticity in gradient descent; likewise, if you were to use your entire dataset at once, it would be full-batch gradient descent. Calling the optimizer “SGD” is just a naming convention.
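To illustrate, the random sample selection happens on the data side (in PyTorch this is typically `DataLoader` with `shuffle=True`), before the optimizer ever runs. Here is a minimal plain-Python sketch of that idea; the `minibatches` helper is a hypothetical stand-in, not PyTorch code:

```python
import random

def minibatches(dataset, batch_size, shuffle=True, seed=0):
    """Yield mini-batches from `dataset`.

    The shuffling here is the only 'stochastic' step: the optimizer
    that later consumes the resulting gradients never sees it.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        # Random permutation of sample indices each epoch.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

data = list(range(10))            # toy "dataset" of 10 samples
batches = list(minibatches(data, batch_size=4))
```

Every sample appears exactly once per epoch, just in a random order; with `shuffle=False` you would recover deterministic (and, with `batch_size=len(data)`, full-batch) gradient descent.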

@ptrblck will know more precise details, so correct me if I’m wrong!


Yes, @AlphaBetaGamma96 is right and the optimizer does not create the samples but is responsible for the actual parameter updates. The naming (especially the S in SGD) might be a bit confusing.
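In other words, all the optimizer sees are parameters and their already-computed gradients. A minimal sketch of the vanilla SGD update rule (simplified, not the actual PyTorch source, which also handles momentum, weight decay, etc.):

```python
def sgd_step(params, grads, lr=0.1):
    """Apply the plain SGD update p <- p - lr * g to each parameter.

    Note there is no data and no sampling here: by the time the
    optimizer runs, the (mini-batch) gradients are already computed.
    """
    return [p - lr * g for p, g in zip(params, grads)]

params = [1.0, -2.0]
grads = [0.5, -0.5]
new_params = sgd_step(params, grads)
```

Whether those gradients came from one sample, a mini-batch, or the full dataset is entirely decided upstream, in the forward/backward pass.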
