The SGD optimizer doesn't pick any samples at all; the "stochastic" part of the algorithm comes from the data you use in the forward pass.
If you were to use mini-batches, that would create the stochasticity in gradient descent; likewise, if you were to use your entire dataset in a single batch, it'd be full-batch gradient descent. It's just a naming convention for the optimizer to be called "SGD".
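Just to make that concrete, here's a minimal sketch with toy data (all the names here are made up for illustration): the random sampling happens in the `DataLoader` via `shuffle=True` and the batch size, while `torch.optim.SGD` only applies the parameter update from whatever gradients the backward pass produced.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model, just for illustration
X, y = torch.randn(100, 3), torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)  # sampling/shuffling happens here
model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for xb, yb in loader:  # the "stochastic" part: random mini-batches from the DataLoader
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()  # the optimizer only applies the parameter update
```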
@ptrblck will know more precise details, so correct me if I'm wrong!
Yes, @AlphaBetaGamma96 is right and the optimizer does not create the samples but is responsible for the actual parameter updates. The naming (especially the S in SGD) might be a bit confusing.