I am quite new to pytorch, my background is more mathematical.
In the tutorials I’ve been following we use gradient descent as our optimization method. Recently we began using the DataLoader class, and from what I can tell, after taking one batch of observations and differentiating the resulting cost (a weighted sum over that batch), we call .step() on the optimizer, then immediately loop back around for a new batch of observations.
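Here is a minimal sketch of the kind of loop I mean (the linear model, data, and hyperparameters are made up just for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(100, 3)                                  # 100 observations, 3 features
y = (X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)).unsqueeze(1)

model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)

for epoch in range(20):
    for xb, yb in loader:             # one mini-batch of observations
        optimizer.zero_grad()         # clear gradients from the previous batch
        loss = loss_fn(model(xb), yb) # cost function built from THIS batch only
        loss.backward()               # differentiate it w.r.t. the weights
        optimizer.step()              # a single gradient step, then on to the next batch

final_loss = loss_fn(model(X), y).item()
```

So each call to .step() sees gradients computed from one batch, and the very next iteration replaces those observations with new ones.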
I am confused because it feels like we are taking only one step along the gradient vector per batch of observations.
Is it the case that .step() is actually taking many gradient steps?
Or is it actually good practice to take one step with a given batch and then immediately receive a new cost function with different terms (different observations)?