I wonder whether adam.step() performs a single update step or iterates until convergence (i.e. minimization of the loss)? If it's the latter, how is convergence judged?

Thank you! If it's a single step, why is adam.step() called only once for **each** sample point in mini-batch training? Don't we need convergence/minimization of the loss to make full use of **each** sample point?

I'm not sure I understand the "each sample point" reference, but mini-batch training is a common approach in machine learning. While some ML approaches can use the full training dataset for a single update, that's usually not feasible in deep learning due to the size of the model, the data, etc.
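Not the actual PyTorch internals, but a minimal plain-NumPy sketch of the idea: the `adam_step` function below (an illustrative name, as is the toy regression problem) performs exactly one parameter update per call, with no inner convergence loop, and is called once per mini-batch. Convergence comes from repeating these single steps over many batches and epochs.

```python
# Minimal sketch (illustrative, not the PyTorch source): one Adam update per mini-batch.
# We fit y = w * x by least squares.
import numpy as np

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Perform exactly ONE Adam update; there is no inner convergence loop."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g          # first-moment estimate
    state["v"] = b2 * state["v"] + (1 - b2) * g ** 2     # second-moment estimate
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                                  # true weight is 3.0
w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}

for epoch in range(200):
    for start in range(0, len(x), 20):       # mini-batches of 20 samples
        xb, yb = x[start:start + 20], y[start:start + 20]
        g = np.mean(2 * (w * xb - yb) * xb)  # gradient of MSE on this batch
        w = adam_step(w, g, state)           # exactly ONE step per batch
```

After 200 epochs of such single steps, `w` has been driven close to the true value 3.0, even though no individual call ever "converged" on its own.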

If we talk about gradient descent where minibatch = fullsample, then iterations = 1, and the **epochs** loop drives convergence (minimization of the loss), like this:

```
for epoch in epochs
    adam.step()
end
```

Then in mini-batch training, once again the **epochs** (together with the inner iterations) drive convergence toward the minimum of the loss:

```
for epoch in epochs
    for i in iterations
        adam.step()
    end
end
```
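In other words, counting the optimizer steps makes the difference explicit. Here is a hypothetical sketch (the numbers and the names `n_samples`, `n_batches` are illustrative):

```python
# Hypothetical step-count sketch: full-batch vs mini-batch training.
epochs = 10
n_samples = 1000
batch_size = 100
n_batches = n_samples // batch_size       # "iterations" per epoch

steps_fullbatch = 0
for epoch in range(epochs):
    steps_fullbatch += 1                  # one adam.step() on the whole dataset

steps_minibatch = 0
for epoch in range(epochs):
    for i in range(n_batches):
        steps_minibatch += 1              # one adam.step() per mini-batch

print(steps_fullbatch, steps_minibatch)   # → 10 100
```

So with the same number of epochs, the mini-batch scheme performs `epochs * n_batches` updates rather than `epochs`.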

Correct?

Assuming `fullsample` represents the entire dataset, then yes, the second approach is the common one and represents mini-batch training.

Take a look at the `Optimization` chapter of the Deep Learning Book, in particular *8.1.3 Batch and Minibatch Algorithms*, for more information.