I wonder whether adam.step() performs a single optimization step or multiple steps toward convergence (i.e. minimization of the loss)? If it's the latter, how is convergence judged?
Thank you! If it's a single step, why is adam.step() called only once per mini-batch during training? Don't we need to iterate to convergence (minimize the loss) to make full use of each sample point?
I'm not sure I understand the "each sample point" reference, but mini-batch training is a common approach in machine learning. While some ML approaches can use the full training dataset for a single update, this is usually not feasible in deep learning due to the size of the model, the dataset, etc.
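To make the "single step" answer concrete, here is a minimal sketch of one Adam update written out in plain Python. The `adam_step` helper is hypothetical (it is not PyTorch's API), but it follows the standard Adam update rule: each call moves the parameter exactly once, and it is the surrounding loop, not the step itself, that drives the loss toward a minimum.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: refresh the moment estimates, bias-correct them,
    # and move the parameter a single time. No inner convergence loop.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize loss = p**2 starting from p = 5.0.
# One call = one update; repeated calls are what produce convergence.
p, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * p  # gradient of p**2
    p, m, v = adam_step(p, grad, m, v, t)
```

After many calls, `p` ends up near the minimum at 0, which is exactly the role the epoch loop plays around `adam.step()` in real training code.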
If we talk about gradient descent where minibatch = fullsample and iterations = 1, then the epochs drive convergence (minimization of the loss) as follows:

```
for epoch in epochs
    adam.step()
end
```
Then in mini-batch training, once again the epochs drive convergence:

```
for epoch in epochs
    for i in iterations
        adam.step()
    end
end
```
Correct?
Assuming fullsample represents the entire dataset, then yes, the second approach is the common one and represents mini-batch training.
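The nested epoch/iteration structure above can be sketched as a runnable toy example. This uses plain SGD on a made-up linear-regression problem for brevity (the dataset and variable names are illustrative, not from this thread); swapping in an Adam update would not change the shape of the loop, it is still one `step` per mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = 3x + noise; we fit a single weight w.
X = rng.normal(size=200)
y = 3.0 * X + 0.1 * rng.normal(size=200)

w = 0.0
lr, batch_size, epochs = 0.1, 20, 10

for epoch in range(epochs):                    # outer loop: epochs
    perm = rng.permutation(len(X))             # reshuffle each epoch
    for start in range(0, len(X), batch_size): # inner loop: iterations over mini-batches
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * np.mean((w * xb - yb) * xb) # dMSE/dw on this batch only
        w -= lr * grad                         # one optimizer step per batch
```

Each inner iteration makes one update from one mini-batch's gradient; it is the accumulation of these small steps across epochs that minimizes the loss, not repeated stepping on a single batch.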
Take a look at the Optimization chapter of the Deep Learning Book, in particular Section 8.1.3, Batch and Minibatch Algorithms, for more information.