I wonder whether adam.step() performs a single optimization step or multiple steps toward convergence (i.e. minimization of the loss)? If it's the latter, how is convergence judged?
Thank you! If it's a single step, why is adam.step() called only once per mini-batch during training? Don't we need to iterate to convergence (minimize the loss) to make full use of each sample point?
I'm not sure I understand the "each sample point" reference, but mini-batch training is a common approach in machine learning. While some ML approaches can use the full training dataset for a single update, this is usually not feasible in deep learning due to the size of the model, the dataset, etc.
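To make the "single step" answer concrete, here is a minimal sketch of one Adam update written out in plain Python. The `adam_step` helper is hypothetical (it is not PyTorch's API), but it follows the standard Adam update rule: each call moves the parameter exactly once, and it is the surrounding loop, not the step itself, that drives the loss toward a minimum.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: refresh the moment estimates, bias-correct them,
    # and move the parameter a single time. No inner convergence loop.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize loss = p**2 starting from p = 5.0.
# One call = one update; repeated calls are what produce convergence.
p, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * p  # gradient of p**2
    p, m, v = adam_step(p, grad, m, v, t)
```

After many calls, `p` ends up near the minimum at 0, which is exactly the role the epoch loop plays around `adam.step()` in real training code.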
If we talk about gradient descent where minibatch = fullsample and iterations = 1, then the epochs drive convergence (minimization of the loss) as follows:

```
for epoch in epochs
    adam.step()
end
```
Then in mini-batch training, once again the epochs drive convergence:

```
for epoch in epochs
    for i in iterations
        adam.step()
    end
end
```
Correct?
Assuming fullsample represents the entire dataset, then yes, the second approach is the common one and represents mini-batch training.
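The nested epoch/iteration structure above can be sketched as a runnable toy example. This uses plain SGD on a made-up linear-regression problem for brevity (the dataset and variable names are illustrative, not from this thread); swapping in an Adam update would not change the shape of the loop, it is still one `step` per mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = 3x + noise; we fit a single weight w.
X = rng.normal(size=200)
y = 3.0 * X + 0.1 * rng.normal(size=200)

w = 0.0
lr, batch_size, epochs = 0.1, 20, 10

for epoch in range(epochs):                    # outer loop: epochs
    perm = rng.permutation(len(X))             # reshuffle each epoch
    for start in range(0, len(X), batch_size): # inner loop: iterations over mini-batches
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * np.mean((w * xb - yb) * xb) # dMSE/dw on this batch only
        w -= lr * grad                         # one optimizer step per batch
```

Each inner iteration makes one update from one mini-batch's gradient; it is the accumulation of these small steps across epochs that minimizes the loss, not repeated stepping on a single batch.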
Take a look at the Optimization chapter of the Deep Learning Book, in particular Section 8.1.3, Batch and Minibatch Algorithms, for more information.