When I use mini batch gradient descent, what optimizer should I use?

111492 · March 29, 2021, 3:25pm

When I use mini batch gradient descent, what optimizer should I use?
I see that some people use optim.SGD(), but stochastic gradient descent is not the same as mini batch gradient descent; there is a clear difference between them. Why can I use optim.SGD() when I am doing mini batch gradient descent?
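
To make clear what difference I mean, here is a minimal plain-Python sketch (this is not PyTorch code; the function names `grad_mse` and `train`, the toy data, and the hyperparameters are all made up for illustration). As far as I understand it, the variants differ only in how many samples go into each parameter update:

```python
import random


def grad_mse(w, batch):
    """Gradient of the mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, lr=0.1, epochs=200, seed=0):
    """Gradient descent where each update uses `batch_size` samples.

    batch_size == 1             -> stochastic gradient descent
    1 < batch_size < len(data)  -> mini batch gradient descent
    batch_size == len(data)     -> full batch gradient descent
    """
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # The update rule is identical in all three cases;
            # only the number of samples behind the gradient changes.
            w -= lr * grad_mse(w, batch)
    return w


# Toy data generated from y = 3 * x (hypothetical, for illustration only).
data = [(x / 10, 3 * x / 10) for x in range(1, 21)]

w_sgd = train(data, batch_size=1)
w_mini = train(data, batch_size=4)
w_full = train(data, batch_size=len(data))
```

In this sketch the only thing that changes between the three runs is `batch_size`, which is why I am unsure whether an optimizer named SGD is tied to a batch size of one.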

I saw Yun Chen say that “SGD optimizer in PyTorch actually is Mini-batch Gradient Descent with momentum”. Can someone please explain the rationale for this?

Thank you for reading my query.
I look forward to hearing from you all.

My English is not very good, so I used DeepL to translate this. There may be some grammatical errors or improper word choices! Please forgive me.