When I use mini-batch gradient descent, which optimizer should I use?
I see that some people use optim.SGD(), but stochastic gradient descent is not the same as mini-batch gradient descent. As I understand it, stochastic gradient descent updates the parameters after every single sample, while mini-batch gradient descent updates them after a small batch of samples. Why can I use optim.SGD() when I am doing mini-batch gradient descent?
I saw Yun Chen say that "SGD optimizer in PyTorch actually is Mini-batch Gradient Descent with momentum". Can someone please explain the rationale for this?
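To make the question concrete, here is a minimal sketch of the pattern I mean. The toy data, the linear model, the learning rate, and the batch size of 16 are all made up for illustration. As far as I can tell, the batch size is set only on the DataLoader, and optim.SGD() just applies its update to whatever gradient loss.backward() produced.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy regression data (made up for illustration): 64 samples, 3 features.
X = torch.randn(64, 3)
y = X @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(64, 1)

# batch_size=16 is what makes this "mini-batch"; the optimizer is never told this number.
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()  # default reduction="mean" averages the loss over the batch
optimizer = optim.SGD(model.parameters(), lr=0.1)  # momentum defaults to 0

with torch.no_grad():
    initial_loss = loss_fn(model(X), y).item()

steps = 0
for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()          # clear gradients from the previous batch
        loss = loss_fn(model(xb), yb)  # loss on this mini-batch only
        loss.backward()                # gradient of the mini-batch loss
        optimizer.step()               # one parameter update per mini-batch
        steps += 1

with torch.no_grad():
    final_loss = loss_fn(model(X), y).item()

print(steps)  # 4 mini-batches per epoch * 5 epochs = 20 updates
```

If I set batch_size=1 here, I would expect this to become "true" stochastic gradient descent, and with batch_size=64 it would be full-batch gradient descent, all with the same optim.SGD() call. Is that understanding correct?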
Thank you for reading my query.
I look forward to hearing from you all.
My English is not very good, so I used DeepL to translate this. There may be some grammatical errors or improper word choices. Please forgive me!