SGD and batch_size in data.DataLoader()

shirui-japina · August 15, 2020, 7:29am

When we use the SGD algorithm to update the parameters of a model, we feed the model only one sample at a time. Is that right?
But in PyTorch, even when we use SGD, we have to set batch_size, and its value cannot be 1.
What puzzles me is this: when we use SGD and batch_size is not 1, what algorithm is the program actually running? Is it still SGD?

mariosasko · August 15, 2020, 1:41pm

The claim that batch_size cannot be 1 is not true: 1 is the default batch_size of torch.utils.data.DataLoader. So to get “true” SGD, you are free to use 1 as batch_size.
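
As a quick check, here is a minimal sketch that builds a DataLoader without passing batch_size and inspects what comes out. The toy TensorDataset and the variable names are made up for illustration; only DataLoader and its batch_size argument come from the discussion above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 samples, 1 feature each, with a scalar target.
inputs = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # shape (10, 1)
targets = torch.arange(10, dtype=torch.float32)              # shape (10,)
dataset = TensorDataset(inputs, targets)

# No batch_size given, so the DataLoader falls back to its default of 1.
default_loader = DataLoader(dataset)

# Passing batch_size=1 explicitly is equivalent; shuffle=True gives the
# random sample order usually associated with SGD.
explicit_loader = DataLoader(dataset, batch_size=1, shuffle=True)

# Each iteration yields a batch holding exactly one sample.
features, target = next(iter(default_loader))
num_batches = len(default_loader)  # one batch per sample
```

With one sample per batch, calling optimizer.step() once per iteration updates the parameters after every single sample.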

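When batch_size is greater than 1 but smaller than len(dataset), the algorithm is called “mini-batch” gradient descent, and when batch_size is equal to len(dataset) we are talking about “batch” gradient descent. In practice, the name SGD is commonly used for the mini-batch case as well.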
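
To make the three cases concrete, here is a small pure-Python sketch (no PyTorch) that fits y = w * x with a squared-error loss and performs one parameter update per batch. The function names, the toy data, and the learning rate are assumptions chosen for illustration, not anything PyTorch does internally.

```python
import random


def count_updates_per_epoch(num_samples, batch_size):
    """Number of parameter updates in one pass over the data (ceil division)."""
    return -(-num_samples // batch_size)


def train(xs, ys, batch_size, lr=0.01, epochs=200, seed=0):
    """Fit y = w * x, updating w once per batch of `batch_size` samples."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch w.r.t. w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


# Noise-free toy data generated with a true weight of 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]

w_sgd = train(xs, ys, batch_size=1)         # "true" SGD: one sample per update
w_mini = train(xs, ys, batch_size=2)        # mini-batch gradient descent
w_full = train(xs, ys, batch_size=len(xs))  # batch gradient descent
```

The update rule is the same in all three runs; the only difference is how many samples are averaged into each gradient, and therefore how many updates happen per epoch (4, 2, and 1 here).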

shirui-japina · August 16, 2020, 5:15am

I see.

Thanks for the detailed explanation.