Performing mini-batch gradient descent or stochastic gradient descent on a mini-batch

Ahh, actually sorry, it’s just a mismatch in terminology. The SGD optimizer is vanilla gradient descent (i.e. literally all it does is subtract the learning rate times the gradient from the weight, as expected). See here: How SGD works in pytorch
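
For reference, here’s a minimal sketch (assuming the default SGD arguments, i.e. no momentum, dampening, or weight decay) showing that `opt.step()` matches the hand-written update `w <- w - lr * grad`:

```python
import torch

torch.manual_seed(0)

w = torch.randn(3, requires_grad=True)        # parameter updated by the optimizer
w_manual = w.detach().clone()                 # copy for the hand-written update
lr = 0.1

opt = torch.optim.SGD([w], lr=lr)             # defaults: no momentum / weight decay

loss = (w ** 2).sum()                         # simple loss so gradients are easy to check
loss.backward()                               # w.grad is now 2 * w

grad = w.grad.detach().clone()
opt.step()                                    # optimizer update

expected = w_manual - lr * grad               # vanilla gradient descent step
print(torch.allclose(w.detach(), expected))   # True
```

The “stochastic” part just comes from feeding it gradients computed on a mini-batch rather than the full dataset; the update rule itself is plain gradient descent.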
