Yeah, I'm a newcomer to PyTorch and I find the SGD name really confusing too. I've always understood SGD as gradient descent with a batch size of 1, but in reality the batch size is whatever the user chooses when batching the data; the optimizer itself only ever sees the gradients you hand it. So I agree it would be much less confusing if it were named just GD, because that's what it actually implements.
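For what it's worth, here's a minimal sketch of what I mean (toy tensors, all names illustrative): the "stochastic" part comes entirely from the `batch_size` I pick in the `DataLoader`, while `torch.optim.SGD` just performs the same plain gradient step either way.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data; the "stochastic" part comes from how we batch it,
# not from the optimizer.
X = torch.randn(64, 10)
y = torch.randn(64, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)  # batch size is my choice

model = nn.Linear(10, 1)
# torch.optim.SGD applies param -= lr * grad to whatever gradients the
# backward pass produced; it has no notion of batch size.
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for xb, yb in loader:
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
```

Set `batch_size=1` and you get textbook SGD, set it to `len(X)` and you get full-batch gradient descent, but the optimizer code is identical in both cases.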