How does PyTorch's SGD work?

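The “stochastic” part of SGD comes from computing the gradient on mini-batches of the dataset, whereas plain (full-batch) gradient descent computes the gradient over the full dataset at every step. Note that torch.optim.SGD itself only applies the update rule to whatever gradients it is given, so whether training is actually stochastic depends on how you batch the data you feed it.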
This forum post might be helpful.
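
To make the mini-batch vs. full-batch distinction concrete, here is a minimal sketch in plain Python. It is not PyTorch's implementation: the helper names (gradient, sgd_step, train), the one-parameter model y = w * x, and the toy data are all assumptions made up for illustration. The update rule is the same in both runs; only the batch size changes.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def sgd_step(w, grad, lr):
    """Plain SGD update (no momentum, no weight decay): w <- w - lr * grad."""
    return w - lr * grad


def train(data, batch_size, lr=0.01, epochs=50, seed=0):
    """Hypothetical training loop: shuffle each epoch, then step once per batch."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            w = sgd_step(w, gradient(w, batch), lr)
    return w


# Toy, noise-free data generated from y = 3 * x (an assumption for this sketch).
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]]

# batch_size == len(data): every step sees the whole dataset (gradient descent).
w_full = train(data, batch_size=len(data))

# batch_size < len(data): every step sees a random subset (stochastic).
w_mini = train(data, batch_size=2)

print(w_full, w_mini)
```

Both runs should end up close to w = 3. In PyTorch the batching is normally handled by a DataLoader (its batch_size and shuffle arguments), and the optimizer then steps on the gradients computed from each batch.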