The “stochastic” part of SGD comes from computing the gradient on a randomly sampled mini-batch of the dataset at each step, whereas plain (full-batch) Gradient Descent computes the gradient over the full dataset for every update.
This forum post might be helpful.
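
To make the difference concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the function names (`gradient`, `gradient_descent`, `sgd`), the toy dataset `y = 3x`, the learning rate, and the batch size are all assumptions chosen for illustration. The only difference between the two loops is which samples are fed to the gradient computation.

```python
import random

def gradient(w, batch):
    """Gradient w.r.t. w of the mean of 0.5 * (w*x - y)**2 over `batch`."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def gradient_descent(data, lr=0.1, steps=200):
    """Full-batch gradient descent: every update uses the whole dataset."""
    w = 0.0
    for _ in range(steps):
        w -= lr * gradient(w, data)
    return w

def sgd(data, lr=0.1, epochs=200, batch_size=4, seed=0):
    """Mini-batch SGD: every update uses a small random subset of the data."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    data = list(data)           # copy so the caller's list is not shuffled
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)       # this shuffling is the "stochastic" part
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            w -= lr * gradient(w, batch)
    return w

# Hypothetical toy data generated from y = 3x, so the ideal weight is 3.
data = [(k / 10, 3 * k / 10) for k in range(1, 21)]

w_gd = gradient_descent(data)
w_sgd = sgd(data)
print(w_gd, w_sgd)
```

On this noise-free toy problem both loops should end up close to `w = 3`. On real data the mini-batch gradient is only a noisy estimate of the full gradient, but each update is much cheaper to compute.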