What's the difference between the different ways to use the training data?

When training a model, there are different ways to draw batches from the dataset:

  1. The total dataset is split into many small parts called mini-batches:

         num_batches = total_samples / batch_size
         for epoch:  # here one epoch means one pass over the entire dataset
             for i in range(num_batches):
                 batch = dataset[i * batch_size : (i + 1) * batch_size]

  2. Sample a batch at random each step:

         for epoch:  # here one "epoch" means one randomly sampled batch
             batch = random_sample(dataset, batch_size)
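The two schemes above can be sketched concretely in NumPy. The toy dataset, its shape, and the batch size here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100 samples with 4 features each (hypothetical values).
dataset = rng.normal(size=(100, 4))
batch_size = 10

# Way 1: split the whole dataset into sequential mini-batches.
num_batches = len(dataset) // batch_size
sequential_batches = [
    dataset[i * batch_size : (i + 1) * batch_size] for i in range(num_batches)
]

# Way 2: draw a "batch" by sampling indices uniformly at random.
random_idx = rng.choice(len(dataset), size=batch_size, replace=False)
random_batch = dataset[random_idx]

print(len(sequential_batches))       # number of sequential mini-batches
print(sequential_batches[0].shape)   # shape of one mini-batch
print(random_batch.shape)            # shape of one randomly sampled batch
```

Note that way 1 visits every sample exactly once per epoch, while way 2 gives no such guarantee: some samples may be drawn repeatedly across steps and others not at all.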

What is the difference between these two approaches?


Assume the training is supervised.
In that case, each sample in the dataset has a feature x and a label y.
Publicly available datasets such as MNIST and CIFAR-10/100 are sometimes stored with all samples grouped by label.

So, if you take mini-batches the first way (sequential slices, without shuffling), some mini-batches contain samples from only one class and are therefore heavily biased.

On the other hand, sampling mini-batches at random makes them diverse.
In general, the latter yields a model that generalizes better.
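The bias can be demonstrated with a small sketch. The labels below are a made-up stand-in for a dataset stored grouped by class: 50 samples of class 0 followed by 50 of class 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical label-sorted dataset: 50 zeros then 50 ones.
labels = np.repeat([0, 1], 50)
batch_size = 10

# Sequential slicing from the sorted data: the first batch is all class 0.
sequential_batch = labels[:batch_size]
print(np.unique(sequential_batch))  # a single class: biased batch

# Random sampling: the batch will usually mix both classes.
random_idx = rng.choice(len(labels), size=batch_size, replace=False)
random_batch = labels[random_idx]
print(np.unique(random_batch))
```

In practice, most frameworks combine the two ideas (e.g. PyTorch's DataLoader with shuffle=True): reshuffle the dataset at the start of each epoch, then take sequential slices. That way every sample is seen exactly once per epoch while the batches stay diverse.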