When selecting random batches – in particular for stochastic
gradient descent – how does one usually handle selecting the
same sample more than once?
For example, let’s say my training set contains 1000 samples,
and my batch size is 20. (So one epoch, if I understand the
terminology, will use 50 batches, each batch being used for
one step of the SGD algorithm.) (Assume that I want my
training set to be uniformly weighted.)
I could build a batch by randomly drawing, with replacement,
20 samples from my training set. If I do this, there is only a small
chance that any single batch will contain duplicate samples, but it
is very unlikely that my epoch (that is, my set of 50 batches, in
aggregate) will cover all 1000 samples.
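To make that concrete, here is a small NumPy sketch of the with-replacement scheme (the sizes match my example; the variable names are mine, not from any framework):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size, n_batches = 1000, 20, 50

# Draw every batch with replacement from the full training set.
batches = [rng.integers(0, n_samples, size=batch_size) for _ in range(n_batches)]

# How many distinct samples did the whole epoch actually touch?
seen = set(np.concatenate(batches).tolist())
print(len(seen))  # typically near 1000 * (1 - (1 - 1/1000)**1000) ~= 632, well short of 1000
```

So an epoch built this way visits only about 63% of the training set on average.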
Instead, I could draw each batch without replacement (while still
drawing the batches independently of one another). Now no single
batch will contain a duplicate, but my epoch will still, most
likely, skip several samples.
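The same sketch, changed so that each batch is drawn without replacement (again just illustrative NumPy, with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size, n_batches = 1000, 20, 50

# Each batch is drawn without replacement, but the batches are
# drawn independently, so a sample can still recur across batches.
batches = [rng.choice(n_samples, size=batch_size, replace=False) for _ in range(n_batches)]

seen = set(np.concatenate(batches).tolist())
print(len(seen))  # still well short of 1000 in general
```

Within-batch duplicates are gone, but the epoch's coverage of the training set is essentially unchanged.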
Or I could randomly permute my training set, and then split
it up into 50 batches of 20 samples each. My epoch now
contains all samples (exactly once), and the batches, neither
singly nor in aggregate, contain duplicates.
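This shuffle-then-split scheme is a one-liner with a random permutation (a sketch in NumPy; the reshape works because 1000 is an exact multiple of 20):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size = 1000, 20

# Permute all sample indices once per epoch, then slice into batches.
perm = rng.permutation(n_samples)
batches = perm.reshape(-1, batch_size)  # 50 batches of 20

# Every sample appears exactly once across the epoch.
print(batches.shape)
```

A fresh permutation is drawn at the start of each epoch, so consecutive epochs see the samples in different orders and groupings.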
Is there some conventional standard practice for selecting
random batches? Is there a clear best practice? Would it
matter very much which approach one takes?
A couple of side questions:
Does anyone know offhand how Keras / TensorFlow handles
optimizer = 'sgd' (where the optimizer selects the batches)?
What do people normally do if the size of the training set is not
a multiple of the batch size? (This question really only arises if
you want each epoch to use each of the training samples.)
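For what it's worth, the two conventions I'm aware of are to keep a smaller final batch (what Keras's fit does by default, as I understand it) or to drop the remainder entirely (e.g. PyTorch's DataLoader with drop_last=True). A sketch of the first option:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size = 1010, 20  # deliberately not a multiple of 20

# Shuffle once, then slice; the final batch is simply smaller.
perm = rng.permutation(n_samples)
batches = [perm[i:i + batch_size] for i in range(0, n_samples, batch_size)]

print([len(b) for b in batches][-2:])  # [20, 10]
```

Every sample is still used exactly once per epoch; the last gradient step just averages over fewer samples.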
Thanks for any thoughts.