Number of samples per batch

How many samples should I load per batch during training?

Is there any optimal number?

The number of samples per batch (the `batch_size`) can affect your training process in several ways:

  1. Large batches help you “cover” your training set faster, that is, you need fewer iterations per epoch.
  2. Gradients computed using large batches tend to be “smoother” - thus allowing for more stable optimization.
    On the other hand, large batches consume more GPU memory.
    Therefore, people usually select the largest batch size that GPU memory allows.
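The first point is simple arithmetic: iterations per epoch is the dataset size divided by the batch size, rounded up. A minimal sketch (the dataset size of 50,000 is illustrative, e.g. CIFAR-10's training split):

```python
import math

dataset_size = 50_000  # illustrative, e.g. CIFAR-10 training set

# Larger batches mean fewer optimizer steps per pass over the data.
for batch_size in (32, 128, 512):
    iters_per_epoch = math.ceil(dataset_size / batch_size)
    print(f"batch_size={batch_size}: {iters_per_epoch} iterations/epoch")
```

With `batch_size=512` an epoch takes 98 iterations instead of 1563 at `batch_size=32` - though note each iteration is also more expensive.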

Further reading:
Relation between batch size and learning rate: Don’t Decay the Learning Rate, Increase the Batch Size.
A visualization of the relation between batch size and learning rate can be found here.

For validation, you do not care about the “smoothness” of gradients (none are computed), only about run time, so you should tune your validation set’s `batch_size` to be as large as will fit into GPU memory.
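A hedged sketch of this, assuming PyTorch; the dataset, model, and sizes here are illustrative, not from the original post. Because no gradient buffers are kept under `torch.no_grad()`, each sample needs less memory, so the validation batch can be larger than the training batch:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Illustrative stand-in for a real validation set: 1000 samples, 10 features.
val_data = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

# Validation: no shuffling needed, and a larger batch_size than training.
val_loader = DataLoader(val_data, batch_size=512, shuffle=False)

model = torch.nn.Linear(10, 2)  # illustrative model
model.eval()
n_batches = 0
with torch.no_grad():  # no gradient buffers -> less GPU memory per sample
    for x, y in val_loader:
        _ = model(x)
        n_batches += 1
print(n_batches)  # 1000 samples / 512 per batch -> 2 batches
```

In practice you would increase `batch_size` here until you approach your GPU's memory limit.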