How do small batch sizes affect performance of self-supervised DINO?

Hi, I am retraining DINO with my own custom dataset (~570k images).

  1. On my local machine, the maximum batch size I can fit is 32 (a single RTX 3080 Ti GPU), and one epoch takes around 1 hour 20 minutes to complete. Is that normal?
  2. Does such a small batch size hurt model performance?

Thank you!

Regarding your 2nd question: In general, the larger the batch size, the less wall-clock time an epoch takes, since the GPU processes more samples per optimizer step and you need fewer steps to cover the dataset. So in terms of performance purely from the perspective of epoch runtime, yes, larger batches are better.
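To make the "fewer steps per epoch" point concrete, here is a small back-of-the-envelope sketch for your setup (~570k images). The batch sizes other than 32 are just illustrative values I picked:

```python
import math

num_images = 570_000  # approximate dataset size from the question above

def steps_per_epoch(batch_size: int) -> int:
    # One optimizer step per batch; a final partial batch still counts as a step.
    return math.ceil(num_images / batch_size)

for bs in (32, 256, 1024):
    print(f"batch size {bs:>4}: {steps_per_epoch(bs):>6} steps per epoch")
# batch size 32 gives ~17.8k steps per epoch; batch size 1024 gives ~560.
```

Whether fewer steps actually translate into a proportionally faster epoch depends on how well the larger batch saturates the GPU, but the step count is the dominant factor.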

However, note that a faster epoch does not necessarily mean faster training overall, as the loss might go down more slowly per epoch. The larger the batch, the less noisy the gradient estimate, since gradients are averaged across all samples in a batch and the per-sample noise partially cancels. This also means that increasing the batch size typically allows you to increase the learning rate (again, because the averaged gradients are less noisy).
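The averaging argument above can be demonstrated with a toy NumPy sketch (this is not DINO code; the "gradients" are synthetic numbers I made up purely to show the variance effect):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-sample "gradients": noisy observations of a true gradient of 1.0.
true_grad = 1.0
per_sample_grads = true_grad + rng.normal(0.0, 1.0, size=100_000)

def batch_grad_std(batch_size: int, n_batches: int = 1_000) -> float:
    # A batch gradient is the mean over `batch_size` per-sample gradients,
    # so its standard deviation shrinks roughly as 1/sqrt(batch_size).
    samples = rng.choice(per_sample_grads, size=(n_batches, batch_size))
    return samples.mean(axis=1).std()

small = batch_grad_std(32)
large = batch_grad_std(1024)
print(f"gradient-estimate std @ batch 32:   {small:.3f}")
print(f"gradient-estimate std @ batch 1024: {large:.3f}")
```

The batch-1024 estimate is several times less noisy than the batch-32 one, which is why large-batch recipes (including, if I recall correctly, the reference DINO implementation) scale the learning rate up with the total batch size rather than keeping it fixed.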