Variable batch size during training

Hi. I am training an RNN encoder-decoder to predict the trajectories of agents, using the KITTI dataset. The training dataset consists of 21 image folders, and each folder has a variable number of images. This means that the batch size changes during training. For example:

A batch size of one, in this encoder-decoder setup, means one sequence. With that in mind, say the encoder takes 5 images as input and the decoder also works on 5 images, for a total of 10 images. A single sequence therefore consists of 10 images, so a batch size of one corresponds to 10 images in this case. Now, if a folder has 30 images, I can form 3 sequences, so my batch size is 3 for that folder. The next folder, however, has 50 images, which means the batch size becomes 5.
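For concreteness, here is a simplified sketch of this splitting (the helper name, the 5+5 split constants, and the file pattern are just placeholders, not my actual code):

```python
# Minimal sketch: turn one KITTI image folder into non-overlapping 10-image
# sequences, so the per-folder batch size is simply the number of sequences.
from pathlib import Path

ENC_LEN, DEC_LEN = 5, 5          # 5 input frames + 5 frames to predict (assumed)
SEQ_LEN = ENC_LEN + DEC_LEN      # one sequence = 10 images

def folder_to_sequences(folder: str):
    """Split one folder into non-overlapping sequences of SEQ_LEN images."""
    frames = sorted(Path(folder).glob("*.png"))      # file pattern is an assumption
    n_sequences = len(frames) // SEQ_LEN             # 30 images -> 3, 50 images -> 5
    return [frames[i * SEQ_LEN:(i + 1) * SEQ_LEN] for i in range(n_sequences)]

# The batch size for a folder is then len(folder_to_sequences(folder)),
# so it changes from folder to folder.
```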

So I guess my question is whether this is okay or not. Is there any harm in training like this?

Well, from the backprop and theoretical perspectives there is no problem with increasing the batch size. However, there are some practical issues:

  • Normalization layers depend on batch statistics: the larger the batch, the more representative those statistics are supposed to be. You should ask yourself whether normalization will generalize properly when the batch size varies.
  • Additionally, there is an empirically shown relationship between batch size and learning rate (Explanation in a log-scale). If your batch size changes a lot, your training may require learning-rate adjustments; see the sketch after this list. Along the same lines, the paper Don’t Decay the Learning Rate, Increase the Batch Size also deals with this problem.
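As an illustration of that second point, here is a minimal sketch of the linear-scaling heuristic discussed in those references (the base values, the GRU stand-in, and the optimizer choice are assumptions for illustration, not your actual setup):

```python
# Linear-scaling heuristic: if the batch size changes by a factor k,
# scale the learning rate by the same factor.
import torch

base_lr = 1e-3        # learning rate tuned for the base batch size (assumed)
base_batch_size = 3   # e.g. a 30-image folder -> 3 sequences of 10 images

model = torch.nn.GRU(input_size=128, hidden_size=256)   # stand-in for the encoder-decoder
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

def adjust_lr(optimizer, current_batch_size):
    """Rescale the learning rate proportionally to the current batch size."""
    scaled_lr = base_lr * current_batch_size / base_batch_size
    for group in optimizer.param_groups:
        group["lr"] = scaled_lr
    return scaled_lr

# Example: a 50-image folder gives 5 sequences, so the batch size jumps to 5
print(adjust_lr(optimizer, current_batch_size=5))   # 1e-3 * 5/3 ≈ 1.67e-3
```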

Yes, but the batch size is variable, so during training it may increase (when I start using images from a folder that has 100 images) or decrease (when I again come across a folder with 30 images). Is such fluctuation okay, or is it only a problem when the batch size keeps increasing during training?

Well, I guess what you expect is a binary answer, yes or no. Unfortunately I cannot provide that. What I’m trying to say is that a variable batch size may be stable up to some point. Since this is dataset-dependent and architecture-dependent, I cannot predict what will happen in your case. If the fluctuation is not very big, it should be okay, but it is certainly a factor to keep an eye on if your training doesn’t work as expected.
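If you want something concrete to watch, here is a self-contained toy sketch (random data and a tiny GRU as a stand-in for your encoder-decoder; everything in it is an assumption for illustration) that logs the batch size next to the loss at every step, so loss spikes can be lined up with batch-size jumps:

```python
# Toy loop with a fluctuating batch size (3 -> 5 -> 10 -> 3, mimicking folders
# of 30, 50, 100, 30 images) that records (step, batch_size, loss) each step.
import torch

model = torch.nn.GRU(input_size=16, hidden_size=32, batch_first=True)
head = torch.nn.Linear(32, 2)                      # toy trajectory head (x, y)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = torch.nn.MSELoss()                     # 'mean' reduction keeps the loss
                                                   # scale independent of batch size
history = []
for step, batch_size in enumerate([3, 5, 10, 3]):
    x = torch.randn(batch_size, 10, 16)            # (batch, seq_len=10, features)
    y = torch.randn(batch_size, 10, 2)
    optimizer.zero_grad()
    out, _ = model(x)
    loss = criterion(head(out), y)
    loss.backward()
    optimizer.step()
    history.append((step, batch_size, loss.item()))

print(history)   # inspect whether loss jumps coincide with batch-size jumps
```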
