I’m using some pre-trained CNNs (ResNets and VGGs) and trying to aggregate their classifications by averaging the softmax output vectors, only on the test set of CIFAR10.
When I iterate over the test set with batch size 128 or 256, the accuracy is around ~92%.
When I iterate over the test set with batch size 1, the accuracy is around ~12%!
Again, I’m using pre-trained CNNs; the input is the CIFAR10 test set.
It’s hard to infer what the issue is with just a description of the batch sizes; could you post a code snippet showing your data loading pipeline for the test set?
Be aware that, in general, a bigger batch size is better (as long as you have enough computing power), because more samples are considered when calculating each gradient.
With a batch size of 1, each step optimizes the network only for the single image it sees. If you instead show it more images in one step, the gradient is computed so that it decreases the error over all of the shown images.
In theory you would use the whole dataset as one batch and do every gradient update on the full dataset, but that is not always possible given the resources you have available. This is why we use batch training (sometimes referred to as mini-batch training).
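To make the "more samples per gradient" point concrete, here is a minimal sketch (with a made-up linear model and random data, purely for illustration) showing that a mini-batch gradient is just the average of the per-sample gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)        # toy linear model weights (hypothetical)
X = rng.normal(size=(4, 3))   # a "mini-batch" of 4 samples
y = rng.normal(size=4)

# Per-sample gradient of the squared error 0.5 * (x @ w - y)**2
per_sample = np.array([(X[i] @ w - y[i]) * X[i] for i in range(4)])

# The mini-batch gradient of the mean loss equals the average of the
# per-sample gradients, so one update "sees" all four samples at once.
batch_grad = X.T @ (X @ w - y) / len(y)

print(np.allclose(per_sample.mean(axis=0), batch_grad))  # True
```

With batch size 1 each update follows a single noisy per-sample gradient; averaging over a batch smooths that noise out.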
Finally, keep in mind that some types of normalization (e.g. Batch Normalization) need more than one sample per batch to compute their statistics.
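In fact, PyTorch enforces this for you in some cases; a quick sketch (using `BatchNorm1d` as an example, not your actual models) shows that a single-sample batch in training mode is rejected outright:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.train()  # training mode: statistics come from the current batch

try:
    bn(torch.randn(1, 4))  # batch size 1: no variance to estimate
except ValueError as e:
    # PyTorch raises "Expected more than 1 value per channel when training"
    print(e)
```

(2D batch norm over feature maps can still run with batch size 1 because it also averages over the spatial dimensions, but the statistics remain very noisy.)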
This drop in accuracy happens because your models use Batch Normalization, which has been shown to work badly when the batch size is small. From this SO answer:
unless you can explicitly justify it, I advise against using BatchNormalization with batch_size=1; there are strong theoretical reasons against it, and multiple publications have shown BN performance degrade for batch_size under 32, and severely for <=8. In a nutshell, batch statistics “averaged” over a single sample vary greatly sample-to-sample (high variance), and BN mechanisms don’t work as intended.
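Here is a minimal sketch of that effect (using a standalone `torch.nn.BatchNorm2d` layer, not your actual models): in training mode BN normalizes with the current batch's statistics, so a sample's output depends on what else is in the batch; in eval mode it uses the stored running statistics, and the batch size no longer matters. If you are running inference without calling `model.eval()`, that would explain exactly the behavior you see.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4)

# Training mode: normalization uses the *current batch* statistics,
# so the same sample gives different outputs at batch size 8 vs 1.
bn.train()
out_batch = bn(x)[0]
out_single = bn(x[0:1])[0]
print(torch.allclose(out_batch, out_single))  # False

# Eval mode: normalization uses the stored running statistics,
# so the output is identical regardless of batch size.
bn.eval()
out_batch = bn(x)[0]
out_single = bn(x[0:1])[0]
print(torch.allclose(out_batch, out_single))  # True
```

So as a first check, make sure you call `model.eval()` on each pre-trained network before iterating over the test set.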