Hi all,
I noticed that when I use different `batch_size` values in `torch.utils.data.DataLoader`, I end up with slightly different feature vectors (which sometimes affects the model predictions), even though I use both `model.eval()` and `torch.no_grad()` while extracting features.
To replicate this issue, I created the following test scenario. I extract features for 200 images where the first 100 images are exactly the same as the last 100 (in the exact same order). I simply use the default pre-trained VGG16 network for this experiment, and I test with batch sizes of 20, 50, and 180.
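For reference, here is a minimal sketch of that setup (not the exact notebook code): `RepeatedImageDataset`, `extract_features`, and `my_100_image_paths` are placeholder names, and I take the flattened convolutional features as the feature vector, which may differ from the layer used in the notebook.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader, Dataset
from PIL import Image

class RepeatedImageDataset(Dataset):
    """200 samples: images 0-99 repeated again as 100-199, in the same order."""
    def __init__(self, image_paths, transform):
        self.paths = list(image_paths) + list(image_paths)  # 100 + 100
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.transform(Image.open(self.paths[idx]).convert("RGB"))

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = models.vgg16(pretrained=True)  # default pre-trained VGG16
model.eval()

def extract_features(dataset, batch_size):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    feats = []
    with torch.no_grad():
        for batch in loader:
            # Flattened conv features as the feature vector (assumption).
            feats.append(model.features(batch).flatten(1))
    return torch.cat(feats)

# dataset = RepeatedImageDataset(my_100_image_paths, transform)
# feats_20  = extract_features(dataset, batch_size=20)
# feats_50  = extract_features(dataset, batch_size=50)
# feats_180 = extract_features(dataset, batch_size=180)
```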
The puzzling observations start at cell 12. Here are my quick notes:
- Cell 12 shows that the input tensors for images `i` and `i+100` are the same.
- Cell 13 shows that the output tensors for image `i` (obtained with different batch sizes in the dataloader) are different.
- Cell 14 shows that the output tensors for images `i` and `i+100` are the same for `batch_size=20` and `batch_size=50`, whereas they differ for `batch_size=180`.
- Cell 14 also shows that the output tensor for image `i` when `batch_size=20` is equal to the output tensor for image `i+100` when `batch_size=180` (probably because there are only 20 images left for the second batch of `dataloader180`, although `batch_size` was set to 180).
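As a rough sketch, the comparisons behind these observations look like the following, using the `feats_20`, `feats_50`, and `feats_180` tensors from the extraction sketch above (the printed values in the comments are what I observe in my run):

```python
i = 0  # any index in 0..99; image i+100 is the identical copy of image i

# Cell 13 analogue: outputs for image i differ across batch sizes.
print(torch.equal(feats_20[i], feats_50[i]))     # False in my run
print(torch.equal(feats_20[i], feats_180[i]))    # False in my run

# Cell 14 analogue: i vs i+100 within a single batch size.
print(torch.equal(feats_20[i], feats_20[i + 100]))     # True
print(torch.equal(feats_50[i], feats_50[i + 100]))     # True
print(torch.equal(feats_180[i], feats_180[i + 100]))   # False

# Last point: image i at batch_size=20 matches image i+100 at batch_size=180,
# since the second batch of the 180-loader contains only the last 20 images.
print(torch.equal(feats_20[i], feats_180[i + 100]))    # True
```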
I think the last point is the most important one, because it suggests that the issue is not really about the value of `batch_size` itself but rather about how many images actually end up in a given batch when it is processed.
Unfortunately, these subtle-looking variations in feature values may yield different predictions at test time. I am not sure whether there is a fix for this issue, but what should the rule of thumb be?
Would it be better to always use `batch_size=1` when testing a model or extracting features?
Thanks!