Batch_size and validation accuracy

I followed the tutorial to train a CNN model on CIFAR10, and when I validate this model on the test data, I get a different accuracy for different batch sizes. Is that normal? As you can see below, once the batch size increases to 280, the accuracy of the model declines. I trained with a batch size of 64, and I don't understand why merely changing the batch size of the test data gives a different accuracy.

# Validate on test_data
# (PyTorch 0.1.12-era code: Variable and .data[0] are legacy idioms)
def test_accu(batch_size):
    test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
    correct_count = 0.0
    for i, data in enumerate(test_loader, 0):
        img, labels = data
        x = Variable(img)
        y = Variable(labels)
        if torch.cuda.is_available():
            x = x.cuda()
            y = y.cuda()
        outs = model(x)
        _, pred = torch.max(outs, -1)  # predicted class = argmax over the last dim
        correct_count += (pred == y).sum().data[0]
    return correct_count
for i in range(200, 500, 10):
    correct_count = test_accu(i)
    print('BatchSize: {}, Accu: {}'.format(i, correct_count/len(test_data)))

BatchSize: 200, Accu: 0.90894
BatchSize: 210, Accu: 0.90894
BatchSize: 220, Accu: 0.90894
BatchSize: 230, Accu: 0.90894
BatchSize: 240, Accu: 0.90894
BatchSize: 250, Accu: 0.90894
BatchSize: 260, Accu: 0.90894
BatchSize: 270, Accu: 0.90382
BatchSize: 280, Accu: 0.4891
BatchSize: 290, Accu: 0.08974
BatchSize: 300, Accu: 0.06414
BatchSize: 310, Accu: 0.08462
BatchSize: 320, Accu: 0.11022
BatchSize: 330, Accu: 0.13582
BatchSize: 340, Accu: 0.1563
BatchSize: 350, Accu: 0.17678
BatchSize: 360, Accu: 0.19726
BatchSize: 370, Accu: 0.21774
BatchSize: 380, Accu: 0.23822
BatchSize: 390, Accu: 0.25358
BatchSize: 400, Accu: 0.26894
BatchSize: 410, Accu: 0.2843
BatchSize: 420, Accu: 0.29966
BatchSize: 430, Accu: 0.31502
BatchSize: 440, Accu: 0.32526
BatchSize: 450, Accu: 0.34062
BatchSize: 460, Accu: 0.35086
BatchSize: 470, Accu: 0.36622
BatchSize: 480, Accu: 0.37646
BatchSize: 490, Accu: 0.3867

Once the model is in .eval() mode, the batch size should not matter. This is weird.
If you give me a script that I can run to reproduce this, I'm happy to look into it. I think it's just a user error somewhere.

Hi smth,
I used this script to train the model, with EPOCHES set to 20, on PyTorch 0.1.12 with CUDA 8.0.
Then I used the code in the question description to test the model; the batch size really does impact the results.

This was a subtle issue.

Change this line:

correct_count += (pred == y).sum().data[0]

to:

correct_count += (pred == y).double().sum().data[0]

The problem is that pred == y returns a ByteTensor, which has only an 8-bit range (0–255). Hence, beyond a particular batch size, the per-batch sum overflows (wraps around modulo 256) and produces the wrong counts.
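The wraparound is easy to reproduce without a GPU or a model. A minimal pure-Python sketch (the `uint8_sum` helper is hypothetical, simply mimicking an 8-bit accumulator like the old ByteTensor sum):

```python
# Simulate an 8-bit (uint8 / ByteTensor) accumulator: every addition wraps
# around modulo 256, which is exactly the failure mode of summing a ByteTensor.
def uint8_sum(flags):
    total = 0
    for f in flags:
        total = (total + f) % 256  # 8-bit wraparound
    return total

# 255 correct predictions still fit in 8 bits...
print(uint8_sum([1] * 255))  # 255
# ...but 300 correct predictions wrap around to 300 % 256 = 44.
print(uint8_sum([1] * 300))  # 44
```

This matches the table above: accuracy is correct up to a batch size of ~260 and collapses once each batch holds more than 255 correct predictions.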


Thanks smth, that works!

Wow, that is a subtle bug. Is there any way to prevent these kinds of issues from happening? I imagine this kind of issue will pop up often.
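One defensive habit is to widen the dtype before reducing, so a per-batch count can never exceed the accumulator's range (in newer PyTorch versions, something like `(pred == y).long().sum().item()`). The same idea in plain Python, where ints are unbounded, as a sketch:

```python
# Sketch: count correct predictions with an accumulator that cannot overflow.
# Casting each 0/1 comparison flag to a Python int *before* summing means the
# total is held in arbitrary-precision integers, so no narrow-dtype wraparound.
def count_correct(pred, labels):
    return sum(int(p == l) for p, l in zip(pred, labels))

print(count_correct([3] * 300, [3] * 300))  # 300 -- no wraparound at 256
```

The key design point is the same as in the fix above: do the widening cast before the reduction, not after, because the overflow happens inside the sum itself.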

Hi! I am also having a similar issue with my code: testing with different batch sizes gives me different results. I am doing it like this…

(inside my test loader code, after prediction…)

pred = pred.view(-1)
label = label.view(-1)
correct_count += (pred == label).double().sum().item()
total_count += pred.size(0)

and at the end of the loop over test loader I do this…

mean_accuracy = correct_count * 100 / total_count

I have tried many different test batch sizes and found that test accuracy peaks at 96% with a batch size of 512 or above, and keeps declining as the batch size decreases, until batch size 1 gives me 11% accuracy. I have a total of 10 classes, which means I could achieve that with random weights. I can't figure out what I am doing wrong.