[Solved] Different accuracy with different batch size while testing

I am trying to evaluate my model and find that I get different results while using bs=1 and bs=2(the length of test set is odd so there shouldn’t be any problem about truncature).
I looked up other topics and still failed to solve my problem.

The dataset is MNIST in csv format, I also uploaded it to OneDrive(15MB)

Here is my code:

import pandas as pd
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torch.nn.functional as F
from sklearn.utils import shuffle
import cv2
import numpy as np
import random
import matplotlib.pyplot as plt

device=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

#===================Building Model===================
class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()
        
    def forward(self, x):
        return x.view(x.size(0), -1)
    
class Softmax(nn.Module):
    def __init__(self):
        super(Softmax, self).__init__()
        
    def forward(self, x):
        return F.log_softmax(x)

class CNN(nn.Module):
    def __init__(self, sz=28, nf=64):
        super(CNN, self).__init__()
        self.model=nn.Sequential(*[
            nn.Conv2d(1, nf, 4, 2, 1), nn.BatchNorm2d(nf), nn.LeakyReLU(0.2, True),
            nn.Conv2d(nf, nf*2, 4, 2, 1), nn.BatchNorm2d(nf*2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(nf*2, nf*4, 3, 1, 0), nn.BatchNorm2d(nf*4), nn.LeakyReLU(0.2, True),
            Flatten(), nn.Linear(5*5*nf*4, 10), Softmax(),
        ])
        
    def forward(self, x):
        return self.model(x)

model=CNN().to(device)

===================Loading data===================
train_data=pd.read_csv("./data/train.csv")
test_data=pd.read_csv("./data/test.csv")

X_train=train_data.values[:, 1:]
Y_train=train_data.values[:, 0:1]
X_test_all, X_train_all = np.split(X_train, [5000], axis=0)
Y_test_all, Y_train_all = np.split(Y_train, [5000], axis=0)

===================Measuring===================
def measure(model, X_test, Y_test, bs=1):
    model.eval()
    batch_size=bs
    accuracy=0.0
    
    for st in range(0, len(Y_test)//batch_size):
        X_batch, Y_batch=X_test[st:st+batch_size], Y_test[st:st+batch_size]
        model.zero_grad()
        model.eval()
        X_batch=X_batch.reshape([X_batch.shape[0],1]+[28,28])
        X_batch=torch.tensor(X_batch).type(torch.FloatTensor)/255.0
        Y_batch=torch.tensor(Y_batch).type(torch.LongTensor).squeeze(1)
        X_batch, Y_batch=X_batch.to(device), Y_batch.to(device)
        predict=model(X_batch)
        predict=predict.max(1)[1]
        accuracy+=predict.eq(Y_batch).double().sum().item()
    
    return accuracy/len(Y_test)

print(len(Y_test_all))
print(measure(model, X_test_all, Y_test_all, 2))
print("===================================")
print(measure(model, X_test_all, Y_test_all))

And it turns out the accuracy is 0.0636 when batch size is set to 2, and 0.0674 when batch size is set to 1.

Hi Chris!

Regardless of the value of batch_size (bs), st in your for
loop moves along in increments of one.

Therefore when batch_size == 1, you iterate over all elements
of Y_test and Y_test (which is what you want). But when
batch_size == 2 you only iterate over approximately half of
the elements, with most of the elements that you iterate over being
included in two batches.

That is, when batch_size == 2 you iterate over the correct number
of batches, but the batches overlap, so you get (approximately) two
copies of the first half of X_batch and Y_batch, and nothing from
the second half.

So you’re not calculating the accuracy on the same data, and the
results differ, as would be expected.

Good luck.

K. Frank

You are right, I used different data for the index of samples are incorrect. After I change[st:st+batch_size]to[st*bs:st*bs+batch_size], everything goes well.

You perfectly solved my problem. Thanks a lot. :smile: