The same classifier gives different results on the same dataset

I tested resnet18 several times on the same dataset, but the accuracy and the loss value are different every time I run the test.
Can anyone explain what is going on? I am so confused…

Here is my test function

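import torch
import torch.nn.functional as F

def eval_model(device, model, loader):
    avg_acc  = 0.0
    avg_loss = 0.0
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(loader):
            images, labels = batch_data
            # move the batch to the same device as the model
            images = images.to(device)
            labels = labels.to(device)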
            batch_size = images.shape[0]
            outputs = model(images)
            loss = F.cross_entropy(outputs, labels)

            # accuracy
            preds = torch.max(outputs, 1)[1]
            acc = (torch.eq(preds, labels).sum().item() / batch_size)
            avg_loss += loss.item()
            avg_acc += acc
    avg_loss /= len(loader)
    avg_acc  /= len(loader)
    print('Evaluation => Avg ACC: {:.4f}, Avg Loss: {:.4f}'.format(avg_acc, avg_loss))

Here is the result

The code looks okay.
I'm not sure, but maybe the transform in your loader is stochastic (for example, a random crop or a random flip).

Are you shuffling the DataLoader?
If the size of your dataset is not divisible by the batch size, you might see small differences in your validation accuracy. The last batch will be smaller, but it still gets the same weight as a full batch when you divide by len(loader), which biases the average. With shuffling, different samples land in that last batch on every run, so the result changes slightly each time.
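
To see the effect with concrete numbers, here is a small sketch in plain Python (no PyTorch needed). The helper names and the toy data are made up for illustration: 10 samples, 7 of them classified correctly, batch size 4, so the last batch has only 2 samples. The two orderings stand in for two different shuffles.

```python
def mean_of_batch_means(correct, batch_size):
    """Average the per-batch accuracies, like dividing by len(loader)."""
    batches = [correct[i:i + batch_size]
               for i in range(0, len(correct), batch_size)]
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def per_sample_mean(correct):
    """Count correct predictions and divide by the dataset size."""
    return sum(correct) / len(correct)


# 1 = correct prediction, 0 = wrong prediction; same data, two orderings
order_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
order_b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print(round(mean_of_batch_means(order_a, 4), 4))  # 0.5833
print(round(mean_of_batch_means(order_b, 4), 4))  # 0.75
print(per_sample_mean(order_a), per_sample_mean(order_b))  # 0.7 0.7
```

The per-batch average depends on which samples end up in the short last batch, while the per-sample average is the same for every ordering.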

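Try this code instead:

    # inside the batch loop: accumulate per-sample totals
    acc = torch.eq(preds, labels).sum().item()
    avg_acc += acc
    avg_loss += loss.item() * images.size(0)

# after the loop: divide by the number of samples, not the number of batches
avg_acc /= len(loader.dataset)
avg_loss /= len(loader.dataset)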
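
Put together, a complete evaluation function along these lines might look like the sketch below. It is not the original poster's final code, and it makes a few choices that are my own assumptions: it calls model.eval() (so dropout and batch norm behave deterministically), it uses reduction="sum" in F.cross_entropy instead of multiplying the batch mean by the batch size, and it counts the samples it actually sees rather than using len(loader.dataset).

```python
import torch
import torch.nn.functional as F


def eval_model(device, model, loader):
    model.eval()  # assumption: switch dropout / batch norm to eval behaviour
    total_correct = 0
    total_loss = 0.0
    total_samples = 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            # sum the per-sample losses instead of taking the batch mean
            total_loss += F.cross_entropy(outputs, labels, reduction="sum").item()
            # count correct predictions in this batch
            total_correct += (outputs.argmax(dim=1) == labels).sum().item()
            total_samples += labels.size(0)
    # every sample gets the same weight, whatever batch it was in
    avg_acc = total_correct / total_samples
    avg_loss = total_loss / total_samples
    print('Evaluation => Avg ACC: {:.4f}, Avg Loss: {:.4f}'.format(avg_acc, avg_loss))
    return avg_acc, avg_loss
```

With this version the accuracy should be identical across runs even with shuffle=True, and the loss should agree up to floating-point rounding.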

@ptrblck @nutszebra
Thank you very much, I solved the problem!
Yes, I am shuffling the DataLoader, and the size of my dataset is not divisible by the batch size.
That is why there were small differences in the validation accuracy.