The same classifier gives different results on the same dataset

Hi,
I tested resnet18 several times, but the accuracy and loss values are different every time I run the evaluation.
Can anyone explain this behavior? I am so confused…
Thanks!

Here is my test function:

import torch
import torch.nn.functional as F

def eval_model(device, model, loader):

    model.eval()

    avg_acc  = 0.0
    avg_loss = 0.0

    with torch.no_grad():
        for batch_idx, batch_data in enumerate(loader):
            images, labels = batch_data
            images = images.to(device)
            labels = labels.to(device)
            batch_size = images.shape[0]

            outputs = model(images)
            loss = F.cross_entropy(outputs, labels)

            # accuracy
            preds = torch.max(outputs, 1)[1]
            acc = (torch.eq(preds, labels).sum().item() / batch_size)

            avg_loss += loss.item()
            avg_acc += acc

    avg_loss /= len(loader)
    avg_acc  /= len(loader)
    print('Evaluation => Avg ACC: {:.4f}, Avg Loss: {:.4f}'.format(avg_acc, avg_loss))

Here is the result

The code looks okay.
I’m not sure, but maybe the transform applied in your loader is stochastic (e.g. a random augmentation), so the inputs change on every pass.
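As a toy illustration of that effect (pure Python, no real model or torchvision involved, all numbers made up): if the eval pipeline applies a random transform, repeated evaluations of the same data disagree, while a deterministic pipeline is repeatable.

```python
import random

random.seed(0)

def eval_accuracy(apply_random_flip):
    # Tiny stand-in for an eval pass over 100 "images"; this toy
    # "model" is correct exactly when the sample is not flipped.
    correct = 0
    for _ in range(100):
        flipped = apply_random_flip and random.random() < 0.5
        if not flipped:
            correct += 1
    return correct / 100

# With a stochastic transform, repeated evaluations disagree:
stochastic_accs = [eval_accuracy(True) for _ in range(5)]

# With a deterministic transform, every run gives the same number:
deterministic_accs = [eval_accuracy(False) for _ in range(5)]
```

The fix in a real pipeline is the same idea: keep random augmentations (random crops, flips, etc.) in the training transform only, and use a deterministic transform for validation/test.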

Are you shuffling the DataLoader?
If the length of your dataset is not divisible by the batch size, the last batch will be smaller than the others. Dividing by len(loader) weights that smaller batch the same as a full batch, which biases your averages, so when shuffling changes which samples land in the last batch you see small run-to-run differences in validation accuracy.
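To make the bias concrete, here is a toy calculation (pure Python, with made-up per-batch results): averaging per-batch accuracies disagrees with the true per-sample accuracy as soon as the last batch is smaller, and shuffling changes which samples fall into that batch.

```python
# Toy example: 10 samples, batch size 4 -> batches of sizes 4, 4, 2.
correct_per_batch = [4, 2, 0]   # correct predictions in each batch
batch_sizes = [4, 4, 2]

# Averaging per-batch accuracies (what dividing by len(loader) does):
batch_mean = sum(c / b for c, b in zip(correct_per_batch, batch_sizes)) / len(batch_sizes)
# (1.0 + 0.5 + 0.0) / 3 = 0.5

# Averaging over samples (the unbiased estimate):
sample_mean = sum(correct_per_batch) / sum(batch_sizes)
# 6 / 10 = 0.6
```

The per-sample average is invariant to how the samples are grouped into batches, so it stays the same across shuffled runs; the per-batch average does not.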

Try this code instead:

    # inside the loop: accumulate raw counts and sample-weighted loss
    acc = torch.eq(preds, labels).sum().item()
    avg_acc += acc
    avg_loss += loss.item() * images.size(0)

    # after the loop: divide by the number of samples, not batches
    avg_acc /= len(loader.dataset)
    avg_loss /= len(loader.dataset)
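Putting those changes into the original function, the whole evaluation pass might look like this (a sketch; using `reduction='sum'` in `F.cross_entropy` is an equivalent alternative to multiplying the mean loss by the batch size, and returning the metrics is an addition):

```python
import torch
import torch.nn.functional as F

def eval_model(device, model, loader):
    model.eval()
    total_correct = 0
    total_loss = 0.0

    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = model(images)
            # Sum (not mean) the loss over the batch so every sample
            # contributes equally, regardless of batch size.
            total_loss += F.cross_entropy(outputs, labels, reduction='sum').item()

            preds = outputs.argmax(dim=1)
            total_correct += (preds == labels).sum().item()

    # Normalize by the number of samples, not the number of batches.
    n = len(loader.dataset)
    avg_acc = total_correct / n
    avg_loss = total_loss / n
    print('Evaluation => Avg ACC: {:.4f}, Avg Loss: {:.4f}'.format(avg_acc, avg_loss))
    return avg_acc, avg_loss
```

With this version, the reported numbers no longer depend on how shuffling distributes samples across batches.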

@ptrblck @nutszebra
Hi,
Thank you very much, I solved the problem!!
Yes, I’m shuffling the DataLoader, and my dataset size is not divisible by the batch size.
So that is why there were small differences in the validation accuracy.