Shuffling dataloader produces wildly different results - Test accuracy issue

When checking the test set accuracy, I've noticed the score fluctuates on each run. Currently I'm using a batch size of 8.

How do batch size and shuffling affect the output of the model? It seems to make all the difference: if I give the model a batch containing only one class, it doesn't perform well.

I have a few questions.

  1. When the data in the dataloader is shuffled, should I expect a different score each time it's run?

I average 91-92%, but the accuracy is different every time.

  2. If I set the dataloader's shuffle to False, I get an accuracy score of 50%, a huge reduction from when I shuffle the data. Is this normal?

  3. If I manually iterate through each image, convert to tensor, normalize, and unsqueeze, I also get only 50% accuracy, the same as when I don't shuffle the dataloader. Is this normal?

I’m sure there is something wrong but can’t for the life of me work it out. Any help appreciated.


I've also tried changing the batch size and have realised that this also affects the accuracy.

But I am not backpropagating and have gradients switched off. I thought the model predicts on each image in the tensor individually, but I guess this isn't correct?

So how do I infer on a single image if I wanted to?

Or am I completely wrong in my thought pattern?

I've run some tests and here are my results. Is this expected?

test_location = "/home/lewis/leaf/test/"


new_trans_test = transforms.Compose([transforms.Resize((image_width_start, image_height_start)),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

#load dataset

test = torchvision.datasets.ImageFolder(root=test_location,transform=new_trans_test)

#get data loader

test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True)

val_total_correct = 0
val_total_checked = 0 

for i, data in enumerate(test_loader, 0): 
    #check batch with no gradients
    with torch.no_grad():
        inputs, labels = data
        inputs = inputs.to(device)  # move batch to the same device as the model
        labels = labels.to(device)

        #get outputs
        outputs = net(inputs)
        #calculate accuracy
        val_total_correct += torch.sum(torch.argmax(outputs,1) == labels).item()
        val_total_checked += len(labels)

    print(f"\rAccuracy {round(val_total_correct/val_total_checked*100,2)}%  {i+1}/{len(test_loader)}     ", end="",flush=True)

I've solved it, I think.

I wasn't calling model.eval() before evaluating.
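For anyone hitting the same thing: in train mode, BatchNorm layers normalize each image with the *current batch's* statistics, so an image's output depends on what else is in the batch. That is exactly why batch size, shuffling, and single-class batches all changed the accuracy. Calling `model.eval()` switches BatchNorm to its stored running statistics, making each prediction batch-independent. A minimal sketch of the difference, using a standalone BatchNorm2d layer rather than my actual model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4)

# Train mode: the batch's own mean/var are used, so the first image's
# output changes depending on the other images in the batch.
bn.train()
full_batch = bn(x)[0]
alone = bn(x[:1])[0]
print(torch.allclose(full_batch, alone))   # False: same image, different output

# Eval mode: the stored running statistics are used instead, so each
# image's output is independent of the rest of the batch.
bn.eval()
full_batch = bn(x)[0]
alone = bn(x[:1])[0]
print(torch.allclose(full_batch, alone))   # True
```

This also explains the 50% with shuffle=False: ImageFolder yields images sorted by class, so every batch is single-class and the train-mode batch statistics are badly skewed.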