I'm using PyTorch 0.4 and Python 3.6.
My test code is:
import numpy as np
import torch
from tqdm import tqdm


def test_model(dataloader, model, test_num=3000):
    # opt is defined elsewhere in my code
    device = torch.device("cuda" if opt.use_cuda else "cpu")
    total = 0
    correct_pos = 0     # correctly predicted positives (true positives)
    correct_num = 0     # correct predictions overall
    predicted_pos = 0   # samples predicted as positive
    pos_num = 0         # positive samples seen
    with torch.no_grad():
        for ii, (img, label) in tqdm(enumerate(dataloader)):
            label = label.view(len(label)).numpy()
            img = img.to(device)
            output = model(img)  # img is 1*3*299*299
            _, predicted = torch.max(output.data, dim=1)
            predicted = predicted.cpu().numpy()
            total += len(label)
            predicted_pos += predicted.sum()
            correct = predicted == label
            correct_num += correct.sum()
            correct_pos += np.logical_and(label, correct).sum()
            pos_num += label.sum()
            if ii > test_num:
                break
    recall = correct_pos / pos_num
    precision = correct_pos / predicted_pos
    accuracy = correct_num / total
    neg_precision = (correct_num - correct_pos) / (total - pos_num)
    f_num = 2 * recall * precision / (recall + precision)
    # print('Accuracy of the network :' + str(accuracy))
    # print('Recall of the network :' + str(recall))
    # print('Precision of the network :' + str(precision))
    # print('Neg_precision of the network :' + str(neg_precision))
    return {"Recall": recall,
            "Accuracy": accuracy,
            "Precision": precision,
            "Neg_precision": neg_precision,
            "F": f_num}
When the test DataLoader is created with shuffle=True, I get:
{'Recall': 0.7745604963805585, 'Accuracy': 0.9315490043961727, 'Precision': 0.9413095387708935, 'Neg_precision': 0.9838965517241379, 'F': 0.8498326431043286}
When it is created with shuffle=False, I get:
{'Recall': 0.19451913133402274, 'Accuracy': 0.639384535815878, 'Precision': 0.23404255319148937, 'Neg_precision': 0.7877241379310345, 'F': 0.2124583498051618}