Evaluate Twice, Accuracy Changes if I Shuffle

I have a sanity test for my model that I run before using it for evaluation, and I do not understand why the accuracy changes every time I evaluate. If I just keep evaluating, the accuracy seems to change with some sort of period, so I'm guessing the shuffle order changes periodically in some way (maybe the index the DataLoader starts at changes, or something like that).

My models are ResNet10 on CIFAR-10.

I’ve tried converting the tensors to doubles in the evaluate function (as shown below) to rule out numerical issues, but to no avail.

I’m fixing all the other sources of randomness I know of. Why would this happen? Why would shuffling the data change the accuracy like that?

def fix_seed(seed):
    torch.manual_seed(seed)  # seed the PyTorch RNG
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


acc = evaluate(model, test_loader)
print("Model 1 Original Accuracy: {}".format(acc))
assert acc == evaluate(model, test_loader)


def evaluate(model, test_loader):

    # TODO, why would the total_correct / total_num change as we shuffled the data differently?
    for _, (images, labels) in enumerate(test_loader):
        total_correct, total_num = 0., 0.

        with torch.no_grad():
            labels = labels.cuda().double()
            img = images.cuda()
            h = model(img)
            preds = h.argmax(dim=1).double()
            total_correct = (preds == labels).sum().cpu().item()
            total_num += h.shape[0]

    return total_correct / total_num

Hello! Just a quick note / question about your snippet:

How do you define acc? The snippet shows where you define acc2 but not acc, and you’re comparing acc rather than acc2 with the re-evaluated model, so I just wanted to confirm that this isn’t somehow causing the issue.

Also, another quick idea: have you tried adding torch.use_deterministic_algorithms(True) to your list of deterministic settings?

Per the documentation, that does more than just torch.backends.cudnn.deterministic = True, even though they sound similar.
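Putting the seeding and the CuDNN flags together, one version of fix_seed might look like this (a sketch, assuming a reasonably recent PyTorch; which calls you actually need depends on your setup):

```python
import random

import numpy as np
import torch


def fix_seed(seed):
    # Seed every RNG that PyTorch code commonly touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices in recent PyTorch

    # Make CuDNN pick deterministic kernels and skip autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Force deterministic implementations everywhere PyTorch supports them
    # (and raise an error where it does not).
    torch.use_deterministic_algorithms(True)
```

Note that with a shuffled DataLoader, re-seeding only makes the shuffle order reproducible across runs; it does not make two successive passes over the loader visit batches in the same order unless you re-seed (or pass a freshly seeded generator) before each pass.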

Ok, thanks for the tip about the deterministic algorithms! Regarding acc: I actually have two models, so I modified the snippet for posting and made a mistake along the way. They should all be acc.

Resolved! If you look at the code inside evaluate, you can see that total_num and total_correct are initialized in the wrong location: inside the batch loop. That means every call returns the accuracy of only the last batch, and shuffling changes which examples land in that batch (which is probably why it looked cyclic: the shuffle order repeats in some sort of cycle). The accuracies we were seeing were single-batch accuracies, not the accuracy over the full test set.
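For anyone landing here later, a corrected evaluate might look like the sketch below. The totals are initialized once before the loop, and total_correct is accumulated with += rather than overwritten with = (a second bug in the original). I've also made it device-agnostic instead of hard-coding .cuda(), and dropped the double conversions, which were never needed for an argmax comparison:

```python
import torch


def evaluate(model, test_loader):
    """Accuracy over the whole test set, independent of batch order."""
    model.eval()
    device = next(model.parameters()).device  # original hard-coded .cuda()
    total_correct, total_num = 0, 0  # initialized ONCE, before the loop
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            total_correct += (preds == labels).sum().item()  # accumulate, don't overwrite
            total_num += images.shape[0]
    return total_correct / total_num
```

Because the totals now cover every batch, the result is a sum over the full test set and is invariant to the DataLoader's shuffle order.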