I have a sanity test for my model that I run before using it for evaluation. I don't understand why the accuracy changes every time I evaluate. It even seems to change with some sort of period if I just keep evaluating, so I'm guessing the shuffle order changes periodically in some way (maybe the index the DataLoader starts at changes, or something like that).
My models are ResNet-10s trained on CIFAR-10.
I've tried converting the tensors to doubles in the evaluate function (as shown below) to rule out numerical issues, but to no avail.
I'm fixing every other source of randomness I know of. Why would this happen? Why would shuffling the data change the accuracy like this?
def fix_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
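For what it's worth, the shuffle order itself can also be pinned by giving the loader its own seeded generator (in PyTorch that would be passing a seeded `torch.Generator` to the DataLoader). The idea is the same as using a dedicated seeded `random.Random` instead of the global state; a minimal pure-Python sketch of that idea (the `shuffled_indices` helper here is hypothetical, just for illustration):

```python
import random

def shuffled_indices(n, seed):
    # A dedicated, seeded generator makes the shuffle order reproducible,
    # independent of anything else that touches the global random state.
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

# Same seed -> identical order on every call; this is what a seeded
# DataLoader generator buys you across repeated passes.
assert shuffled_indices(10, 0) == shuffled_indices(10, 0)
```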
acc = evaluate(model, test_loader)
print("Model 1 Original Accuracy: {}".format(acc))
assert acc == evaluate(model, test_loader)
and
def evaluate(model, test_loader):
    model.eval()
    # TODO: why would total_correct / total_num change when the data is shuffled differently?
    for _, (images, labels) in enumerate(test_loader):
        total_correct, total_num = 0., 0.
        with torch.no_grad():
            labels = labels.cuda().double()
            img = images.cuda()
            h = model(img)
            preds = h.argmax(dim=1).double()
            total_correct = (preds == labels).sum().cpu().item()
            total_num += h.shape[0]
    return total_correct / total_num
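My expectation is that accuracy over the whole test set should not depend on batch order at all: if the correct/total counts are accumulated across every batch, reordering the batches can't change the final ratio. A minimal pure-Python sketch of that expectation (hypothetical `preds`/`labels` lists standing in for model outputs, no PyTorch):

```python
def accuracy(batches):
    # Accumulate counts across ALL batches; the final ratio is then
    # invariant to the order in which the batches arrive.
    total_correct, total_num = 0, 0
    for preds, labels in batches:
        total_correct += sum(p == l for p, l in zip(preds, labels))
        total_num += len(labels)
    return total_correct / total_num

batches_a = [([1, 0], [1, 1]), ([2, 2], [2, 0])]
batches_b = list(reversed(batches_a))
assert accuracy(batches_a) == accuracy(batches_b)  # shuffling batches changes nothing
```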