I have run into an odd problem. I have a trained model with many Conv1d layers. When I set the model to eval() and run the test set through it, I get different accuracies depending on the batch size.
For example, when I run the same test data through the model all at once versus one sample at a time, the predictions are wildly different.
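To make the snippet below self-contained, here is roughly the setup it assumes. The architecture, input shape, and number of classes are placeholders rather than my actual model; the only thing that reflects my real model is that it is a trained stack of Conv1d layers:

import numpy as np
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder Conv1d classifier; my real (trained) model is deeper but similar in spirit
model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(64, 10),  # 10 classes is arbitrary
).to(device)

# Placeholder test data shaped (num_samples, channels, length)
xtest = np.random.randn(2056, 1, 128).astype(np.float32)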
model.eval()
model.train(False)  # redundant with model.eval(); both put the model in eval mode

# Run the whole test set as a single batch
batch_size = len(xtest)
batch_guesses = np.array([])
print("BATCH SIZE ", batch_size)
for i in range(0, len(xtest), batch_size):
    inputs = torch.Tensor(xtest[i : i + batch_size]).to(device)
    output = model(inputs)
    prediction = torch.argmax(output, dim=1)
    prediction = prediction.detach().cpu().numpy()
    batch_guesses = np.append(batch_guesses, prediction)

# Run the same test set one sample at a time
print("INDIVIDUAL")
batch_size = 1
single_guesses = np.array([])
for i in range(0, len(xtest), batch_size):
    inputs = torch.Tensor(xtest[i : i + batch_size]).to(device)
    output = model(inputs)
    prediction = torch.argmax(output, dim=1)
    prediction = prediction.detach().cpu().numpy()
    single_guesses = np.append(single_guesses, prediction)

# Fraction of predictions where single_guesses and batch_guesses agree
print(np.sum(single_guesses == batch_guesses) / len(single_guesses))
# 1.0      with torch.backends.cudnn.enabled = False
# 0.790625 with torch.backends.cudnn.enabled = True
However, I found that this only happens when torch.backends.cudnn.enabled = True. Setting it to False makes single_guesses and batch_guesses identical regardless of batch size. I also found that the larger the gap between the two batch sizes, the more predictions differ: batch sizes 1 and 64 agreed on 92% of predictions, whereas batch sizes 1 and 2056 agreed on only 79%.
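For what it is worth, I am aware of PyTorch's usual reproducibility switches, which keep cuDNN enabled but restrict it to deterministic algorithm choices. This is just a general sketch, and I am not sure it guarantees agreement across different batch sizes, only across repeated runs:

torch.backends.cudnn.benchmark = False      # don't benchmark and pick a kernel per input/batch shape
torch.backends.cudnn.deterministic = True   # only use deterministic cuDNN kernels

# Stricter option: raise an error if any op would still be nondeterministic
# torch.use_deterministic_algorithms(True)

As far as I understand, benchmark = False stops cuDNN from choosing a different convolution algorithm for each batch shape, and deterministic = True restricts it to deterministic kernels.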
Why does cuDNN have such a large impact on the classification results depending on the batch size?