Loading and Evaluating a Model

Are you calling model.eval() on the model before running the evaluation?
This switches the model to evaluation mode: batchnorm layers use their running stats (and no longer update them), and dropout layers are disabled.
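In case it helps, here is a minimal sketch of what that looks like. The toy model and the random input are made up for illustration (they are not from your code), and I'm assuming a plain CPU run. Note that torch.no_grad() is a separate thing from model.eval(): it only turns off gradient tracking and does not change how batchnorm or dropout behave.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model with a batchnorm and a dropout layer.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)
bn = model[1]
x = torch.randn(16, 4)  # made-up input batch

# Evaluation mode: batchnorm uses its running stats, dropout is a no-op.
model.eval()
stats_before = bn.running_mean.clone()
with torch.no_grad():  # no gradient tracking needed for evaluation
    out1 = model(x)
    out2 = model(x)

same_outputs = torch.equal(out1, out2)                         # dropout disabled
stats_unchanged = torch.equal(bn.running_mean, stats_before)   # stats frozen

# Training mode for comparison: a forward pass updates the running stats.
model.train()
model(x)
stats_changed_in_train = not torch.equal(bn.running_mean, stats_before)

print(same_outputs, stats_unchanged, stats_changed_in_train)
```

If you go back to training afterwards, remember to call model.train() again, since model.eval() stays in effect until you switch it back.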