Model outputs differ when the same inputs are fed with a different batch_size

I use a pretrained ResNet-50 to classify ImageNet images. I define a normalization model to avoid directly normalizing the images. The models are defined as:

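import os

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models

# IMAGENET_MEAN and IMAGENET_STD are per-channel sequences of 3 floats, defined elsewhere.

class Normalization(nn.Module):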
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        self.register_buffer('mean', torch.tensor(mean).view(3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(3, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std

class Classifier(nn.Module):
    def __init__(self, clf):
        super(Classifier, self).__init__()
        self.norm = Normalization(IMAGENET_MEAN, IMAGENET_STD)
        self.clf = clf
        for param in self.clf.parameters():
            param.requires_grad = False

    def forward(self, x):
        x = self.norm(x)
        x = self.clf(x)
        return x

model = models.resnet50(pretrained=True)
clf = Classifier(model)
clf.eval()  # evaluation mode, as mentioned in the question text

Then I feed the images into the classifier and save the output logits as NumPy arrays. However, when I use the same classifier to classify the same images again in another script, the outputs differ from the saved ones. The difference is around 1e-05, which does not look like a precision problem to me. After a long time debugging, I found that this happens when the DataLoader uses a different batch_size from the first run. Since the model is in evaluation mode, I would not expect the batch size to affect the outputs. What could the problem be?
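
To put the 1e-05 figure in context, here is a minimal sketch of how the saved and recomputed logits could be compared. The function name `compare_logits` and the example values are hypothetical and not taken from my scripts; the idea is just to express the largest difference relative to float32 machine epsilon scaled by the largest logit magnitude, and to check whether the predicted classes still agree.

```python
import numpy as np

def compare_logits(saved, recomputed):
    """Return (max absolute difference, that difference in units of eps * max |logit|)."""
    saved = np.asarray(saved, dtype=np.float32)
    recomputed = np.asarray(recomputed, dtype=np.float32)
    # Subtract in float64 so the comparison itself adds no float32 rounding.
    abs_diff = np.abs(saved.astype(np.float64) - recomputed.astype(np.float64))
    max_abs = float(abs_diff.max())
    # float32 machine epsilon scaled by the largest logit magnitude
    # (assumes the logits are not all zero).
    scale = float(np.finfo(np.float32).eps) * float(np.abs(saved).max())
    return max_abs, max_abs / scale

# Hypothetical logits for one image with three classes.
saved = np.array([[12.5, -3.25, 0.5]], dtype=np.float32)
recomputed = saved + np.float32(1e-5)

max_abs, in_eps_units = compare_logits(saved, recomputed)
same_class = bool((saved.argmax(axis=1) == recomputed.argmax(axis=1)).all())
close_enough = bool(np.allclose(saved, recomputed, rtol=0.0, atol=1e-4))
print(f"max abs diff: {max_abs:.3e}, in eps units: {in_eps_units:.1f}")
print(f"same predicted class: {same_class}, within atol=1e-4: {close_enough}")
```

If the difference is only a handful of these scaled epsilon units and the predicted classes match, it may be accumulated rounding rather than a logic error, but I have not confirmed that this is what happens in my case.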

BTW, the dataloaders are defined as:

train_set = datasets.ImageFolder(os.path.join('/DATA5_DB8/data/yfli/datasets/ImageNet_val_selected_Res18&50Dense121/', 'val/'), transform=transform)
train_loader = DataLoader(train_set, batch_size=20, shuffle=False, num_workers=0)
logit_set = datasets.DatasetFolder(os.path.join('/DATA5_DB8/data/yfli/datasets/tmp/', 'val/'), np.load, ['.npy'], transform=torch.from_numpy)
logit_loader = DataLoader(logit_set, batch_size=20, shuffle=False, num_workers=0)
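
To isolate the effect from file I/O and the two separate scripts, the same inputs could be run at two batch sizes inside one process. The sketch below uses a small hypothetical stand-in network (`toy_model`) and random inputs instead of my ResNet-50 and ImageNet data, and the helper `run_in_batches` is a name made up for illustration.

```python
import torch
import torch.nn as nn

def run_in_batches(model, inputs, batch_size):
    """Run `inputs` through `model` in chunks of `batch_size` and concatenate the outputs."""
    model.eval()  # BatchNorm uses running statistics, so outputs should not depend on the batch
    outputs = []
    with torch.no_grad():
        for start in range(0, inputs.shape[0], batch_size):
            outputs.append(model(inputs[start:start + batch_size]))
    return torch.cat(outputs, dim=0)

torch.manual_seed(0)

# Hypothetical stand-in for the real classifier; swap in `clf` to test the actual model.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
images = torch.rand(8, 3, 32, 32)  # random stand-in for a batch of images

logits_bs2 = run_in_batches(toy_model, images, batch_size=2)
logits_bs8 = run_in_batches(toy_model, images, batch_size=8)

max_abs_diff = (logits_bs2 - logits_bs8).abs().max().item()
print(f"max abs difference between batch sizes: {max_abs_diff:.3e}")
```

Replacing `toy_model` with `clf` and `images` with a fixed batch from `train_loader` would show whether the discrepancy appears with batch size as the only variable.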