BN layer during testing

Hi, I am confused about the BatchNorm layer behaviour during testing:

In a previous answer (The behavior of the BN layer in train and eval mode), it is mentioned that if I set model.eval(), the running stats (mean, variance) will be used to do normalization.
What is the running stats? Are those values fixed from the trained model? Or from the new test data?

For example, during testing I call model.eval(), then iterate through each test sample and save the predicted value (example code below).
Is this the correct way?

In this case, are the running stats (mean, variance) changed after every test sample? Would this cause any problem?

Thank you!

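import torch
import torch.nn.functional as F

def main():
    # Read trained model
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = MyModel()
    model = torch.nn.DataParallel(model)
    model.load_state_dict(torch.load(train_model_path, map_location=device))
    model.to(device)

    # Setting to eval mode: BatchNorm layers now normalize with their stored
    # running_mean/running_var and do not update them
    model.eval()

    # Read data
    test_data = CustomDataset(image_dirs, test_csv)
    print('Number of test samples', len(test_data))

    preds = []
    for image_input, label in test_data:
        # Assumes CustomDataset returns a single (C, H, W) tensor:
        # add a batch dimension and move it to the model's device
        image_input = image_input.unsqueeze(0).to(device)

        # Evaluation (no autograd graph needed)
        with torch.no_grad():
            output = model(image_input)

        # Prediction: index of the largest logit
        _, pred = torch.max(output, 1)

        # Compute some scores: softmax probability of class 1
        outputs_sm = F.softmax(output, dim=1)
        pred_score = outputs_sm[:, 1].item()

        pred_dx = pred.item()
        preds.append(pred_dx)

        print(pred_dx, pred_score)

if __name__ == "__main__":
    main()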

No, the running stats are only updated during training as described in the linked post. During testing/evaluation (i.e. after calling .eval() on the batchnorm layer) the running stats will only be used to normalize the input activations. No updates are performed anymore.
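A quick way to check this is to compare a BatchNorm layer's running_mean and running_var before and after a series of eval-mode forward passes. The sketch below uses nn.BatchNorm1d with made-up random inputs; the seed, shapes, and value range are arbitrary illustration choices, not anything from the original question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A freshly constructed layer starts with running_mean = 0 and running_var = 1
bn = nn.BatchNorm1d(num_features=3)

# Training mode (the default): the forward pass updates the running stats
bn.train()
bn(torch.randn(8, 3) * 2.0 + 5.0)
mean_after_train = bn.running_mean.clone()
var_after_train = bn.running_var.clone()

# Eval mode: running stats are only read, never written
bn.eval()
with torch.no_grad():
    for _ in range(10):
        # Made-up single "test samples" of shape (1, 3), fed one at a time
        bn(torch.randn(1, 3) * 2.0 + 5.0)

    # In eval mode the output is a fixed affine function of the input
    x = torch.randn(1, 3)
    y = bn(x)
    expected = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
    expected = expected * bn.weight + bn.bias

print(torch.equal(bn.running_mean, mean_after_train))  # True
print(torch.equal(bn.running_var, var_after_train))    # True
print(torch.allclose(y, expected))                     # True
```

Because eval mode uses only the stored statistics, feeding one test sample at a time is fine. In training mode, by contrast, nn.BatchNorm1d raises an error for an input of shape (1, C), since there is only one value per channel to compute batch statistics from.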

Thanks for the quick response!

The running stats that you’re referring to are the running stats of the final batch of training data, is that correct?

No, the running stats are stored in the internal running_mean and running_var buffers and are updated during training (i.e. after calling .train() on the batchnorm layer, which is the default mode after initialization) in each forward pass, using the specified momentum, as described in the docs:

This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_{\text{new}} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.
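To make the update rule concrete, here is a plain-Python sketch of it applied to one channel's running mean. The batch means are made-up numbers and update_running_stat is a hypothetical helper, not a PyTorch function; momentum=0.1 matches PyTorch's documented default.

```python
def update_running_stat(running, observed, momentum=0.1):
    """One step of the exponential moving average used for BatchNorm's
    running statistics: x_hat_new = (1 - momentum) * x_hat + momentum * x_t."""
    return (1.0 - momentum) * running + momentum * observed


# Hypothetical means of one channel for three consecutive training batches
batch_means = [4.0, 6.0, 5.0]

running_mean = 0.0  # initial running_mean of a fresh BatchNorm layer
history = []
for m in batch_means:
    running_mean = update_running_stat(running_mean, m)
    history.append(round(running_mean, 3))

print(history)  # [0.4, 0.96, 1.364]
```

After three batches the running mean is 1.364, not 5.0 (the last batch's mean): every training batch contributes, with a batch seen k steps earlier weighted by momentum × (1 − momentum)^k. In PyTorch, passing momentum=None switches to a cumulative (simple) moving average instead.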

I understand now! Thank you so much!
