Population statistics for batchnorm instead of running average

I would like to implement batch normalization as originally described in the paper (https://arxiv.org/pdf/1502.03167.pdf). After the network finishes training, they make one more pass through the dataset to estimate the population mean and variance (step 10 in Algorithm 2). PyTorch instead uses the common trick of keeping a running average of these statistics during training.

Is there an easy way to implement this? Basically I need to have access to the mini batch mean and variance computed within each batch normalization layer.

Is there code for this already?


You can

  1. add a forward_pre_hook to each BN layer that tracks the batch means and variances
  2. iterate through the data
  3. replace `running_mean` and `running_var` with your aggregated estimates.
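A minimal sketch of those three steps might look like the following. The function name `estimate_population_stats` is made up for illustration, and averaging the per-batch statistics is only exact when all batches have the same size (drop the last partial batch, or weight by batch size, if they don't):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def estimate_population_stats(model, data_loader, device="cpu"):
    """Replace each BN layer's running stats with population estimates
    from one extra pass over the data (Algorithm 2, step 10)."""
    bn_layers = [m for m in model.modules()
                 if isinstance(m, nn.modules.batchnorm._BatchNorm)]

    # Per-layer accumulators for batch means and (unbiased) batch variances.
    mean_sum = {m: torch.zeros_like(m.running_mean) for m in bn_layers}
    var_sum = {m: torch.zeros_like(m.running_var) for m in bn_layers}
    n_batches = {m: 0 for m in bn_layers}

    def make_hook(module):
        def hook(mod, inp):
            x = inp[0]
            # Reduce over every dimension except the channel dim (dim 1).
            dims = [0] + list(range(2, x.dim()))
            mean_sum[mod] += x.mean(dim=dims)
            var_sum[mod] += x.var(dim=dims, unbiased=True)
            n_batches[mod] += 1
        return hook

    hooks = [m.register_forward_pre_hook(make_hook(m)) for m in bn_layers]

    model.train()  # BN must normalize with batch stats during this pass
    for batch, _ in data_loader:
        model(batch.to(device))

    for h in hooks:
        h.remove()

    # Overwrite the running statistics with the aggregated estimates.
    for m in bn_layers:
        m.running_mean.copy_(mean_sum[m] / n_batches[m])
        m.running_var.copy_(var_sum[m] / n_batches[m])
    model.eval()
```

After calling this, `model.eval()` inference uses the population estimates instead of the exponential running averages.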

You might check whether `track_running_stats` in `BatchNormXd` (torch 0.4 and above) does what you want.
You can probably copy-paste the relevant code if you are on an older version.

Best regards


Thank you SimonW! I think that will work. Just coding it up now.

Re Tom: setting `track_running_stats=False` will use the mini-batch statistics at test time (just as at training time). However, this is different from estimating the population statistics and using those at test time. Using mini-batch statistics at test time has many undesirable properties, including that the prediction for an image depends on the other images in its mini-batch.
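That batch-dependence is easy to demonstrate. A small sketch: with `track_running_stats=False` there are no running statistics, so even in eval mode BN normalizes with the current batch's statistics, and the same sample gets a different output depending on its batchmates:

```python
import torch
import torch.nn as nn

# No running stats are kept, so eval mode falls back to batch statistics.
bn = nn.BatchNorm1d(2, track_running_stats=False)
bn.eval()

x = torch.tensor([[1.0, 2.0]])  # the sample we care about

# Same sample, two different batchmates.
out_a = bn(torch.cat([x, torch.zeros(1, 2)]))[0]
out_b = bn(torch.cat([x, torch.full((1, 2), 10.0)]))[0]

# out_a != out_b: the prediction for x depends on the rest of the batch.
```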

If someone still needs this, we wrote up a small script to do this: