I have run into a case where the input to some batch norm layers is identical across the batch, which makes the running statistics too small and training unstable. I wonder whether it is possible, using PyTorch functions, to estimate the running statistics from a different input. Batch norm itself seems to be implemented in C (pytorch/functional.py at master · pytorch/pytorch · GitHub).
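To make the problem concrete, here is a minimal pure-Python sketch of what I think is happening. It is a simplification, not PyTorch's actual code path: it assumes a single channel with one value per sample, a momentum of 0.1, and the usual exponential moving average update. The helper names `batch_variance` and `update_running_var` are made up for illustration.

```python
def batch_variance(values):
    """Biased variance of one channel over the batch."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def update_running_var(running_var, batch_var, momentum=0.1):
    """Exponential moving average update of the running variance."""
    return (1 - momentum) * running_var + momentum * batch_var


# Every sample in the batch is identical, so the batch variance is zero.
batch = [0.5] * 8

running_var = 1.0  # assumed initial value of the running variance
for step in range(50):
    running_var = update_running_var(running_var, batch_variance(batch))

# running_var has shrunk by a factor of (1 - momentum) at every step.
print(running_var)
```

With a zero batch variance, each update only shrinks the running variance, so at eval time the layer divides by a value close to `sqrt(eps)`.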

Following this discussion (Implementing Batchnorm in Pytorch. Problem with updating self.running_mean and self.running_var - #2 by SeoHyeong), I implemented it as follows:

```
import torch
import torch.nn as nn


class SepBN2d(nn.BatchNorm2d):
    """BatchNorm2d that takes its batch statistics from a separate tensor."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1,
                 affine=True, track_running_stats=True):
        super(SepBN2d, self).__init__(
            num_features, eps, momentum, affine, track_running_stats)
        self.multiple_input = True

    def forward(self, input):
        # `input` is a pair: (tensor to normalize, tensor to estimate stats from).
        input, input_for_stats = input
        self._check_input_dim(input)
        if input_for_stats is None:
            input_for_stats = input
        self._check_input_dim(input_for_stats)

        exponential_average_factor = 0.0
        if self.training and self.track_running_stats:
            if self.num_batches_tracked is not None:
                self.num_batches_tracked += 1
                if self.momentum is None:
                    # Cumulative moving average.
                    exponential_average_factor = 1.0 / float(self.num_batches_tracked)
                else:
                    exponential_average_factor = self.momentum

        # Use batch statistics in training, or when no running stats are tracked.
        if self.training or self.running_mean is None:
            mean = input_for_stats.mean([0, 2, 3])
            var = input_for_stats.var([0, 2, 3], unbiased=False)
            if self.training and self.track_running_stats:
                # Elements per channel of the tensor the stats come from.
                n = input_for_stats.numel() / input_for_stats.size(1)
                with torch.no_grad():
                    # Update the buffers in place; the running variance uses
                    # the unbiased estimate (guarded against n == 1).
                    self.running_mean.copy_(
                        exponential_average_factor * mean
                        + (1 - exponential_average_factor) * self.running_mean)
                    self.running_var.copy_(
                        exponential_average_factor * var * n / max(n - 1, 1)
                        + (1 - exponential_average_factor) * self.running_var)
        else:
            mean = self.running_mean
            var = self.running_var

        input = (input - mean[None, :, None, None]) / torch.sqrt(
            var[None, :, None, None] + self.eps)
        if self.affine:
            input = input * self.weight[None, :, None, None] + self.bias[None, :, None, None]
        return input
```
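As a sanity check on the idea, here is a minimal functional sketch of the same computation in training mode only. The name `sep_batch_norm` and its signature are hypothetical (not a PyTorch API), and it assumes 4D `(N, C, H, W)` inputs and a fixed momentum. When the statistics tensor is the input itself, the result should agree with `torch.nn.functional.batch_norm` called with `training=True`.

```python
import torch
import torch.nn.functional as F


def sep_batch_norm(x, x_stats, running_mean, running_var,
                   weight=None, bias=None, momentum=0.1, eps=1e-5):
    """Normalize x with per-channel statistics computed from x_stats.

    Hypothetical helper for illustration; training-mode behavior only.
    running_mean and running_var are updated in place.
    """
    dims = [0, 2, 3]
    mean = x_stats.mean(dims)
    var = x_stats.var(dims, unbiased=False)

    # Elements per channel in the tensor the statistics come from.
    n = x_stats.numel() / x_stats.size(1)
    with torch.no_grad():
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        # The running variance tracks the unbiased estimate.
        running_var.mul_(1 - momentum).add_(momentum * var * n / max(n - 1, 1))

    out = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + eps)
    if weight is not None:
        out = out * weight[None, :, None, None]
    if bias is not None:
        out = out + bias[None, :, None, None]
    return out


# Usage: x is constant, so its own statistics would be degenerate;
# the statistics are taken from x_stats instead.
x = torch.full((4, 3, 5, 5), 0.5)
x_stats = torch.randn(4, 3, 5, 5)
rm, rv = torch.zeros(3), torch.ones(3)
out = sep_batch_norm(x, x_stats, rm, rv)
```

The normalization is written out by hand rather than delegated to `F.batch_norm` so that gradients flow through `mean` and `var` computed from `x_stats`.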

Thanks.