What does batchnorm's output depend on?

Hey, firstly thanks for such a detailed explanation, really appreciate it :slight_smile: . Seriously, the PyTorch forums are great :heart: !!

In BatchNorm3d we divide by a larger number of elements, but the numerator has a correspondingly larger number of terms as well, so the biased variance comes out the same.
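For example, replicating the same data along an extra depth dimension leaves the biased variance unchanged (the sum of squared deviations and the divisor grow by the same factor), while the unbiased estimate shifts because its Bessel factor changes. A toy check, with shapes I made up rather than the exact tensors from this thread:

import torch

x2 = torch.randn(1, 3, 4, 4)                   # (N, C, H, W)
x3 = x2.unsqueeze(2).expand(1, 3, 5, 4, 4)     # same data replicated along a depth dim
biased2 = x2.var(dim=(0, 2, 3), unbiased=False)
biased3 = x3.var(dim=(0, 2, 3, 4), unbiased=False)
print(torch.allclose(biased2, biased3))        # True: numerator and divisor scale together
unbiased2 = x2.var(dim=(0, 2, 3), unbiased=True)
unbiased3 = x3.var(dim=(0, 2, 3, 4), unbiased=True)
print(torch.allclose(unbiased2, unbiased3))    # False: the n/(n-1) factors differ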

But I was simply assuming that batchnorm uses the biased variance (i.e. dividing by num_elements rather than num_elements - 1), and then my reasoning would have worked:

x2_var_biased = ((x2 - x2_mean.view(1, 3, 1, 1))**2).sum(2).sum(2) / num_elem2
print('Expected bn2 running_var after forward pass: ',
      bn2.running_var * (1 - bn2.momentum) + x2_var_biased * bn2.momentum)
# Expected bn2 running_var after forward pass:  tensor([[ 0.9000,  0.9250,  1.0000]])

x3_var_biased = ((x3 - x3_mean.view(1, 3, 1, 1, 1))**2).sum(2).sum(2).sum(2) / num_elem3
print('Expected bn3 running_var after forward pass: ',
      bn3.running_var * (1 - bn3.momentum) + x3_var_biased * bn3.momentum)
# Expected bn3 running_var after forward pass:  tensor([[ 0.9000,  0.9250,  1.0000]])

(This is after removing the subtraction of 1 from num_elements in the divisors of both calculations above.)

But with the unbiased estimates, the scaling doesn't work out, due to that extra 1!
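To make that concrete: the unbiased estimate is just the biased one rescaled by n / (n - 1), and since n differs between the 2d and 3d layers, the two correction factors differ and the expected running_var values no longer match. A quick sketch with made-up shapes:

import torch

x = torch.randn(1, 3, 4, 4)                    # hypothetical (N, C, H, W) input
n = x.numel() // x.size(1)                     # elements per channel: N*H*W = 16
biased = x.var(dim=(0, 2, 3), unbiased=False)
unbiased = x.var(dim=(0, 2, 3), unbiased=True)
# Bessel's correction: unbiased = biased * n / (n - 1)
print(torch.allclose(unbiased, biased * n / (n - 1)))  # True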

Thanks for pointing the unbiased variance out, I wouldn't have thought about it!!

_Also, is there any way to use batchnorm with biased variance (i.e. without Bessel's correction), or will I have to build it myself? I couldn't find it in the documentation!_
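In case it helps, here is roughly what I mean by building it myself: a minimal sketch (the class name and details are mine, not an official API) that subclasses BatchNorm2d and updates running_var with the biased batch variance. It skips num_batches_tracked and assumes momentum is not None:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedBatchNorm2d(nn.BatchNorm2d):
    # hypothetical helper: updates running_var with the biased (divide-by-n)
    # batch variance instead of the unbiased one PyTorch uses by default
    def forward(self, x):
        if self.training:
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)  # no Bessel correction
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
            # normalize with the current batch statistics, as BN does in training
            return F.batch_norm(x, None, None, self.weight, self.bias,
                                training=True, momentum=0.0, eps=self.eps)
        # eval mode: normalize with the stored (biased) running statistics
        return F.batch_norm(x, self.running_mean, self.running_var,
                            self.weight, self.bias,
                            training=False, momentum=0.0, eps=self.eps)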
(Starting a new thread for this new question and closing this one: link)
