Currently for batch norm:
- Batch mean and biased batch variance are used during training for normalization (see the sketch after this list).
- Why biased batch variance and not unbiased batch variance?
- Why not the running mean/var? I would guess the running mean/var are better estimates, especially if the mini-batches are small.
- Batch mean and unbiased batch variance are used during training to update the running mean/var.
- Why unbiased variance here instead of biased variance?
- Batch variance is estimated using the batch mean.
- Why use the batch mean and not the running mean? Shouldn't the running mean be a better estimate of the mean than the batch mean?
- If the running mean were used, would the variance estimate always be unbiased?
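To make the first two points concrete, here is a minimal sketch, assuming PyTorch's `nn.BatchNorm1d` with default settings (the feature count, batch size, and seed are arbitrary). It checks numerically that the normalization of the batch uses the biased batch variance, while the running variance is updated with the unbiased batch variance:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: 3 features, default momentum of 0.1.
bn = nn.BatchNorm1d(num_features=3)
bn.train()

x = torch.randn(8, 3)  # mini-batch of 8 samples
out = bn(x)

batch_mean = x.mean(dim=0)
biased_var = x.var(dim=0, unbiased=False)   # divide by N
unbiased_var = x.var(dim=0, unbiased=True)  # divide by N - 1 (Bessel's correction)

# 1) Normalizing the batch uses the *biased* variance
#    (weight=1, bias=0 at init, so the output is just the normalized input).
expected_out = (x - batch_mean) / torch.sqrt(biased_var + bn.eps)
print(torch.allclose(out, expected_out, atol=1e-6))  # True

# 2) The running variance is updated with the *unbiased* batch variance
#    (running_var starts at 1.0, running_mean at 0.0).
expected_running_var = (1 - bn.momentum) * 1.0 + bn.momentum * unbiased_var
print(torch.allclose(bn.running_var, expected_running_var, atol=1e-6))  # True
```

If this sketch is right, the asymmetry in my questions above is real: the same forward pass computes both the biased and the unbiased variance, using one for normalization and the other for the running statistics.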
I found some related discussions, but I still don't fully understand their conclusions, i.e. what the answers to my questions above would be:
- BatchNorm should use Bessel's correction consistently · Issue #1410 · pytorch/pytorch · GitHub
- Batch norm doc mentions using biased variance estimator, but it actually uses the unbiased variance estimator · Issue #77427 · pytorch/pytorch · GitHub
- deep learning - Batch normalization variance calculation - Data Science Stack Exchange
- deep learning - Batch normalization variance calculation - Cross Validated