What is batch normalization doing?

I’m curious about what batch norm is actually doing. It seems to normalize over all data dimensions except the batch dimension, but why does the number of channels matter?


The original batch norm paper (https://arxiv.org/abs/1502.03167) should have all the information you need :slight_smile:
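To add a bit of intuition: batch norm actually normalizes over every dimension *except* the channel dimension. For a 4D input of shape `(N, C, H, W)`, the mean and variance are computed over the batch and spatial axes `(N, H, W)` separately for each channel, and the learnable scale `gamma` and shift `beta` are one value per channel. That is why the channel count matters. A minimal NumPy sketch (the function name `batch_norm_2d` is just for illustration, and this shows only the training-time forward pass, without the running statistics used at inference):

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-time batch norm forward pass for x of shape (N, C, H, W)."""
    # Statistics are computed over batch and spatial dims (N, H, W),
    # giving one mean/variance per channel, shape (1, C, 1, 1).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are per-channel learnable parameters, shape (C,).
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.randn(8, 3, 5, 5)          # batch of 8, 3 channels
y = batch_norm_2d(x, gamma=np.ones(3), beta=np.zeros(3))
# Each channel of y now has (approximately) zero mean and unit variance
# when averaged over the batch and spatial dimensions.
```

With `gamma=1` and `beta=0` the output per channel is standardized; during training the network learns per-channel `gamma`/`beta` so it can undo the normalization where that helps.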