I wondered this initially too. I think it’s because the beta term in batch norm effectively adds a bias to each channel.
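To make that concrete, here is a rough NumPy sketch of training-mode batch norm (not any library's actual implementation). The `batch_norm` helper, the `(N, C, H, W)` layout, and all the values are my own assumptions for illustration. It shows that each channel's output mean ends up equal to its beta, and that a constant per-channel bias added before the normalization gets cancelled by the mean subtraction.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Hypothetical helper: x is assumed to have shape (N, C, H, W).
    # Statistics are computed per channel, over batch and spatial dims.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma scales and beta shifts each channel independently.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3, 4, 4))   # made-up activations, 3 channels
gamma = np.ones(3)
beta = np.array([0.5, -1.0, 2.0])   # made-up per-channel shifts

y = batch_norm(x, gamma, beta)
# The normalized activations have zero mean per channel,
# so the output mean of each channel is just its beta.
channel_means = y.mean(axis=(0, 2, 3))

# A constant per-channel bias added before batch norm is removed
# again by the mean subtraction, so the output does not change.
bias = np.array([10.0, -3.0, 7.0]).reshape(1, -1, 1, 1)
y_with_bias = batch_norm(x + bias, gamma, beta)

print(np.allclose(channel_means, beta))
print(np.allclose(y, y_with_bias))
```

So as far as I can tell, beta plays the role a per-channel bias would, at least when the normalization uses batch statistics as in this sketch.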