Batch norm as regularization?

Can batch norm act like regularization?

You might find it useful to read the article carefully.

Batch norm enables training with larger learning rates, and this other article argues “…that the larger learning rate increases the implicit regularization of SGD, which improves generalization…”.
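One commonly given reason batch norm tolerates larger learning rates is that, in training mode, it divides by the batch standard deviation. Rescaling the weights that feed into it therefore leaves its output essentially unchanged, so the output depends on the direction of the weights rather than their magnitude. Below is a minimal pure-Python sketch of that effect. It is not code from either article. The single linear unit `w * x`, the input batch, and the weight value are all hypothetical choices for illustration.

```python
import statistics

def batch_norm_train(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for one feature: normalize with the batch's own mean and variance."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)  # biased (population) variance over the batch
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in batch]

inputs = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]  # hypothetical mini-batch of inputs to one unit
w = 0.8  # hypothetical weight of that unit

out_small = batch_norm_train([w * x for x in inputs])
out_big = batch_norm_train([10 * w * x for x in inputs])  # same unit, weight scaled up 10x

# The two outputs agree up to the tiny effect of eps in the denominator.
max_diff = max(abs(a - b) for a, b in zip(out_small, out_big))
print(f"max difference: {max_diff:.2e}")
```

Because a 10x larger weight produces practically the same normalized output, a large update that grows the weights does not blow up the activations further down the network.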

I would start by reading these two articles and their references. As far as I know, there is still a lot of uncertainty about why exactly batch norm works so well.
