Why are the weights of batch normalization initialized like this?

I read the batch normalization paper, but I could not find how the weights are initialized. So I looked at the code in PyTorch, shown below:

nn/modules/batchnorm.py, lines 31-32:

self.weight.data.uniform_()  # weight (gamma): sampled uniformly from [0, 1)
self.bias.data.zero_()       # bias (beta): initialized to zero
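As a sanity check, here is a minimal sketch of my own (not from the PyTorch source) that inspects these parameters on an `nn.BatchNorm2d` layer and re-initializes the weight to ones, assuming a recent PyTorch install:

import torch
import torch.nn as nn

# Create a batch norm layer with affine parameters (the default).
bn = nn.BatchNorm2d(num_features=16)

# weight (gamma) and bias (beta) both have shape (num_features,).
print(bn.weight.shape, bn.bias.shape)
print(bn.weight.data.min(), bn.weight.data.max())  # values depend on the PyTorch version
print(bn.bias.data.abs().sum())                    # bias starts at zero

# Override the default: set gamma = 1 and beta = 0 explicitly.
with torch.no_grad():
    bn.weight.fill_(1.0)
    bn.bias.zero_()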

So why are the weights of batch normalization initialized like this? Is there any theory showing that this initialization is optimal?

There is no theory around this specifically.
