Batch Norm and weight initialization

Hi,

From what I have observed, particularly from here and here, weight initialization is something that can help prevent vanishing and exploding gradients in very deep neural nets by setting the initial values of the weights properly. Proper initialization aims to keep each layer's output at roughly zero mean and unit standard deviation, so the scale of the activations neither shrinks towards zero nor blows up as it passes through the layers.
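
For example, here is a small sketch of the effect I mean (in PyTorch, with a width and depth I just picked for illustration): with too-small weights the activations vanish after a few layers, while Kaiming init keeps them at a sensible scale.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def forward_stats(init_fn, depth=20, width=512):
    """Push a random batch through `depth` Linear+ReLU layers and return the output std."""
    x = torch.randn(256, width)
    for _ in range(depth):
        linear = nn.Linear(width, width, bias=False)
        init_fn(linear.weight)
        x = torch.relu(linear(x))
    return x.std().item()

# Too-small weights: the activation scale shrinks layer by layer towards zero
print(forward_stats(lambda w: nn.init.normal_(w, std=0.01)))

# Kaiming (He) init: the activation scale stays roughly constant with depth
print(forward_stats(lambda w: nn.init.kaiming_normal_(w, nonlinearity='relu')))
```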

So my question here is: batch normalization seems to achieve this directly, since it normalizes a layer's outputs across the batch (per feature) to zero mean and unit standard deviation. So, just to be sure, is careful weight initialization not really needed when we use batch norm?
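
To make the question concrete, here is a minimal sketch (again PyTorch, with a deliberately bad weight scale chosen just for illustration): batch norm seems to restore zero mean and unit std at the output no matter how the weights were initialized.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 512)

# Deliberately badly scaled initialization
linear = nn.Linear(512, 512)
nn.init.normal_(linear.weight, std=10.0)

# In training mode, BatchNorm1d normalizes each feature across the batch
bn = nn.BatchNorm1d(512)
out = bn(linear(x))
print(out.mean().item(), out.std().item())  # ~0 and ~1, despite the bad init
```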