As far as I know, He initialisation is designed to preserve the variance of activations from one layer to the next. However, I noticed that the init.kaiming_normal_() weight initialisation of a Linear layer does not seem to preserve the variance: I fed the layer a normally distributed input with a variance of 1.0, passed the result through a leaky rectifier, and the output had a variance of only roughly 0.7-0.8. I got similar results with init.kaiming_uniform_(). Am I missing something?
I ask because I have been running into exploding gradients when I add multiple layers to my network.
Here is my code:
import torch
import torch.nn as nn

# A layer class with He initialisation
class LinearHe(nn.Linear):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        torch.nn.init.kaiming_normal_(self.weight, a=0.2)

# A Linear layer with 1024 input and output features
linear1 = LinearHe(1024, 1024)
leaky_relu = nn.LeakyReLU(negative_slope=0.2)

# A normally distributed dataset of size 1024 with mean=0, std=1
a = torch.normal(torch.zeros(1, 1024), torch.ones(1, 1024))

a
>>> tensor([[-1.8505, 0.7651, 1.8227, ..., -0.3863, 0.6085, 0.6416]])
torch.var(a)
>>> tensor(0.9165)

b = leaky_relu(linear1(a))
torch.var(b)
>>> tensor(0.7133, grad_fn=<VarBackward0>)
# This result repeats across runs: the variance is scaled by a factor of roughly 0.8
torch.mean(b)
>>> tensor(0.4346, grad_fn=<MeanBackward0>)
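For context, here is a minimal sketch of how I track the per-layer variance when several of these layers are stacked. It reuses the LinearHe class above; the depth of 10 and width of 1024 are arbitrary choices for illustration:

import torch
import torch.nn as nn

# Arbitrary depth and width, just to see how the variance changes layer by layer
depth, width = 10, 1024

layers = []
for _ in range(depth):
    layers.append(LinearHe(width, width))
    layers.append(nn.LeakyReLU(negative_slope=0.2))

# Same kind of input as above: normally distributed, mean=0, std=1
x = torch.normal(torch.zeros(1, width), torch.ones(1, width))

with torch.no_grad():
    for i in range(0, len(layers), 2):
        # Apply one Linear + LeakyReLU pair and report the resulting variance
        x = layers[i + 1](layers[i](x))
        print(f"after layer {i // 2 + 1}: var = {torch.var(x).item():.4f}")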