Why do we need to specify the non-linearity when initializing the weights of convolutional layers with Kaiming (He) initialization?

I don’t think that is the case. As I mentioned above, it calculates the standard deviation of the weights based on the non-linearity that will be applied next; it doesn’t apply ReLU to the weights themselves.
You should check the implementation in PyTorch — I have shared the link above.
Take this example,

import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
>>tensor([[-0.4137,  0.3216,  0.0705, -0.4403,  0.4050],
        [-0.5409,  0.3364, -0.7153,  0.2617,  0.5652],
        [ 0.2512,  0.6643, -0.9265, -0.2095, -0.9202]])

If you look at the above output, you’ll see many negative values among the weights, which would not be the case if ReLU had been applied to them.
