I don’t think that is the case. As I mentioned above, it calculates the standard deviation of the weights based on the nonlinearity that will be applied afterwards; it doesn’t apply ReLU to them.
You should check the PyTorch implementation, I have shared the link above.
Take this example:

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
print(w)
# tensor([[-0.4137,  0.3216,  0.0705, -0.4403,  0.4050],
#         [-0.5409,  0.3364, -0.7153,  0.2617,  0.5652],
#         [ 0.2512,  0.6643, -0.9265, -0.2095, -0.9202]])
```
If you look at the output above, you’ll see many negative weights, which would not be the case if ReLU had been applied.
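To make this concrete, here is a small sketch (the tensor shape and seed are my own choices for illustration) showing that `kaiming_normal_` with `nonlinearity='relu'` only scales the standard deviation by the ReLU gain `sqrt(2)`, and that roughly half the entries stay negative:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# For a 2D tensor of shape (out, in), fan_out = size(0).
fan_out = 512
w = torch.empty(fan_out, 256)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')

# kaiming_normal_ samples from N(0, std^2) with std = gain / sqrt(fan),
# where gain = sqrt(2) for ReLU. It never passes the weights through ReLU.
expected_std = math.sqrt(2.0) / math.sqrt(fan_out)
print(w.std().item(), expected_std)

# If ReLU had been applied, this fraction would be 0; instead it is ~0.5.
frac_negative = (w < 0).float().mean().item()
print(frac_negative)
```

So the nonlinearity argument only affects the variance of the sampled distribution; the sampled values themselves remain symmetric around zero.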