Kaiming He initialisation not preserving variance


As I understand it, He initialisation was developed to preserve the variance of the activations from layer to layer when used in conjunction with a ReLU activation function. I noticed that I was getting vanishing gradients, and with batch normalisation I got exploding gradients instead. I did some analysis and found that the layers do not preserve the variance: it shrank by a roughly constant factor (~0.8 for a=0.2) for each layer-activation pair.
Am I missing something? Am I doing something wrong?

I was using a=0.2 (the negative slope) in my activation function, but the problem persisted with a=0 and with init.kaiming_uniform_().
Below is some code using a normal distribution as an input to showcase the situation (a uniformly distributed input had the same problem):

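import torch
import torch.nn as nn

# A linear layer class with He initialisation
class LinearHe(nn.Linear):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # a is the negative slope of the LeakyReLU that follows this layer
        torch.nn.init.kaiming_normal_(self.weight, a=0.2)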

#A Linear layer with 1024 input and output features
linear1 = LinearHe(1024, 1024) 
leaky_relu = nn.LeakyReLU(negative_slope=0.2)

# A normally distributed input with 1024 features, mean=0, std=1
a = torch.normal(torch.zeros(1, 1024), torch.ones(1, 1024))
print(a)         # tensor([[-1.8505,  0.7651,  1.8227,  ..., -0.3863,  0.6085,  0.6416]])
print(a.var())   # tensor(0.9165)

b = leaky_relu(linear1(a))
print(b.var())   # tensor(0.7133, grad_fn=<VarBackward0>)
# This is a repeatable result: the variance is scaled by a factor of roughly 0.8
print(b.mean())  # tensor(0.4346, grad_fn=<MeanBackward0>)
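
To check how this behaves over several layers rather than a single one, here is a minimal sketch that stacks a few He-initialised layers and records the variance, mean, and mean square after each LeakyReLU. The function name layer_stats, the batch size, the zeroed bias, and the fixed seed are illustrative assumptions, not part of the setup above. The mean square is recorded as well because variance and mean square differ once the activations have a nonzero mean.

```python
import torch
import torch.nn as nn


def layer_stats(depth=5, width=1024, batch=256, a=0.2, seed=0):
    """Return a list of (variance, mean, mean_square) tuples.

    Entry 0 describes the input; entry i describes the output of the
    i-th Linear + LeakyReLU pair.
    """
    torch.manual_seed(seed)  # assumption: fixed seed for repeatability
    act = nn.LeakyReLU(negative_slope=a)
    x = torch.randn(batch, width)  # standard-normal input, mean=0, std=1
    stats = [(x.var().item(), x.mean().item(), x.pow(2).mean().item())]
    with torch.no_grad():
        for _ in range(depth):
            layer = nn.Linear(width, width)
            nn.init.kaiming_normal_(layer.weight, a=a)
            # Assumption: zero the bias so only the weight init is measured
            nn.init.zeros_(layer.bias)
            x = act(layer(x))
            stats.append((x.var().item(), x.mean().item(), x.pow(2).mean().item()))
    return stats


if __name__ == "__main__":
    for i, (var, mean, mean_sq) in enumerate(layer_stats()):
        print(f"layer {i}: var={var:.4f} mean={mean:.4f} mean_sq={mean_sq:.4f}")
```

Calling layer_stats(a=0.0) gives the plain ReLU case mentioned above, and swapping nn.init.kaiming_normal_ for nn.init.kaiming_uniform_ tests the uniform variant.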