When setting up new models it can be helpful to have some sanity checks. One component of a model I find particularly hard to check is the initialization.

As I recall, the common approaches to weight initialization (e.g. Xavier and Kaiming) aim at conserving the variance from a layer's inputs to its outputs.

That is, if we feed a standard normal input (mean=0, variance=1) into some layer, we want its outputs to be standard normal, too (mean=0, variance=1). This aligns nicely with the common practice of standardizing the inputs and targets.
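For a single linear layer this property is easy to verify directly. Here is a minimal NumPy sketch (the layer size and sample count are arbitrary choices for illustration) showing that Xavier/Glorot initialization keeps the output variance close to the input variance when fan-in equals fan-out:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 1024

# Xavier/Glorot normal init: Var(W) = 2 / (fan_in + fan_out)
W = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

x = rng.normal(size=(10_000, fan_in))  # standard normal inputs
y = x @ W                              # one linear layer, no nonlinearity

print(x.var(), y.var())  # both should be close to 1
```

With fan_in = fan_out, Var(W) = 1/fan_in, so each output unit's variance is fan_in * Var(W) * Var(x) = Var(x).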

I therefore propose the following naive sanity check for the initialized network:

(a) create a large batch of standard normal data

(b) pass the data through the model

(c) check whether the outputs' mean and variance (over the batch dimension) resemble a standard normal
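The three steps above can be wrapped into a small helper. This is a sketch, not an established API; the function name and the toy MLP used to demonstrate it are my own choices:

```python
import torch
import torch.nn as nn

def init_sanity_check(model, input_shape, n_samples=500):
    """Steps (a)-(c): feed a standard-normal batch through a freshly
    initialized model and summarize the output statistics."""
    model.eval()
    with torch.no_grad():
        x = torch.randn(n_samples, *input_shape)  # (a) standard normal batch
        y = model(x)                              # (b) forward pass
    mean, var = y.mean(dim=0), y.var(dim=0)       # (c) stats over the batch
    return mean.min().item(), mean.max().item(), var.min().item(), var.max().item()

# hypothetical small model, just to demonstrate the check
torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
print(init_sanity_check(mlp, (64,)))
```

If initialization preserves variance, the reported variances should sit near 1.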

Performing this check with a batch of 500 standard normals on `torchvision.models.resnet50(pretrained=False)`, however, yields variances between 0.0044 and 0.0084.

I trust the official torchvision model and believe it works as intended.

How, then, would you explain the reduced variance?

```
import torch
import torchvision.models as models

n_samples = 500
net = models.resnet50(pretrained=False)  # randomly initialized weights

with torch.no_grad():
    y = net(torch.randn(n_samples, 3, 224, 224))

print(y.mean(dim=0), y.var(dim=0))
print(y.mean(dim=0).min(), y.mean(dim=0).max(),
      y.var(dim=0).min(), y.var(dim=0).max())
```