I wanted to see the variance of a single weight of a `nn.Conv2d` layer with `in_channels=1`, `out_channels=1`, and `kernel_size=1`:
```python
import torch.nn as nn

variance = 0.0
sample_size = 10000
for i in range(sample_size):
    layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1)
    variance += layer.weight.item() ** 2
print(variance / sample_size)
```
But this seems to converge to 1/3.
In the `reset_parameters` method of `torch/nn/modules/conv.py`, the weight is initialized with `init.kaiming_uniform_(self.weight, a=math.sqrt(5))`, and `kaiming_uniform_` sets `nonlinearity` to `'leaky_relu'`, so it ends up running this code:
```python
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan)
bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
with torch.no_grad():
    return tensor.uniform_(-bound, bound, generator=generator)
```
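Plugging my assumptions into that formula (assuming `calculate_gain` returns ~sqrt(2) here, and that fan = 1 for this layer, since fan_in = in_channels × kernel_height × kernel_width = 1 · 1 · 1):

```python
import math

# fan_in of a Conv2d is in_channels * prod(kernel_size); here 1 * 1 * 1 = 1
fan = 1
gain = math.sqrt(2.0)  # my assumption about what calculate_gain returns here
std = gain / math.sqrt(fan)
bound = math.sqrt(3.0) * std
print(bound)  # sqrt(6) ≈ 2.449 under that assumption
```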
I think `calculate_gain` is returning ~sqrt(2), and fan = 1, so bound = sqrt(6). But the variance of a uniform distribution U(-sqrt(6), sqrt(6)) should be (sqrt(6))^2 / 3 = 6/3 = 2, right? I'm confused why I'm getting 1/3 for the variance. Did I make a math or code mistake somewhere?
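As a sanity check on the Var(U(-b, b)) = b^2 / 3 formula itself, I sampled it directly in plain Python (independent of torch):

```python
import math
import random

# Empirically check Var(U(-b, b)) == b**2 / 3 with b = sqrt(6)
b = math.sqrt(6)
n = 100_000
samples = [random.uniform(-b, b) for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(var)  # close to b**2 / 3 = 2 in my runs
```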
I also noticed that when `reset_parameters` initializes the bias, it ultimately uses a uniform distribution U(-1, 1), and the resulting variance matches (1^2)/3 = 1/3. But I get 1/3 from both `layer.weight` and `layer.bias`, so I wasn't sure what's going on. Thanks!
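Edit: for completeness, this is how I measured the bias (the same Monte Carlo estimate as above, just on `layer.bias` instead of `layer.weight`):

```python
import torch.nn as nn

# Estimate E[b**2] over fresh inits; the mean is ~0, so this ≈ Var(b)
variance = 0.0
sample_size = 10000
for _ in range(sample_size):
    layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1)
    variance += layer.bias.item() ** 2
print(variance / sample_size)  # ~1/3 in my runs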