What distribution do a Module's parameters follow by default?

For example,

import torch.nn as nn

layer = nn.Linear(10, 20)
weight = layer.weight
print(weight)

What distribution does weight follow?
Is it a normal distribution?

Let’s check the source code.

def __init__(self, in_features, out_features, bias=True):
    ...
    self.weight = Parameter(torch.Tensor(out_features, in_features))

def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    ...

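So the answer is: a uniform distribution over the range from -1/sqrt(in_features) to 1/sqrt(in_features). (self.weight.size(1) is in_features, because the weight has shape (out_features, in_features).)

That particular initialization scheme comes from LeCun's 1998 paper "Efficient BackProp".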
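If you want to check this empirically rather than read the source, here is a minimal sketch. The names bound, within_bounds, expected_std and empirical_std are just for illustration, and the small tolerance on the bound is my own choice to absorb float32 rounding. As far as I know, newer PyTorch versions initialize nn.Linear weights through kaiming_uniform_ with a=sqrt(5) instead of the code quoted above, but that works out to the same 1/sqrt(in_features) bound for the weight, so the check should hold either way.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the run repeatable

in_features, out_features = 10, 20
layer = nn.Linear(in_features, out_features)

# Bound used by the default initialization: 1 / sqrt(in_features)
bound = 1.0 / math.sqrt(in_features)

w = layer.weight.detach()

# Every weight should lie inside [-bound, bound]
# (tiny tolerance added for float32 rounding; an assumption of this sketch).
within_bounds = bool((w.abs() <= bound + 1e-6).all())

# A uniform distribution on (-b, b) has standard deviation b / sqrt(3),
# so the sample std should be close to this value (not exactly equal).
expected_std = bound / math.sqrt(3)
empirical_std = w.std().item()

print("bound:", bound)
print("all weights within bound:", within_bounds)
print("expected std:", expected_std, "empirical std:", empirical_std)
```

With only 200 weights the empirical std will just be roughly equal to the expected one; a larger layer gives a tighter match. A histogram of w.flatten() should also look flat rather than bell-shaped, which is a quick way to tell it is not a normal distribution.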
