For example,
import torch.nn as nn

layer = nn.Linear(10, 20)
weight = layer.weight
print(weight)
What distribution are the weights drawn from?
Is it a normal distribution?
Let’s check the source code.
def __init__(self, in_features, out_features, bias=True):
    ...
    self.weight = Parameter(torch.Tensor(out_features, in_features))

def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    ...
So the answer is: no, it is not a normal distribution. The weights are drawn from a uniform distribution between -1/sqrt(in_features) and 1/sqrt(in_features), because self.weight.size(1) is in_features.
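
If you want to confirm this empirically rather than by reading the source, here is a minimal sketch. It assumes a working PyTorch install; the variable names (bound, within_bound) and the small float tolerance are my own choices for illustration. It only checks that every weight lies inside the expected bound and compares the sample standard deviation with the value a uniform distribution on [-b, b] would give, b/sqrt(3). With only 200 weights the standard deviation will match only roughly.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # only so repeated runs print the same numbers

layer = nn.Linear(10, 20)
weight = layer.weight.detach()  # shape is (out_features, in_features)

# Bound suggested by the source excerpt: 1 / sqrt(in_features)
bound = 1.0 / math.sqrt(layer.in_features)

# Every entry should lie in [-bound, bound]; 1e-6 absorbs float32 rounding
within_bound = bool((weight.abs() <= bound + 1e-6).all())

print("shape:", tuple(weight.shape))
print("bound:", bound)
print("min / max:", weight.min().item(), weight.max().item())
print("all within bound:", within_bound)

# A uniform distribution on [-b, b] has standard deviation b / sqrt(3)
print("sample std:", weight.std().item())
print("expected std for uniform:", bound / math.sqrt(3))
```

If the weights were normally distributed, you would expect some values outside the bound, so a min and max that sit just inside +/- bound is consistent with the uniform initialization above.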