Why is the initialization of the conv layer in terms of a product and not a sum?

For the convolution layer (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/conv.py), the implementation is:

def reset_parameters(self):
    n = self.in_channels
    for k in self.kernel_size:
        n *= k
    stdv = 1. / math.sqrt(n)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)

which I take to be n = nb_chan * k_1 * k_2. However, why isn't it n = nb_chan + k_1 + k_2? What is wrong with the sum?
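To make the two options concrete, here is a sketch of what that `reset_parameters` loop computes, with layer sizes I made up for illustration (they are not from the PyTorch source):

```python
import math

# Hypothetical Conv2d configuration: in_channels=3, 3x3 kernel.
in_channels = 3
kernel_size = (3, 3)

# The product, as in reset_parameters: n = in_channels * k_1 * k_2
n_product = in_channels
for k in kernel_size:
    n_product *= k  # 3 * 3 * 3 = 27

# The sum I was asking about: n = in_channels + k_1 + k_2
n_sum = in_channels + sum(kernel_size)  # 3 + 3 + 3 = 9

stdv = 1.0 / math.sqrt(n_product)
print(n_product, n_sum, stdv)  # 27 9 0.19245008972987526
```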

My question is based on the fact that the linear layer seems to use the total number of "in units":

def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)

but my notion of "total" seems to be captured better by a sum than by a product…

because the product is the number of input values that contribute to a single output value; the sum isn't.
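A quick sanity check of that claim (a pure-Python sketch with made-up sizes, not PyTorch code): one output pixel of a conv is a dot product over every (channel, row, col) tap of the kernel, so the number of contributing inputs is the product of those dimensions.

```python
# Made-up conv configuration for illustration: 3 input channels, 3x3 kernel.
in_channels, k1, k2 = 3, 3, 3

# Count the input values touched when computing ONE output value:
# one tap per (channel, kernel row, kernel col) combination.
contributing = 0
for c in range(in_channels):
    for i in range(k1):
        for j in range(k2):
            contributing += 1

print(contributing)                         # 27
print(contributing == in_channels * k1 * k2)  # True: the product, not the sum
```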

Is the way PyTorch initializes these layers He init or a variant of it? Is there a paper one can cite or read on it? I just can't recognize what init type it's using. Maybe it's something that doesn't appear in a paper?