Why is the initialization of the conv layer in terms of products and not sum?

For the convolution layer (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/conv.py), the implementation is:

def reset_parameters(self):
    n = self.in_channels
    for k in self.kernel_size:
        n *= k
    stdv = 1. / math.sqrt(n)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)

which I take to mean n = nb_chan * k_1 * k_2. However, why isn’t it n = nb_chan + k_1 + k_2? What is wrong with the sum?


My question is based on the fact that the Linear layer seems to use the total number of “in units”:

def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)
    self.reset_parameters()

but my notion of “total” seems to be captured better by sums than by products…

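Because the product is the number of input values that contribute to each output value; the sum isn’t. Each output value of the convolution is a weighted sum over in_channels * k_1 * k_2 inputs, which is the same role in_features plays for Linear.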
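To make the counting argument concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helper names `fan_in` and `count_terms_per_output` are made up for illustration and are not part of any library; the example sizes (3 channels, 5x5 kernel) are also just assumptions.

```python
import math

def fan_in(in_channels, kernel_size):
    """Number of input values feeding one output value of a conv layer."""
    return in_channels * math.prod(kernel_size)

def count_terms_per_output(in_channels, kernel_size):
    """Count the weight * input terms summed for one output value (2D kernel)."""
    k_h, k_w = kernel_size
    terms = 0
    for _c in range(in_channels):      # every input channel
        for _i in range(k_h):          # every kernel row
            for _j in range(k_w):      # every kernel column
                terms += 1             # one weight times one input value
    return terms

in_channels, kernel_size = 3, (5, 5)
n = fan_in(in_channels, kernel_size)
stdv = 1.0 / math.sqrt(n)              # same bound as in reset_parameters above

print(n, count_terms_per_output(in_channels, kernel_size))  # 75 75
print(sum((in_channels, *kernel_size)))                     # 13
```

The nested loops visit 3 * 5 * 5 = 75 weight/input pairs, which matches the product. The sum (13) does not correspond to anything that is actually added up when computing an output value.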

Is the way PyTorch initializes these layers He init or a variant? Is there a paper one can cite or read on it? I just can’t recognize what type of init it’s using. Maybe it’s something that doesn’t appear in a paper?
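As a rough way to compare the two, here is a small sketch that only looks at the weight variance. It assumes the bound 1/sqrt(n) from the reset_parameters code quoted above and the 2 / fan_in variance from He et al. (2015); the function names are hypothetical. It does not settle which paper the quoted scheme comes from.

```python
import math

def uniform_variance(bound):
    """Variance of a uniform distribution on (-bound, bound)."""
    return bound ** 2 / 3.0

def quoted_scheme_variance(n):
    # bound = 1 / sqrt(n), as in the reset_parameters code quoted in this thread
    return uniform_variance(1.0 / math.sqrt(n))

def he_variance(n):
    # He et al. (2015) weight variance for ReLU networks: 2 / fan_in
    return 2.0 / n

n = 3 * 5 * 5  # example fan-in: 3 channels, 5x5 kernel (assumed sizes)
ratio = he_variance(n) / quoted_scheme_variance(n)
print(quoted_scheme_variance(n), he_variance(n), ratio)
```

Under these assumptions the quoted scheme gives a variance of 1 / (3n), while He init gives 2 / n, so the ratio comes out at about 6 regardless of n. In other words, the quoted code scales with fan-in in the same way but uses a smaller constant than He init.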