Why is the initialization of the conv layer in terms of products and not sum?

Brando_Miranda · July 7, 2018, 1:11am

For the convolution layers (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/conv.py), the implementation is:

def reset_parameters(self):
    n = self.in_channels
    for k in self.kernel_size:
        n *= k
    stdv = 1. / math.sqrt(n)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)

I take this to mean n = nb_chan * k_1 * k_2. However, why isn’t it n = nb_chan + k_1 + k_2? What is wrong with the sum?

My question is based on the fact that, for the linear layer, n seems to be the total number of “in units”:

def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)
    self.reset_parameters()

but my notion of “total” seems to be captured better by sums than by products…

SimonW · July 7, 2018, 6:53am

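Because the product is the number of input values that contribute to an output value; the sum isn’t. Each output value of the convolution is a weighted sum over in_channels * k_1 * k_2 inputs, just as each output of a linear layer is a weighted sum over in_features inputs.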
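
To make the counting argument concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helper name `count_inputs_per_output` and the example sizes (3 channels, 5x5 kernel) are illustrative assumptions, not anything from the PyTorch source. It enumerates every (channel, kernel offset) pair that feeds a single output value and compares that count with the sum.

```python
import math
from itertools import product

def count_inputs_per_output(in_channels, kernel_size):
    """Count the input values feeding one output value by enumerating
    every (channel, kernel offset...) combination explicitly."""
    ranges = [range(in_channels)] + [range(k) for k in kernel_size]
    return sum(1 for _ in product(*ranges))

# Illustrative sizes (assumed for this example, not taken from the thread).
in_channels, kernel_size = 3, (5, 5)

fan_in = count_inputs_per_output(in_channels, kernel_size)  # product of the sizes
fan_sum = in_channels + sum(kernel_size)                    # the sum asked about

# Same bound as in the reset_parameters code quoted above.
stdv = 1.0 / math.sqrt(fan_in)

# A linear layer is the special case with no kernel dimensions:
linear_fan_in = count_inputs_per_output(128, ())

print(fan_in, fan_sum, linear_fan_in)
```

With these sizes the enumeration finds 75 contributing inputs, while the sum gives 13, which does not correspond to any count of weights or inputs. With an empty kernel the same helper returns `in_features`, so the linear layer follows the same rule.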

Brando_Miranda · July 7, 2018, 1:47pm

Is the way PyTorch initializes these layers He init or a variant? Is there a paper one can cite or read on it? I just can’t recognize what init type it’s using. Maybe it’s something that doesn’t appear in a paper?
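
One way to compare the two numerically is the rough sketch below. It assumes “He init” means a weight variance of 2 / fan_in, and it uses the fact that Uniform(-a, a) has standard deviation a / sqrt(3). The helper `uniform_std` and the fan-in value are illustrative, not part of PyTorch.

```python
import math

def uniform_std(bound):
    # Standard deviation of Uniform(-bound, bound) is bound / sqrt(3).
    return bound / math.sqrt(3.0)

fan_in = 3 * 5 * 5  # illustrative: 3 input channels, 5x5 kernel

# Bound used by the reset_parameters code quoted above.
stdv = 1.0 / math.sqrt(fan_in)
pytorch_std = uniform_std(stdv)    # equals sqrt(1 / (3 * fan_in))

# Assumption: He init taken as Var(w) = 2 / fan_in.
he_std = math.sqrt(2.0 / fan_in)

ratio = he_std / pytorch_std       # equals sqrt(6), independent of fan_in
print(pytorch_std, he_std, ratio)
```

Under these assumptions the quoted code gives a weight variance of 1 / (3 * fan_in), a factor of 6 smaller than 2 / fan_in, so it scales with fan-in in the same way as He init but is not the same scheme.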