CNN default initialization understanding

Kaixuan_WANG · June 13, 2020, 5:25pm

Hi, I observed that the default CNN initialization has been changed.
In version 1.0 and above:

    def reset_parameters(self):
        n = self.in_channels
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

However, in version 0.4:

    def reset_parameters(self):
        n = self.in_channels
        for k in self.kernel_size:
            n *= k
        stdv = 1. / math.sqrt(n)
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

What is the motivation for these changes, and how should I understand

init.kaiming_uniform_(self.weight, a=math.sqrt(5))

Why a=sqrt(5)?
Thanks for your time!

harsha_g · June 13, 2020, 5:56pm

Probably this should help.

Nikronic · June 13, 2020, 8:42pm

Hi,

I think the motivation is explained in the link @harsha_g referenced. For more detail, I have also explained how the values are obtained, especially sqrt(5), in this post: Clarity on default initialization in pytorch

To summarize, both versions do the same thing. Older versions of PyTorch followed the convention used in Lua Torch; later, once newer modules such as the init functions had been built, the defaults were rewritten to express the same idea through them. In your example, both approaches sample the weights from a uniform distribution: for a non-grouped convolution, kaiming_uniform_ with a=sqrt(5) yields the bound 1/sqrt(fan_in), which matches the old stdv. I have explained the derivation in the referenced link.
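To make the equivalence concrete, here is a minimal sketch in plain Python (no PyTorch needed) that recomputes both bounds. It assumes the formula used by kaiming_uniform_ with its default leaky_relu nonlinearity, i.e. gain = sqrt(2 / (1 + a^2)), std = gain / sqrt(fan_in), bound = sqrt(3) * std, and a non-grouped convolution so that fan_in = in_channels * prod(kernel_size). The helper names `kaiming_uniform_bound` and `legacy_bound` and the layer sizes are made up for illustration; they are not PyTorch functions.

```python
import math


def kaiming_uniform_bound(fan_in, a):
    """Bound of the uniform distribution, assuming the leaky_relu gain formula."""
    gain = math.sqrt(2.0 / (1.0 + a ** 2))  # leaky_relu gain with negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std  # U(-b, b) has standard deviation b / sqrt(3)


def legacy_bound(in_channels, kernel_size):
    """Bound used by the 0.4-style reset_parameters quoted above."""
    n = in_channels
    for k in kernel_size:
        n *= k
    return 1.0 / math.sqrt(n)


# Hypothetical layer: 16 input channels, 3x3 kernel, groups=1 (assumed).
in_channels, kernel_size = 16, (3, 3)
fan_in = in_channels * kernel_size[0] * kernel_size[1]

new_bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
old_bound = legacy_bound(in_channels, kernel_size)
print(new_bound, old_bound)  # both are 1/12 up to floating-point rounding
```

With a = sqrt(5) the gain is sqrt(2/6) = 1/sqrt(3), so the sqrt(3) factor cancels and the bound reduces to 1/sqrt(fan_in). That would explain why this particular value of a reproduces the old behaviour for the weights.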

Bests