Why is the default negative_slope for kaiming_uniform initialization of Convolution and Linear layers √5?

I noticed that the default initialization method for Conv and Linear layers in PyTorch is kaiming_uniform.

I just don’t understand why the default value of negative_slope (the default nonlinearity is leaky_relu) is √5.

Is it written just for simplicity or for some specific reason?

def reset_parameters(self):
    # weights: Kaiming uniform with a = sqrt(5)
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        # bias: uniform in +/- 1/sqrt(fan_in) of the weight tensor
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

My understanding (the last update is here) is that this mainly comes from casting the init, as it has always been done, into the kaiming_uniform_ initialization scheme. I would not read it as a recommendation for that particular activation function.
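To make that “casting” concrete, here is a quick numerical sketch (my own check, not authoritative; the fan_in value is just an example): with nonlinearity='leaky_relu' the gain is √(2 / (1 + a²)), so a = √5 gives a gain of √(1/3), and the kaiming_uniform_ bound gain · √(3 / fan_in) collapses to 1/√fan_in, i.e. exactly the historical U(-1/√fan_in, 1/√fan_in) default.

import math
from torch import nn
from torch.nn import init

a = math.sqrt(5)
gain = init.calculate_gain('leaky_relu', a)   # sqrt(2 / (1 + 5)) = sqrt(1/3)
fan_in = 128                                  # example fan-in, just for illustration
bound = gain * math.sqrt(3.0 / fan_in)        # the bound kaiming_uniform_ uses
print(bound, 1 / math.sqrt(fan_in))           # both ~0.0884

# a freshly constructed Linear layer stays within that bound
layer = nn.Linear(fan_in, 64)
print(layer.weight.abs().max().item() <= 1 / math.sqrt(fan_in))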

As such, for me there are two takeaways:

  • This is a scheme that has been used for quite some time as the default,
  • if you believe that (Kaiming) He’s initialization scheme is a good thing, including the parametrization, and your activation doesn’t happen to be leaky_relu with slope √5, you might consider overriding the initialization with a kaiming_uniform_ (or kaiming_normal_) call aligned with your activation function (see the sketch after this list).
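For instance, something along these lines (a minimal sketch; the layer sizes and the choice of ReLU are just assumptions for illustration):

import torch.nn as nn
from torch.nn import init

def init_weights(m):
    # re-initialize Linear/Conv weights with a Kaiming init matched to ReLU
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        init.kaiming_uniform_(m.weight, nonlinearity='relu')  # a=0 for ReLU
        if m.bias is not None:
            init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)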

Best regards

Thomas

Acknowledgement: Thanks to Vishwak and @fmassa, who were involved in the effort of documenting and streamlining how the initializations are coded, for their responses when I asked about this; any errors and bad ideas above are my own.

Thanks for your reply.

this mainly comes from casting the init, as it has always been done, into the kaiming_uniform_ initialization scheme.

This actually solves my confusion.

The default values in the implementation are confusing.

https://pytorch.org/docs/stable/modules/torch/nn/init.html#kaiming_uniform
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Args:
    tensor: an n-dimensional torch.Tensor
    a: the negative slope of the rectifier used after this layer
       (0 for ReLU by default)

https://pytorch.org/docs/stable/nn.html#leakyrelu
LeakyReLU(negative_slope=0.01, inplace=False)

a=0 is documented as the value for ReLU, yet nonlinearity defaults to ‘leaky_relu’;
meanwhile, the default negative_slope of nn.LeakyReLU itself is 0.01.

Yep, I have found it confusing too. If the gain is the only thing that is affected, then only the parameter a should really be needed.
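For what it’s worth, a quick check of how the gain behaves with the documented defaults (my own snippet, not from the thread): with a=0 the ‘leaky_relu’ gain collapses to the ReLU gain √2, so a is effectively the only parameter that changes the kaiming_uniform_ result.

import math
from torch.nn import init

# gain for leaky_relu is sqrt(2 / (1 + a^2)); with a=0 it equals the ReLU gain
print(init.calculate_gain('leaky_relu', 0))             # 1.4142...
print(init.calculate_gain('relu'))                      # 1.4142...
print(init.calculate_gain('leaky_relu', math.sqrt(5)))  # 0.5773... = sqrt(1/3)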