Why is the default negative_slope for kaiming_uniform initialization of Convolution and Linear layers √5?

I noticed that the default initialization method for Conv and Linear layers in PyTorch is kaiming_uniform_.

I just don’t understand why the default value of negative_slope (the default nonlinearity is leaky_relu) is √5.

Is it written just for simplicity or for some specific reason?

def reset_parameters(self):
    # weights: kaiming_uniform_ with a = sqrt(5)
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        # bias: U(-1/sqrt(fan_in), 1/sqrt(fan_in))
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

My understanding (the last update is here) is that this mainly comes from casting the initialization, as it has always been done, into the kaiming_uniform_ initialization scheme. I would not read it as a recommendation for that particular activation function.
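To make that concrete, here is a quick numerical check (my own sketch, not from the thread; the fan_in value is made up) showing that kaiming_uniform_ with a = √5 reproduces the historic U(-1/√fan_in, 1/√fan_in) bound:

import math

from torch.nn import init

fan_in = 64  # hypothetical fan-in, just for the check

# gain for leaky_relu with negative_slope = sqrt(5): sqrt(2 / (1 + 5)) = sqrt(1/3)
gain = init.calculate_gain('leaky_relu', math.sqrt(5))

# kaiming_uniform_ samples from U(-bound, bound) with bound = gain * sqrt(3 / fan_in)
bound = gain * math.sqrt(3.0 / fan_in)

print(bound)                    # 0.125
print(1.0 / math.sqrt(fan_in))  # 0.125, i.e. the old default bound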

As such, there are two takeaways for me:

  • This is a scheme that has been used as the default for quite some time,
  • if you believe that (Kaiming) He’s initialization scheme is a good thing, including the parametrization, and your activation doesn’t happen to be leaky_relu with slope √5, you might consider overriding the initialization with a kaiming_uniform_ (or kaiming_normal_) call aligned with your activation function (see the sketch after this list).
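As a sketch of what such an override could look like (assuming a Linear layer followed by ReLU; the layer sizes are made up, and the bias handling just mirrors the snippet above):

import math

import torch.nn as nn
from torch.nn import init

layer = nn.Linear(128, 64)  # hypothetical layer

# weight init matched to ReLU instead of the leaky_relu / sqrt(5) default
init.kaiming_uniform_(layer.weight, nonlinearity='relu')

# keep the usual bias init of U(-1/sqrt(fan_in), 1/sqrt(fan_in))
if layer.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(layer.weight)
    bound = 1 / math.sqrt(fan_in)
    init.uniform_(layer.bias, -bound, bound)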

Best regards

Thomas

Acknowledgement: Thanks to Vishwak and @fmassa, who were involved in the effort of documenting and streamlining how the initializations are coded, for their response when I asked about this; any errors and bad ideas above are my own.


Thanks for your reply.

this is mainly from casting the init as it has always been done to the kaiming_uniform_ initialization scheme.

This actually solves my confusion.

The default values in the implementation are confusing:

https://pytorch.org/docs/stable/modules/torch/nn/init.html#kaiming_uniform
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

Args:
    tensor: an n-dimensional torch.Tensor
    a: the negative slope of the rectifier used after this layer (0 for ReLU by default)

https://pytorch.org/docs/stable/nn.html#leakyrelu
LeakyReLU(negative_slope=0.01, inplace=False)

So a=0 is meant for ReLU, yet nonlinearity='leaky_relu', and LeakyReLU’s own default negative_slope is 0.01, not 0.
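To see how these defaults interact, a quick check with init.calculate_gain (my own sketch, not from the docs):

from torch.nn import init

# a=0 with nonlinearity='leaky_relu' gives the same gain as plain ReLU: sqrt(2)
print(init.calculate_gain('leaky_relu', 0))     # 1.4142...
print(init.calculate_gain('relu'))              # 1.4142...

# LeakyReLU's own default slope of 0.01 would give nearly the same gain
print(init.calculate_gain('leaky_relu', 0.01))  # ~1.4141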


Yep, I have found it confusing too. If the gain is the only thing that is affected, then only the parameter a should be needed.