My understanding (the last update is here) is that this mainly comes from recasting the initialization as it has always been done into the kaiming_uniform_ initialization scheme. I would not read it as a recommendation for that particular activation function.
As such, for me there are two takeaways:
- This is a scheme that has been used as the default for quite some time.
- If you believe that (Kaiming) He's initialization scheme is a good thing, including the parametrization, and your activation doesn't happen to be leaky_relu with slope √5, you might consider overriding the initialization with a kaiming_uniform_ / kaiming_normal_ call aligned with your activation function (see the sketch below).
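For concreteness, here is a minimal sketch of what such an override could look like. The module and layer sizes are purely illustrative (not from the original discussion); the point is only to call kaiming_uniform_ with the nonlinearity you actually use:

```python
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)
        self.reset_parameters()

    def reset_parameters(self):
        # Re-initialize with Kaiming init matched to the activation actually used
        # (plain ReLU here), instead of the default leaky_relu with a=sqrt(5).
        for layer in (self.fc1, self.fc2):
            nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))
```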
Best regards
Thomas
Acknowledgement: Thanks to Vishwak and @fmassa, who were involved in the effort of documenting and streamlining how the initializations are coded, for their responses when I asked about this. Any errors and bad ideas above are my own.
The default values of the implementation are confusing
https://pytorch.org/docs/stable/modules/torch/nn/init.html#kaiming_uniform
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Args:
tensor: an n-dimensional torch.Tensor
a: the negative slope of the rectifier used after this layer (0 for ReLU by default)
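To see how these defaults reproduce the long-standing behaviour, here is a small self-contained check based on the formulas in torch.nn.init (the fan_in value of 64 is just an illustrative choice of mine): with a = √5, the gain is sqrt(2 / (1 + 5)) = sqrt(1/3), so the uniform bound gain * sqrt(3 / fan_in) collapses to the classic 1 / sqrt(fan_in).

```python
import math
import torch
import torch.nn as nn

fan_in = 64  # illustrative fan-in

# Gain for leaky_relu with negative slope a is sqrt(2 / (1 + a^2)).
gain = nn.init.calculate_gain('leaky_relu', math.sqrt(5))  # sqrt(1/3)
bound = gain * math.sqrt(3.0 / fan_in)                     # bound used by kaiming_uniform_
print(bound, 1.0 / math.sqrt(fan_in))                      # both are 0.125 = 1/sqrt(64)

# The same bound shows up when initializing a weight tensor directly.
w = torch.empty(10, fan_in)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
print(w.abs().max().item() <= bound + 1e-6)                # True
```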