Why is the default negative_slope for the kaiming_uniform initialization of Convolution and Linear layers √5?

My understanding (the last update is here) is that this mainly comes from recasting the initialization as it has always been done in terms of the kaiming_uniform_ scheme, rather than from a principled choice of activation. I would not read it as a recommendation for that particular activation function.
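To see concretely why √5 reproduces the historical default, here is a quick sanity check (the arithmetic is mine, not from the original discussion): with a = √5, the leaky_relu gain √(2 / (1 + a²)) equals √(1/3), so kaiming_uniform_'s bound gain · √(3 / fan_in) collapses to 1/√fan_in, which is exactly the old U(-1/√fan_in, 1/√fan_in) default.

```python
import math

import torch
from torch import nn

fan_in = 64
a = math.sqrt(5)
gain = math.sqrt(2.0 / (1.0 + a ** 2))   # leaky_relu gain = sqrt(1/3)
bound = gain * math.sqrt(3.0 / fan_in)   # kaiming_uniform_'s uniform bound
assert math.isclose(bound, 1.0 / math.sqrt(fan_in))

# Checking against the actual initializer: all weights fall in the old range.
w = torch.empty(32, fan_in)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
assert w.abs().max() <= 1.0 / math.sqrt(fan_in) + 1e-6
```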

As such, for me there are two takeaways:

  • This is a scheme that has been used as the default for quite some time, and
  • if you believe that (Kaiming) He’s initialization scheme is a good thing, including the parametrization, and your activation doesn’t happen to be leaky_relu with slope √5, you might consider overriding the initialization with a kaiming_uniform_ or kaiming_normal_ call aligned with your activation function (see the sketch after this list).
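A minimal sketch of such an override for a plain-ReLU network (the helper name init_for_relu is my own, not a PyTorch API):

```python
import torch
from torch import nn

def init_for_relu(module):
    # Re-initialize weight layers with a Kaiming init matched to ReLU
    # instead of the default leaky_relu-with-slope-sqrt(5) parametrization.
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 30 * 30, 10),
)
model.apply(init_for_relu)  # applies the init recursively to all submodules
```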

Best regards

Thomas

Acknowledgement: Thanks to Vishwak and @fmassa, who were involved in the effort of documenting and streamlining how the initializations are coded, for their responses when I asked about this. Any errors and bad ideas above are my own.
