My understanding (the last update is here) is that this mainly comes from recasting the initialization as it has always been done into the kaiming_uniform_ initialization scheme. I would not read it as a recommendation for that particular activation function.
As such, for me there are two takeaways:
- This is a scheme that has been used as the default for quite some time.
- If you believe that (Kaiming) He's initialization scheme is a good thing, including the parametrization, and your activation doesn't happen to be leaky_relu with slope √5, you might consider overriding the initialization with a kaiming_uniform_ / kaiming_normal_ call aligned with your activation function (see the sketch below).
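For concreteness, here is a minimal sketch of what such an override could look like. The module and layer sizes are purely illustrative (not from the original discussion); the point is only to call kaiming_uniform_ with the nonlinearity you actually use:

```python
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)
        self.reset_parameters()

    def reset_parameters(self):
        # Re-initialize with Kaiming init matched to the activation actually used
        # (plain ReLU here), instead of the default leaky_relu with a=sqrt(5).
        for layer in (self.fc1, self.fc2):
            nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))
```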
Best regards
Thomas
Acknowledgement: Thanks to Vishwak and @fmassa, who were involved in the effort of documenting and streamlining how the initializations are coded, for their responses when I asked about this. Any errors and bad ideas above are my own.
The default values of the implementation are confusing
https://pytorch.org/docs/stable/modules/torch/nn/init.html#kaiming_uniform
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
Args:
tensor: an n-dimensional torch.Tensor
a: the negative slope of the rectifier used after this layer (0 for ReLU by default)
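To see how these defaults reproduce the long-standing behaviour, here is a small self-contained check based on the formulas in torch.nn.init (the fan_in value of 64 is just an illustrative choice of mine): with a = √5, the gain is sqrt(2 / (1 + 5)) = sqrt(1/3), so the uniform bound gain * sqrt(3 / fan_in) collapses to the classic 1 / sqrt(fan_in).

```python
import math
import torch
import torch.nn as nn

fan_in = 64  # illustrative fan-in

# Gain for leaky_relu with negative slope a is sqrt(2 / (1 + a^2)).
gain = nn.init.calculate_gain('leaky_relu', math.sqrt(5))  # sqrt(1/3)
bound = gain * math.sqrt(3.0 / fan_in)                     # bound used by kaiming_uniform_
print(bound, 1.0 / math.sqrt(fan_in))                      # both are 0.125 = 1/sqrt(64)

# The same bound shows up when initializing a weight tensor directly.
w = torch.empty(10, fan_in)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
print(w.abs().max().item() <= bound + 1e-6)                # True
```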