Initialization gain for SiLU activation

As the current torch.nn.init documentation (PyTorch 1.13) does not list a gain for the SiLU activation (SiLU — PyTorch 1.13 documentation), do you know what value I should use?

I guess it boils down to how much and what kind of work you want to put into it.
Some ideas:

  • if SiLU is "close enough to ReLU" for your purposes, reuse the ReLU gain (see the sketch after this list),
  • check the literature (links are given in the SiLU documentation),
  • experiment yourself.
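
For the first option, here is a minimal sketch of what borrowing the ReLU gain could look like (the layer sizes are arbitrary placeholders):

```python
import torch.nn as nn

# Sketch: treat SiLU like ReLU for initialization purposes.
# calculate_gain has no "silu" entry, so we borrow the ReLU value (sqrt(2)).
linear = nn.Linear(512, 512)  # sizes are just placeholders

gain = nn.init.calculate_gain("relu")
nn.init.xavier_normal_(linear.weight, gain=gain)
nn.init.zeros_(linear.bias)

# Equivalently, Kaiming init with nonlinearity="relu":
nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
```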

For the last one, maybe the discussion we had regarding SELU might offer some inspiration:

Of course, we (I) got scolded by people who prefer deriving the gain from a fixed-point calculation, but if your goal is to train networks successfully, that may not be as crucial.
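
In that spirit, one simple way to "experiment yourself" is to pick the gain so that the scaled activation preserves the second moment of a standard normal input, which is the same recipe that gives sqrt(2) for ReLU. This is only a quick numerical sketch, not an official value:

```python
import torch
import torch.nn.functional as F

# Sketch: choose gain so that gain * silu(x) has unit second moment
# for x ~ N(0, 1) -- the recipe that yields sqrt(2) for ReLU.
torch.manual_seed(0)
x = torch.randn(10_000_000)
second_moment = F.silu(x).pow(2).mean()
gain = second_moment.rsqrt().item()
print(f"estimated SiLU gain: {gain:.4f}")
# Sanity check: applying the same estimate to F.relu should land near sqrt(2) ~= 1.414;
# the SiLU value comes out somewhat above that, since E[silu(x)^2] < E[relu(x)^2].
```

Whether such a single-layer estimate is good enough for you (versus a full fixed-point analysis) is something you can check by monitoring activation statistics through a deep stack of layers during training.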

Best regards

Thomas