As the current torch.nn.init (PyTorch 1.13 documentation) does not give a gain for the SiLU activation (SiLU — PyTorch 1.13 documentation), do you know what value I should use?
I guess it boils down to how much and what kind of work you want to put into it.
- if it is "close enough to ReLU", use the ReLU gain,
- check the literature (links are given in the SiLU documentation),
- experiment yourself (see the sketch below).
For the last one, maybe the discussion we had regarding SELU can offer some inspiration:
Of course, we (I) got scolded by the people who prefer computing the gain from the fixed-point calculation, but if your goal is to train networks successfully, that may not be as crucial.
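For the "experiment yourself" route, here is a minimal sketch of one way to estimate a gain empirically. It assumes the same convention behind `calculate_gain('relu')` = √2, i.e. gain ≈ 1/√(E[f(x)²]) for x ~ N(0, 1), so the activation roughly preserves the second moment; the sample size and seed are arbitrary:

```python
import torch

# Empirically estimate an init gain for SiLU: feed a large standard-normal
# sample through the activation and take 1/sqrt of the output's raw second
# moment. For ReLU this recovers sqrt(2), matching calculate_gain('relu').
torch.manual_seed(0)
x = torch.randn(10_000_000)
y = torch.nn.functional.silu(x)

# Raw second moment, not the centered variance: SiLU's output mean is
# nonzero, so the two differ (this is exactly the kind of detail the
# fixed-point crowd cares about).
gain = 1.0 / y.pow(2).mean().sqrt()
print(f"estimated SiLU gain: {gain:.3f}")

# Sanity check against the built-in ReLU gain:
relu_gain = 1.0 / torch.relu(x).pow(2).mean().sqrt()
print(f"estimated ReLU gain: {relu_gain:.3f}")  # should be close to sqrt(2) ~ 1.414
print(torch.nn.init.calculate_gain('relu'))
```

This only looks at a single activation on Gaussian input rather than the fixed point of a whole stack of layers, so treat the number it prints as a starting point for your own experiments, not a definitive constant.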