How should I initialize parameters before a GELU activation function?

Hi, everyone. I am trying to train a Transformer model that uses the GELU activation function, and I am not sure how to initialize the weights of the layers that feed into GELU. Should I use `kaiming_uniform_`, as is recommended for ReLU, or `xavier_uniform_`?
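
For concreteness, here is a minimal sketch of the two options I am comparing. The layer sizes are just placeholders standing in for the feed-forward layer that precedes GELU in a Transformer block; note that PyTorch's `calculate_gain` has no entry for GELU, so the Kaiming variant below falls back to the ReLU gain as an approximation.

```python
import torch.nn as nn

# Placeholder for the feed-forward layer that feeds into GELU.
linear = nn.Linear(512, 2048)
gelu = nn.GELU()

# Option 1: Kaiming/He initialization. PyTorch has no dedicated
# 'gelu' nonlinearity here, so 'relu' is used as an approximation.
nn.init.kaiming_uniform_(linear.weight, nonlinearity='relu')
nn.init.zeros_(linear.bias)

# Option 2: Xavier/Glorot initialization (gain defaults to 1.0).
nn.init.xavier_uniform_(linear.weight)
nn.init.zeros_(linear.bias)
```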
Thank you.
