Hi, everyone. I am trying to train a Transformer model that uses the GELU activation function, and I'm not sure how to initialize the parameters of the linear layer that comes before the GELU. Should it use kaiming_uniform like ReLU, or xavier_uniform?
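For concreteness, here is a minimal sketch of the two initializations I'm deciding between, using a hypothetical feed-forward block (the class name `FeedForward` and the dimensions are just illustrative):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Hypothetical Transformer FFN block; fc1 is the layer before GELU."""

    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)   # followed by GELU
        self.act = nn.GELU()
        self.fc2 = nn.Linear(d_ff, d_model)

        # Option A: Kaiming (He) uniform, as one would for ReLU.
        # torch.nn.init has no 'gelu' nonlinearity, so 'relu' is the
        # closest available setting.
        nn.init.kaiming_uniform_(self.fc1.weight, nonlinearity='relu')

        # Option B: Xavier (Glorot) uniform instead:
        # nn.init.xavier_uniform_(self.fc1.weight)

        nn.init.zeros_(self.fc1.bias)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```

Which of the two options is the right choice for GELU?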
Thank you.