Why multiply by sqrt(d_model) before TransformerEncoderLayer?

Hi,

Thank you so much for providing the tutorial! I noticed that at https://github.com/pytorch/tutorials/blob/main/beginner_source/transformer_tutorial.py#L92 the embedding output is multiplied by sqrt(d_model) before it is passed to TransformerEncoderLayer. May I ask why this scaling is needed?
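
For reference, here is a minimal sketch of the pattern I mean. It is not the tutorial's exact code: the sizes and variable names are my own assumptions, and I left out the positional encoding step that, as far as I can tell, the tutorial applies between the scaling and the encoder.

```python
import math

import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vocab_size, d_model, nhead = 100, 16, 2

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)

# Token ids shaped [seq_len, batch_size], the layout used when batch_first=False.
tokens = torch.randint(0, vocab_size, (5, 3))

embedded = embedding(tokens)             # shape [5, 3, 16]
scaled = embedded * math.sqrt(d_model)   # the step I am asking about
# (positional encoding would be added here; omitted in this sketch)
output = encoder_layer(scaled)           # shape [5, 3, 16]
```

With d_model = 16 the scaling factor is exactly 4, so every embedding value is multiplied by 4 before it reaches the encoder layer.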

Thanks!