Hi,
Thank you so much for providing the tutorial! I noticed that in https://github.com/pytorch/tutorials/blob/main/beginner_source/transformer_tutorial.py#L92, you multiply the embeddings by sqrt(d_model) before passing them to TransformerEncoderLayer. May I ask why this scaling is needed?
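To make sure we are talking about the same step, here is a minimal sketch of the pattern I mean. It is paraphrased rather than copied from the tutorial: the sizes and variable names are my own assumptions, and the positional encoding is left out for brevity.

```python
import math

import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration (not the tutorial's values).
d_model = 16
vocab_size = 100

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=32
)

# Token ids with shape (seq_len, batch), the default layout when batch_first=False.
tokens = torch.randint(0, vocab_size, (5, 2))

unscaled = embedding(tokens)             # shape (5, 2, d_model)
scaled = unscaled * math.sqrt(d_model)   # the multiplication I am asking about

# Positional encoding would normally be added here; omitted in this sketch.
out = encoder_layer(scaled)              # same shape as the input
```

With d_model = 16 the factor is 4, so every embedding value is four times larger before it reaches the encoder layer.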
Thanks!