Why does nn.Transformer enforce the input embedding size to be equal to the transformed embedding size?

Let src be the input to the encoder, with src.size() = (seq_len, embed_dim_1). The linear transformation Q = src * W_query then yields new dimensions (seq_len, embed_dim_2).
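Concretely, here is the shape arithmetic I have in mind (the specific dimension values are arbitrary, chosen just for illustration):

```python
import torch

seq_len, embed_dim_1, embed_dim_2 = 10, 512, 256

src = torch.rand(seq_len, embed_dim_1)          # (10, 512)
W_query = torch.rand(embed_dim_1, embed_dim_2)  # (512, 256)

# Q = src * W_query: nothing in the math forces embed_dim_2 == embed_dim_1
Q = src @ W_query
print(Q.shape)  # torch.Size([10, 256])
```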

However, it appears that the PyTorch implementation of the Transformer enforces embed_dim_1 = embed_dim_2.

  1. Why is this?
  2. Is this the standard variant of the Transformer used in large language models like BERT and GPT?
  3. Is there a way to allow the dimensions to differ, or do I need to build my own Transformer?
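Regarding question 3, the workaround I am considering is to wrap nn.Transformer with linear projection layers, so the module sees d_model internally while my inputs and outputs use different sizes. A sketch (the class name and all dimension values are my own, not from any library):

```python
import torch
import torch.nn as nn

class ProjectedTransformer(nn.Module):
    """Hypothetical wrapper: project embed_dim_1 inputs into the Transformer's
    d_model, then project the output down to embed_dim_2."""

    def __init__(self, embed_dim_1, d_model, embed_dim_2, nhead=8):
        super().__init__()
        self.in_proj = nn.Linear(embed_dim_1, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead)
        self.out_proj = nn.Linear(d_model, embed_dim_2)

    def forward(self, src, tgt):
        # Both encoder and decoder inputs are lifted to d_model before attention
        hidden = self.transformer(self.in_proj(src), self.in_proj(tgt))
        return self.out_proj(hidden)

model = ProjectedTransformer(embed_dim_1=256, d_model=512, embed_dim_2=128)
src = torch.rand(10, 2, 256)  # (src_seq_len, batch, embed_dim_1)
tgt = torch.rand(20, 2, 256)  # (tgt_seq_len, batch, embed_dim_1)
print(model(src, tgt).shape)  # torch.Size([20, 2, 128])
```

Is this kind of projection the idiomatic fix, or does it defeat the purpose of the residual connections inside the block?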