Thanks for your answer. I think there was a misunderstanding on my side: I expected the module (nn.MultiheadAttention) to embed the q, k, and v inputs internally according to kdim and vdim, ideally with qdim = kdim. As far as I can tell, kdim and vdim only describe the input feature sizes, and the module projects everything to embed_dim, so unfortunately the attention operation itself cannot use different embedding dimensions.
In my opinion, it would be much more useful to be able to perform attention with separate dimensions for the keys and the values. Is there a function or module that performs only the attention operation, without an internal embedding stage?
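To make clear what I mean, here is a minimal sketch written with plain tensor operations. The function name `bare_attention` and the example shapes are my own illustration and not an existing PyTorch API. It assumes batch-first inputs, a single head, and no masking or dropout; q and k share a feature size d_k, while v keeps its own size d_v.

```python
import math
import torch


def bare_attention(q, k, v):
    """Scaled dot-product attention with no input or output projections.

    Hypothetical helper for illustration only.
    q: (batch, tgt_len, d_k)
    k: (batch, src_len, d_k)  -- must share d_k with q
    v: (batch, src_len, d_v)  -- d_v may differ from d_k
    Returns the output (batch, tgt_len, d_v) and the
    attention weights (batch, tgt_len, src_len).
    """
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    # Normalize over the source positions.
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of the values; the output keeps the value dimension d_v.
    return torch.matmul(weights, v), weights


q = torch.randn(2, 5, 16)  # d_k = 16
k = torch.randn(2, 7, 16)  # d_k = 16
v = torch.randn(2, 7, 32)  # d_v = 32
out, attn = bare_attention(q, k, v)
print(out.shape)   # torch.Size([2, 5, 32])
print(attn.shape)  # torch.Size([2, 5, 7])
```

With something like this, any embedding of q, k, and v could be done outside the attention step, each with its own dimension.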