I am trying to use nn.Transformer. I am confused about whether the d_model parameter is the embedding dimension of the input, or the embedding dimension of the linearly transformed input.
I.e., let X be the input, where X.size() = (seq_len, embed_dim_1). Then the linearly transformed input is X*W_query, with new dimensions (seq_len, embed_dim_2).
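To make the shapes concrete, here is a minimal sketch of the projection described above. The sizes (seq_len = 5, embed_dim_1 = 16, embed_dim_2 = 12) are arbitrary values picked for illustration, and W_query is just a random matrix, not a weight taken from nn.Transformer.

```python
import torch

# Arbitrary illustrative sizes (assumptions, not values from nn.Transformer)
seq_len, embed_dim_1, embed_dim_2 = 5, 16, 12

X = torch.randn(seq_len, embed_dim_1)            # input embeddings
W_query = torch.randn(embed_dim_1, embed_dim_2)  # hypothetical projection matrix

Q = X @ W_query  # linearly transformed input

print(tuple(X.shape))  # (seq_len, embed_dim_1)
print(tuple(Q.shape))  # (seq_len, embed_dim_2)
```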
I first treated d_model as the input embedding dimension, but then I got an error saying that d_model must be divisible by num_heads, which doesn’t make sense to me. It is the dimension of the input embedding * W that needs to be divisible by num_heads, not the input embedding dimension itself.
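For reference, a minimal sketch that reproduces the error and then inspects the shapes with a working configuration. All sizes here are arbitrary choices for illustration. In the PyTorch versions I am aware of, the check lives in nn.MultiheadAttention and is raised as an AssertionError; ValueError is caught as well in case that differs between versions.

```python
import torch
import torch.nn as nn

# d_model=10 is not divisible by nhead=4, so construction should fail.
error_message = None
try:
    nn.Transformer(d_model=10, nhead=4)
except (AssertionError, ValueError) as err:
    error_message = str(err)
print("error:", error_message)

# A working configuration: 16 is divisible by 4. Small layer counts keep it fast.
model = nn.Transformer(
    d_model=16,
    nhead=4,
    num_encoder_layers=1,
    num_decoder_layers=1,
    dim_feedforward=32,
)

# With the default batch_first=False, inputs are (sequence length, batch, d_model).
src = torch.randn(5, 2, 16)
tgt = torch.randn(3, 2, 16)
out = model(src, tgt)
print("output shape:", tuple(out.shape))

# Packed query/key/value projection weight of the first encoder layer.
in_proj_shape = model.encoder.layers[0].self_attn.in_proj_weight.shape
print("in_proj_weight shape:", tuple(in_proj_shape))
```

In this sketch the last dimension of src and tgt has to equal d_model, and the packed projection weight has shape (3 * d_model, d_model). That suggests the query, key, and value projections map d_model back to d_model, so in this module the input embedding size and the projected size appear to be the same number, which would explain why the divisibility check applies to d_model directly.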