What is d_model in nn.Transformer, and why does it have to be the same for the encoder and decoder?


I don’t understand what d_model is. What if the input and target vocabularies are different sizes? How do you use torch.nn.Transformer to translate a sentence from a vocabulary of size 10 into one of size 8?
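
To make the question concrete, here is a minimal sketch of the setup as I understand it. It assumes that d_model is the size of the feature vector for each token (not the vocab size), so each vocabulary gets its own nn.Embedding into the same d_model and a final nn.Linear maps back to the target vocab. The names src_embed, tgt_embed and generator, and all the sizes, are my own illustrative choices and not part of nn.Transformer. Positional encoding is left out to keep it short.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: source vocab 10, target vocab 8, feature size 16.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 10, 8, 16

# Each vocabulary gets its own embedding table, but both map token ids
# into vectors of the same width, d_model.
src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)

# nn.Transformer never sees vocab sizes, only d_model-wide vectors.
transformer = nn.Transformer(
    d_model=D_MODEL,
    nhead=4,  # d_model must be divisible by nhead
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=32,
    batch_first=True,  # tensors are (batch, seq_len, d_model)
)

# Hypothetical output layer: project decoder features to target-vocab logits.
generator = nn.Linear(D_MODEL, TGT_VOCAB)

# Dummy batch of token ids: 2 source sentences of length 5, 2 targets of length 4.
src = torch.randint(0, SRC_VOCAB, (2, 5))
tgt = torch.randint(0, TGT_VOCAB, (2, 4))

# Causal mask so each target position only attends to earlier positions.
tgt_mask = transformer.generate_square_subsequent_mask(tgt.size(1))

out = transformer(src_embed(src), tgt_embed(tgt), tgt_mask=tgt_mask)
logits = generator(out)

print(out.shape)     # (batch, tgt_len, d_model)
print(logits.shape)  # (batch, tgt_len, TGT_VOCAB)
```

If this is right, the vocab sizes 10 and 8 only show up in the embedding layers and the output layer, and d_model has to match on both sides because the decoder attends over the encoder's output vectors.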