Could someone explain the intuition behind using trained models on data with a completely different vocabulary?

Something occurred to me as I was trying to apply my trained models to new datasets: when I tokenised the new dataset, some of the words were obviously assigned a different numeric value than the same words in the original dataset used for training/validation. How can the model make predictions when the datasets have different vocabularies, and thus different numeric values for the same words? I'm guessing it has something to do with models learning embedding layers; is that correct? Does this also apply to non-deep-learning models like the ones in sklearn (forests, SVMs, Naive Bayes, etc.)? I'm just trying to understand the intuition for how it works. Thanks!
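
Here's a small sketch of what I mean (the words and IDs are made up, assuming a simple word-to-index mapping built separately for each dataset):

```python
# Hypothetical example: building a word-to-index mapping separately for each
# dataset gives the same word a different ID in each vocabulary.
def build_vocab(corpus):
    """Assign an integer ID to each word in order of first appearance."""
    word2idx = {}
    for sentence in corpus:
        for word in sentence.split():
            if word not in word2idx:
                word2idx[word] = len(word2idx)
    return word2idx

train_vocab = build_vocab(["the movie was great", "the plot was weak"])
new_vocab = build_vocab(["great plot but the acting was weak"])

print(train_vocab["great"])  # 3 in the training vocabulary
print(new_vocab["great"])    # 0 in the new dataset's vocabulary
```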

I’m not sure I understood your question 100%, but I think the word2idx/embedding mapping should not change if you want to make use of the knowledge contained in your word embeddings.
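
As a minimal sketch (the names here are placeholders, assuming you saved the word2idx dict from training), new text should be encoded with the training-time mapping, with unseen words sent to an unknown-word index, rather than rebuilding a vocabulary from the new data:

```python
# Minimal sketch (names are placeholders): reuse the word2idx saved from
# training when encoding new text, instead of rebuilding a vocabulary.
train_word2idx = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "great": 4, "weak": 5}

def encode(sentence, word2idx, unk_idx=0):
    """Map tokens to the indices the model was trained with;
    words never seen during training fall back to the unknown-word index."""
    return [word2idx.get(word, unk_idx) for word in sentence.split()]

print(encode("the acting was great", train_word2idx))
# -> [1, 0, 3, 4]  ("acting" was never seen, so it maps to <unk>)
```

The same idea applies on the sklearn side: keep the fitted vectorizer (e.g. a CountVectorizer or TfidfVectorizer) and only call transform on new data, so the new dataset is mapped into the feature space the model was trained on.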

Having words map to a different word vector space is like initializing the embeddings randomly: from the model's point of view, each word now points to an unrelated vector. Your model will probably still converge if you allow the gradients to propagate to the embeddings, and it will move each word to a region of the vector space that makes sense for the problem you are solving, but you are not taking advantage of any information the pretrained word embeddings have to offer.
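
To make that concrete, here's a rough illustration (random numbers and made-up indices, just assuming the pretrained embeddings are stored as a matrix indexed by word ID) of why a mismatched mapping throws away the pretrained vectors:

```python
# Rough illustration (random numbers, made-up indices): looking up a pretrained
# embedding matrix with a different word2idx returns some other word's vector,
# which to the model is no better than a random initialization.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 50))  # pretrained matrix, row i = word with ID i

train_word2idx = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "great": 4, "weak": 5}
new_word2idx = {"great": 0, "plot": 1, "the": 2, "was": 3, "weak": 4, "acting": 5}

vec_trained = embeddings[train_word2idx["great"]]   # the vector learned for "great"
vec_mismatched = embeddings[new_word2idx["great"]]  # actually row 0, i.e. <unk>'s vector

print(np.allclose(vec_trained, vec_mismatched))  # False: the pretrained info is lost
```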