I think the former, though you can also prune these matrices with fastText. Having said that…
It’d be way easier for you to create a small dummy set, train a fastText model, then test it out yourself. I’ve used fastText only a handful of times, so I wouldn’t really trust my own answers.
I am almost certain that fastText does not create a “not found” token. You could add one yourself by creating an embedding layer of size
num_emb + 1 and then copying the weight matrix from fastText over into the appropriate rows.
input_matrix = fasttext_model.get_input_matrix() # numpy
num_emb, emb_dim = input_matrix.shape
self.embbag = nn.EmbeddingBag(num_emb + 1, emb_dim)
I haven’t tried it, but something like that might work.
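To make the idea above concrete, here is a minimal sketch of copying the fastText input matrix into an `nn.EmbeddingBag` with one extra row reserved for the “not found” token. Note the random `input_matrix` is just a stand-in for `fasttext_model.get_input_matrix()`, and zero-initializing the unknown row is only one of several reasonable choices (a random or mean vector would also work):

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for fasttext_model.get_input_matrix(); in practice this
# comes from a trained fastText model (shape is hypothetical here).
input_matrix = np.random.rand(100, 16).astype(np.float32)
num_emb, emb_dim = input_matrix.shape

# One extra row at index num_emb serves as the "not found" token.
embbag = nn.EmbeddingBag(num_emb + 1, emb_dim)
with torch.no_grad():
    embbag.weight[:num_emb] = torch.from_numpy(input_matrix)
    embbag.weight[num_emb].zero_()  # unknown token: all zeros (one choice)

# Look up a bag mixing known ids with the unknown id.
ids = torch.tensor([[0, 5, num_emb]])
out = embbag(ids)  # default mode="mean" averages the three vectors
print(out.shape)
```

Unknown words in your tokenizer would then simply map to index `num_emb` before the lookup.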