How does torchtext treat unseen tokens?

Jeffrey · July 25, 2019, 2:40am

Say I am using a pretrained GloVe model:

from torchtext.vocab import GloVe
TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=100))

Suppose a token appears in the test dataset but not in the train dataset, even though the token is actually in GloVe(name='6B', dim=100). Torchtext returns 0 as its index, instead of the token's actual index in GloVe(name='6B', dim=100).
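
To make this concrete, here is a minimal pure-Python sketch of the lookup as I understand it. It is not torchtext's actual code: the toy token lists, the `pretrained_words` set, and the `<unk>`/`<pad>` layout are all assumptions chosen only to mirror the behavior I am seeing, where the vocabulary is built from the train tokens alone and anything else falls back to index 0.

```python
from collections import defaultdict

# Hypothetical toy data standing in for the real datasets and for GloVe.
train_tokens = ["the", "cat", "sat"]
test_tokens = ["the", "dog"]                      # "dog" never appears in train
pretrained_words = {"the", "cat", "sat", "dog"}   # words the pretrained vectors cover

# Assumed layout: the vocabulary is built from the train tokens only,
# with index 0 reserved for <unk> and index 1 for <pad>.
itos = ["<unk>", "<pad>"] + sorted(set(train_tokens))

# Tokens missing from the vocabulary fall back to index 0.
stoi = defaultdict(int, {tok: i for i, tok in enumerate(itos)})

test_indices = [stoi[tok] for tok in test_tokens]
print(test_indices)  # [4, 0]
```

Here "dog" is in `pretrained_words` but still maps to 0, because the pretrained set is never consulted when a token is looked up.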

Is this a bug, or am I missing something?