I’m using fastText for a machine translation task.
The challenge I have is how to apply fastText embeddings trained with a context window smaller than the default value.
import torch
from torchtext.vocab import Vectors

url = 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ar.vec'
SRC.build_vocab(train_data, vectors=Vectors('wiki.ar.vec', url=url), unk_init=torch.Tensor.normal_, min_freq=2)
I found the code below, but I’m not sure how to build the vocab with it:
from gensim.models import FastText
# Note: gensim 4.0+ renames the `size` argument to `vector_size`.
model_ted = FastText(sentences_ted, size=300, window=5, min_count=5, workers=4, sg=1)
Can you share the complete code with proper formatting?
Also, NB: the window size is a training-time hyperparameter, so if you use a pre-trained model you are very likely stuck with the settings it was trained with.
Plus if you are looking to train your own, then here’s a good starter!
Or is it that you want to only build the vocab with torchText?
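In case it helps, here is a rough sketch of what building the vocab from a `.vec` file amounts to, written in plain Python without torchtext. It reads the text format used by `wiki.ar.vec` (a `count dim` header line, then one `word v1 ... vd` row per word), keeps only tokens that appear at least `min_freq` times, and gives tokens without a pre-trained vector a random normal initialisation. The helper names (`read_vec`, `build_vocab`, `embedding_matrix`), the toy tokens, and the toy vectors are all made up for illustration; this is not what torchtext does internally line for line.

```python
import io
import random
from collections import Counter


def read_vec(handle):
    """Parse the word2vec/fastText text format into (dim, {word: vector})."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def build_vocab(sentences, min_freq=2, specials=("<unk>", "<pad>")):
    """Keep tokens seen at least min_freq times; specials come first."""
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = list(specials) + sorted(w for w, c in counts.items() if c >= min_freq)
    stoi = {w: i for i, w in enumerate(itos)}
    return itos, stoi


def embedding_matrix(itos, vectors, dim, seed=0):
    """One row per vocab entry; random normal rows where no vector exists."""
    rng = random.Random(seed)
    rows = []
    for word in itos:
        if word in vectors:
            rows.append(vectors[word])
        else:
            rows.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return rows


# Toy stand-in for a .vec file (hypothetical words and values).
VEC_TEXT = "3 2\ncat 0.1 0.2\ndog 0.3 0.4\nhouse 0.5 0.6\n"
# Toy stand-in for tokenised training sentences.
sentences = [["cat", "dog"], ["cat", "bird", "bird"], ["dog", "fish"]]

dim, vectors = read_vec(io.StringIO(VEC_TEXT))
itos, stoi = build_vocab(sentences, min_freq=2)
matrix = embedding_matrix(itos, vectors, dim)
```

If you train your own vectors with gensim using a smaller `window`, then as far as I know `model_ted.wv.save_word2vec_format(...)` writes this same text format, so the resulting file could be loaded the same way as `wiki.ar.vec`.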
Thank you @ecdrid for your valuable comment.
My model is based on CNN seq2seq and I need to use fasttext embedding with a small window. Honestly, I’m not sure about the optimal way to do that.
I have read the article you cited. I think it is better to use fastText embeddings for word representation as a separate feature.