Fasttext embedding with small window size

Aiman_Mutasem-bellh · September 15, 2020, 4:22am

Dear all,

I’m using fastText embeddings for a machine translation task.

My challenge is how to use fastText embeddings trained with a context window smaller than the default value of 5.

url = 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ar.vec'

SRC.build_vocab(train_data, vectors=Vectors('wiki.ar.vec', url=url), unk_init=torch.Tensor.normal_, min_freq=2)

I found the code below, but I’m not sure how to build the vocab from it using a torchtext.data Field:

from gensim.models import FastText
model_ted = FastText(sentences_ted, size=300, window=5, min_count=5, workers=4, sg=1)

Any suggestions?

Regards,

ecdrid · September 17, 2020, 4:01pm

Can you share the complete code with proper formatting?
Also, NB: if you use a pre-trained model, you will very likely have to accept the settings it was trained with, including the window size.
Plus, if you are looking to train your own, then here’s a good starter!
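
To illustrate why the window cannot be changed on vectors that are already trained, here is a minimal sketch of how a context window selects (center, context) training pairs. The function name `skipgram_pairs` and the example sentence are made up for illustration, and this is a simplification: the real word2vec/fastText trainers also sample a reduced window at random for each center word.

```python
def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs for a symmetric context window."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # leftmost context position
        hi = min(len(tokens), i + window + 1)   # one past the rightmost position
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                yield center, tokens[j]


# Hypothetical example sentence, not taken from the thread.
sentence = "the quick brown fox jumps over the lazy dog".split()

pairs_w2 = list(skipgram_pairs(sentence, window=2))
pairs_w5 = list(skipgram_pairs(sentence, window=5))

print(len(pairs_w2), len(pairs_w5))  # 30 60
```

The window decides which pairs the model ever sees during training, so a smaller window means retraining rather than adjusting the downloaded vectors.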

Or do you only want to build the vocab with torchtext?

Ty!

Aiman_Mutasem-bellh · September 18, 2020, 12:00am

Thank you @ecdrid for your valuable comment.

My model is a CNN-based seq2seq model, and I need fastText embeddings trained with a small window. Honestly, I’m not sure what the best way to do that is.
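
One possible approach, sketched here under assumptions rather than as a confirmed solution: train fastText vectors on your own corpus with gensim using a small window, then save them in the same plain-text .vec layout as wiki.ar.vec so they can be loaded the same way. The function name `train_small_window_fasttext`, its parameters, and the output path are hypothetical. The sketch assumes gensim 4.x, where the `size` argument used in the earlier snippet is called `vector_size`.

```python
from pathlib import Path


def train_small_window_fasttext(sentences, out_path, window=2, dim=300, min_count=5):
    """Train skip-gram fastText vectors with a small window and save them as .vec text.

    `sentences` is an iterable of token lists. The function name, `out_path`,
    and the default values are illustrative assumptions, not from the thread.
    """
    # Imported lazily so gensim is only required when training actually runs.
    from gensim.models import FastText  # assumes gensim >= 4.0

    model = FastText(
        sentences=sentences,
        vector_size=dim,      # named `size` in gensim 3.x
        window=window,        # smaller than the default of 5
        min_count=min_count,  # drop words seen fewer than this many times
        workers=4,
        sg=1,                 # skip-gram, as in the gensim snippet in the first post
    )
    out_path = Path(out_path)
    # Plain-text word2vec layout: a "count dim" header line, then one
    # "word v1 v2 ..." line per vocabulary entry.
    model.wv.save_word2vec_format(str(out_path), binary=False)
    return out_path
```

For example, `train_small_window_fasttext(sentences_ted, 'ted_w2.vec')` would write a file that could then be passed as `Vectors('ted_w2.vec')` in place of `Vectors('wiki.ar.vec', url=url)` in the `SRC.build_vocab(...)` call; `ted_w2.vec` is a made-up file name.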

I have read the article you cited. I think it is better to use the fastText embeddings for word representation as a separate feature.

Kind regards,