[NOT SOLVED] Training character level embeddings in FastText

dhpollack · January 23, 2019, 5:04pm

Ok, I think I get what you mean. But I think you should be asking the gensim people how they compute out-of-vocabulary (OOV) word vectors. Having said that, I still think that for Latin-based languages it would be almost impossible to find a word that wasn’t composed of any smaller sub-sequences from other words. To get these subwords, you use the function get_subwords. That way you can still get a representation of a word that is OOV, as long as it contains sub-sequences from other words.
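
To make the sub-sequence point concrete, here is a minimal pure-Python sketch. It is not FastText's or gensim's actual code: `char_ngrams`, the toy `vocab`, and the word `charting` are made up for illustration, and the `<`/`>` boundary markers and the 3 to 6 n-gram range are assumed to match FastText's usual defaults. It only shows that an OOV word tends to share character n-grams with words that were seen in training.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word`, padded with '<' and '>'."""
    padded = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n over the padded word.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


# Hypothetical toy vocabulary; a real model would hold far more words.
vocab = ["training", "embedding", "character"]

# Every n-gram that occurs in at least one in-vocabulary word.
known = {g for w in vocab for g in char_ngrams(w)}

# A word that is not in the vocabulary.
oov_word = "charting"

# N-grams of the OOV word that also occur in known words.
shared = [g for g in char_ngrams(oov_word) if g in known]
print(shared)
```

Here `charting` picks up n-grams such as `<char` from `character` and `ing>` from `training`, which is why a subword model can still build a vector for it. With a trained FastText model you would call get_subwords on the word instead of a helper like this.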