I am looking to use pre-trained word vectors to start a text classifier. There seem to be several pre-trained sets available, including word2vec, and my question has two parts:
- Are there any word vectors that are better suited to PyTorch than others? I saw fastText mentioned and wondered whether that is a good starting point.
- The usual pre-trained vector files are very large, containing millions of words. Is there a way to manage this? In practice I only need a fairly small fraction of them, and I don't want all of my memory being consumed loading and storing masses of data I don't need.
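For the second part, the workaround I had in mind is to parse the vector file myself and keep only the rows for words in my own vocabulary, then build an `nn.Embedding` from that small matrix. A rough sketch of the idea (the file name, dimensions, and tiny vocabulary here are made up for illustration):

```python
import torch
import torch.nn as nn

def load_subset_embeddings(path, vocab, dim):
    """Scan a word2vec/fastText-style text file line by line and keep
    only the vectors for words present in `vocab` (word -> row index).
    Words not found in the file keep a zero vector."""
    matrix = torch.zeros(len(vocab), dim)
    found = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # skips the header line some .vec files have ("<count> <dim>")
            if parts[0] in vocab and len(parts) == dim + 1:
                matrix[vocab[parts[0]]] = torch.tensor(
                    [float(x) for x in parts[1:]]
                )
                found.add(parts[0])
    return matrix, found

# stand-in for a real vector file such as a fastText .vec download
with open("tiny.vec", "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\ndog 0.7 0.8 0.9\n")

vocab = {"cat": 0, "the": 1, "someunknownword": 2}
weights, found = load_subset_embeddings("tiny.vec", vocab, dim=3)
embedding = nn.Embedding.from_pretrained(weights, freeze=False)
```

Since the file is read one line at a time, only the small matrix ever sits in memory; the full multi-gigabyte file is never loaded at once. Is this a sensible approach, or is there a standard utility for it?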
Apologies if these are novice questions; I have looked at earlier posts, but they don't seem to answer quite the questions I have raised.
Many thanks in advance