I am implementing an NLP problem. When I used one-hot vectors to represent words, my computer ran out of RAM. As far as I can see, PyTorch has torch.sparse, which can efficiently store and process tensors in which the majority of elements are zero, but it is unstable for now. In addition, I did not see any examples of using torch.sparse with Embedding etc.
How can I solve the out-of-memory problem? Thanks very much for your kind help.
Generally speaking, you’d represent the words either as word indexes (e.g. ‘the’ is 1, ‘to’ is 2, etc.), or as embeddings.
You can convert word indexes to embeddings by passing a LongTensor containing the indexes (not one-hot, just e.g.
[5, 3, 10, 17, 12], one integer per word) into an nn.Embedding.
You should never need to fluff the word indices up into actual physical one-hot vectors. Nor do you need to use sparse tensors: nn.Embedding handles all of this for you, internally.
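A minimal sketch of what that looks like (the vocabulary size and embedding dimension here are made-up numbers for illustration):

```python
import torch
import torch.nn as nn

vocab_size = 10000     # hypothetical number of words in your vocabulary
embedding_dim = 128    # hypothetical size of each word vector

# The embedding layer stores a dense (vocab_size x embedding_dim) weight
# matrix; looking up a row is just an index operation, no one-hot needed.
embedding = nn.Embedding(vocab_size, embedding_dim)

# One integer index per word, as a LongTensor -- not one-hot vectors.
word_indices = torch.tensor([5, 3, 10, 17, 12], dtype=torch.long)

# Each index is mapped to its embedding row.
vectors = embedding(word_indices)
print(vectors.shape)  # torch.Size([5, 128])
```

Memory-wise this stores only the embedding matrix itself, rather than a one-hot matrix per sentence, which is why the RAM problem goes away.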
Great! I will try it! Thank you so much for your fast answer!