Make Embedding more memory efficient on sparse indices

ecolss · June 4, 2018, 2:18pm

To do an Embedding lookup, we need to pass in the indices.

Suppose I have a list of ids, like [1, 10001, 101, ...]. They range from 1 to 10001 but are not guaranteed to be dense or contiguous, so the list might contain only a few unique values, say just [1, 10001].

Now I want to create an Embedding layer for these ids. I could simply set num_embeddings = 10002 (the largest id plus one), but this might waste a lot of memory, since there might be only 2 unique ids in total.

My question is: to be more memory efficient, should I convert these sparse ids to dense ones before creating the embedding, or does Embedding already handle this?
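
To make "convert these sparse ids to dense ones" concrete, here is a rough sketch of the remapping I have in mind. The names raw_ids, id_to_index, and dense_ids are just placeholders for illustration, and the example list is made up. The idea is that the dense indices, rather than the raw ids, would be passed to the Embedding layer, which would then only need len(id_to_index) rows.

```python
# Hypothetical example ids: values up to 10001, but only a few unique ones.
raw_ids = [1, 10001, 101, 1, 10001]

# Map each unique raw id to a dense index in [0, number_of_unique_ids).
# Sorting is optional; it only makes the mapping deterministic.
id_to_index = {raw_id: idx for idx, raw_id in enumerate(sorted(set(raw_ids)))}

# Dense indices to use for the lookup in place of the raw ids.
dense_ids = [id_to_index[raw_id] for raw_id in raw_ids]

# The embedding table would need only this many rows,
# instead of (largest id + 1) rows when indexing by raw id.
num_embeddings = len(id_to_index)

print(dense_ids)
print(num_embeddings)
```

With this approach the mapping has to be kept around, since any id seen later would need to go through the same id_to_index lookup before reaching the embedding.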