To do an Embedding lookup, we need to pass in the indices. Suppose I have a list of ids, like [1, 10001, 101, ...]. They range from 1 to 10001 but are not guaranteed to be dense and contiguous, which means the list might contain only a few unique values, say just [1, 10001].
Now I want to create an Embedding layer for these ids. I could simply set num_embeddings = 10002 (the largest id plus one), but this might waste a lot of memory, since there might be only 2 unique ids in total.
My question is: to be more memory efficient, should I convert these sparse ids to dense ones before creating the embedding, or can Embedding already handle this?
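For reference, here is a minimal sketch of the "convert sparse ids to dense ones" option, assuming PyTorch's nn.Embedding, whose weight is a dense (num_embeddings, embedding_dim) tensor allocated up front regardless of which indices are ever looked up. The helper names (build_id_mapping, to_dense, embed), the example ids, and the embedding size of 16 are all made up for illustration.

```python
def build_id_mapping(ids):
    """Map each distinct raw id to a dense index in 0..n_unique-1."""
    return {raw_id: idx for idx, raw_id in enumerate(sorted(set(ids)))}


def to_dense(ids, id_to_index):
    """Translate raw ids into the dense indices the embedding table expects."""
    return [id_to_index[raw_id] for raw_id in ids]


raw_ids = [1, 10001, 101, 1, 10001]  # example ids, made up for illustration
id_to_index = build_id_mapping(raw_ids)
dense_ids = to_dense(raw_ids, id_to_index)

embedding_dim = 16  # assumed embedding size
rows_without_remap = max(raw_ids) + 1  # table rows needed if raw ids are used directly
rows_with_remap = len(id_to_index)     # table rows needed after remapping


def embed(dense_ids, num_rows, dim):
    """Look up embeddings for dense indices (requires PyTorch)."""
    import torch
    from torch import nn

    layer = nn.Embedding(num_embeddings=num_rows, embedding_dim=dim)
    return layer(torch.tensor(dense_ids, dtype=torch.long))
```

Calling embed(dense_ids, rows_with_remap, embedding_dim) would then use a 3-row table instead of a 10002-row one. The mapping has to be saved and reused at inference time, and ids not seen when the mapping was built need their own handling, for example a reserved "unknown" index.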