EmbeddingBag of vocab size

Why does EmbeddingBag expect length of vocab as number of embeddings?

num_embeddings - size of the dictionary of embeddings


I’m following the example here https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html

Maybe it creates a representation for each word. Computers do not understand words, they understand numbers, so when we do

import torch
import torch.nn as nn

x = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='sum')

then list(x.parameters()) gives

[Parameter containing:
 tensor([[ 0.8823, -0.3787,  0.8360],
         [-1.4388, -0.6124, -1.6967],
         [ 0.4632,  0.6406,  0.1272],
         [-0.8657, -2.0807, -0.9140],
         [-0.3749, -0.5471, -0.5424],
         [ 0.9730,  0.5713,  0.4584],
         [-1.3402,  0.1033, -1.4363],
         [-0.1600, -0.3686, -0.2954],
         [ 1.1288, -0.1282, -1.0070],
         [ 0.8220, -0.0371, -0.7206]], requires_grad=True)]

this means that our vocabulary has 10 words, and each of those words is represented by an array of 3 floating point numbers. So, for a computer, a word would mean these 3 floating point numbers.
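As a minimal sketch of that idea (the vocabulary dict and its words here are made up for illustration, not from the tutorial):

```python
import torch.nn as nn

# Hypothetical vocabulary: each word is assigned an integer index.
vocab = {'apple': 0, 'orange': 1, 'banana': 2}

# One learnable row of 3 floats per word in the vocabulary,
# which is why num_embeddings must equal the vocab size.
bag = nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=3, mode='sum')

# The weight matrix has one row per word: shape (vocab size, embedding dim).
print(bag.weight.shape)  # torch.Size([3, 3])
```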

so if our vocabulary had 10 words like

apple orange banana grape juice fruit pineapple strawberry mango watermelon

then for the computer,

apple

would mean

[ 0.8823, -0.3787,  0.8360]

and

orange

would mean

[-1.4388, -0.6124, -1.6967]

If we want to change the representation of any of these words, we update its embedding row (during training this happens automatically through backpropagation).
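For example, a row can be overwritten by hand, outside of autograd (a sketch; in practice the optimizer updates these rows via gradients rather than direct assignment):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='sum')

# Directly replace the representation of word index 0.
# no_grad() because we are editing the parameter in place, not training it.
with torch.no_grad():
    bag.weight[0] = torch.tensor([1.0, 2.0, 3.0])

print(bag.weight[0])  # tensor([1., 2., 3.])
```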

Plus, EmbeddingBag gives us a sum (or a mean, depending on mode) of these embeddings. That is, if we do

x(torch.LongTensor([[0, 1]]))

then we will get,

tensor([[-0.5565, -0.9911, -0.8608]], grad_fn=<EmbeddingBagBackward>)

that is, the sum of the first two rows of the weight matrix.
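This can be verified directly against the weight matrix (the exact numbers depend on the random initialization, so the check compares the output to a manual sum rather than to fixed values):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='sum')

# One bag containing word indices 0 and 1 (each inner list is one bag).
out = bag(torch.LongTensor([[0, 1]]))

# Manually sum the corresponding rows of the embedding table.
manual = (bag.weight[0] + bag.weight[1]).unsqueeze(0)

print(torch.allclose(out, manual))  # True

# With mode='mean', the output would instead be the average of the rows.
```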

Yes, I understand it now, thank you. The name “EmbeddingBag” threw me off, but I see it is similar to having an Embedding layer (so it is of vocab size, to index into), except that we can also perform an aggregation over it.
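That equivalence can be checked directly: an EmbeddingBag with mode='sum' behaves like an Embedding layer followed by a sum over each bag (a sketch with made-up sizes; the weight copy just puts both layers on the same table):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 3)
bag = nn.EmbeddingBag(10, 3, mode='sum')

# Give both layers the same weight table so their outputs are comparable.
with torch.no_grad():
    bag.weight.copy_(emb.weight)

idx = torch.LongTensor([[2, 5, 7]])  # one bag of three word indices

# Embedding returns one row per index; summing over the bag dimension
# reproduces what EmbeddingBag computes in a single fused step.
print(torch.allclose(bag(idx), emb(idx).sum(dim=1)))  # True
```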