Hi,
I trained and saved a model that has two embedding layers.
One has a vocabulary V of ~22k items, encoded with a latent dimension D of 8.
If this is my trained embedding matrix (V x D)
tensor([[ 0.6163, 0.9769, 0.3950, ..., 1.2966, 0.3279, 1.4990],
[-0.4985, -0.6366, 0.6025, ..., 0.1584, -0.9755, -0.7621],
[-0.7265, 0.0665, -1.8310, ..., -0.4401, 0.8690, -0.7261],
...,
[-0.5039, 0.8430, -0.7346, ..., -0.1686, -0.0024, -0.9600],
[ 0.9195, 0.3476, 0.0367, ..., -0.9595, 0.1659, 1.1200],
[ 1.1634, -0.1817, -1.1437, ..., -1.2055, -0.5795, -1.6404]])
is there a way to map each of the rows back to its original input (which is an integer)?
I am thinking that the embedding matrix is one-to-one with the initial vocabulary, but I don't know how the items are mapped to rows, because the num_embeddings parameter only asks for the length of the vocabulary. If the encoding is done with something like a dictionary, what is the first key? Is it 0 or 1, and if I plugged the key 0 into that dictionary, would I get the first row of the embedding matrix?
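In case it helps to make the question concrete, this is the kind of check I have in mind (a minimal sketch with made-up shapes, not my actual model):

```python
import torch
import torch.nn as nn

# Toy layer with the same idea: num_embeddings = vocab size V, embedding_dim = D
emb = nn.Embedding(num_embeddings=22000, embedding_dim=8)

row0 = emb.weight[0]              # first row of the (V x D) weight matrix
looked_up = emb(torch.tensor(0))  # vector returned when the input integer is 0

# Is the first row simply the embedding of index 0?
print(torch.equal(row0, looked_up))
```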