I have an embedding table with a limited number of rows (say 5):
self.embedding = torch.nn.Embedding(length,embedding_dim)
I receive input IDs like (7, 18, 6, …) as a PyTorch tensor. However, the embedding for 7 is stored in the first row of the table, the embedding for 18 in the second row, and so on.
I want a map from these numbers to 0, 1, 2, … so that I can look up the stored rows of the embedding.
It seems I can't apply a plain Python dictionary to a tensor like this:
prompt_token_ids = [self.id_map[x] for x in prompt_token_ids]
How can I do this mapping on tensors?
You can convert the embedding weights to a NumPy array using:
np_array = self.embedding.weight.detach().cpu().numpy()
You also need to create a tensor of positions to iterate over, for example:
indices = torch.arange(prompt_token_ids.numel())  # one position per input ID
That gives you the flexibility to iterate over the IDs one at a time in plain Python. At the end, you can convert np_array back to a tensor with torch.from_numpy and load it into the embedding.
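A minimal sketch of that approach; the contents of id_map and the example IDs are made up for illustration:

import torch

# Hypothetical mapping: 5 known IDs to embedding rows 0..4.
id_map = {7: 0, 18: 1, 6: 2, 42: 3, 3: 4}
prompt_token_ids = torch.tensor([7, 18, 6])

# Pull the IDs out of the tensor, remap them one by one in plain Python,
# then build a fresh index tensor from the result.
np_ids = prompt_token_ids.detach().cpu().numpy()
remapped = torch.tensor([id_map[int(x)] for x in np_ids])

print(remapped)  # tensor([0, 1, 2])

This works, but the round trip through NumPy and the Python-level loop run on the CPU, outside PyTorch's vectorized ops.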
With some research I reached this solution, which works for my needs:
prompt_token_ids = (prompt_token_ids.view(-1, 1) == self.input_ids).int().argmax(dim=1)
The code above finds each of those numbers (stored in self.input_ids) in the series of input tokens and replaces it with its index in self.input_ids, which is exactly its row index in the embedding.
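For reference, here is a self-contained demonstration of that one-liner, assuming self.input_ids is a 1-D tensor holding the known IDs in embedding-row order (the concrete values are made up):

import torch

# Known IDs, in the order their embeddings are stored (row 0, row 1, ...).
input_ids = torch.tensor([7, 18, 6, 42, 3])
prompt_token_ids = torch.tensor([7, 18, 6, 42])

# Broadcast-compare every prompt ID against every known ID: the boolean
# matrix has shape (num_prompts, num_known), each row contains exactly one
# True, and argmax(dim=1) returns that column, i.e. the embedding row.
mapped = (prompt_token_ids.view(-1, 1) == input_ids).int().argmax(dim=1)
print(mapped)  # tensor([0, 1, 2, 3])

# The mapped indices can be fed straight into the embedding:
embedding = torch.nn.Embedding(len(input_ids), 4)
vectors = embedding(mapped)  # shape (4, 4)

One caveat: an ID that does not appear in input_ids silently maps to row 0, because argmax of an all-zero row is 0.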