How to map input ids to the indices of a small Embedding

Ahmad_Pouramini · December 5, 2021, 4:19pm

I have an embedding of limited size (say 5):

        self.embedding = torch.nn.Embedding(length,embedding_dim)

I receive input ids like (7, 18, 6, …) as a PyTorch tensor. However, the embedding for 7 is in the first row of the embedding, the one for 18 is in the second row, and so on.

I want to map these ids to the row indices 0, 1, 2, … so I can look up the vectors stored in the embedding.
It seems I can't use a dictionary like this:

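    def forward(self, prompt_token_ids, pids=None):
        # Fails: iterating a tensor yields 0-d tensors, which hash by identity,
        # so they never match the int keys of self.id_map (KeyError).
        # The result would also be a Python list, which nn.Embedding rejects.
        prompt_token_ids = [self.id_map[x] for x in prompt_token_ids]
        return self.embedding(prompt_token_ids)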

How can I do these mappings for tensors?
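For reference, a minimal sketch of how the dictionary idea can be made to work, assuming id_map is a plain dict whose keys are Python ints. The module name PromptEmbedding is hypothetical, and the ids 42 and 3 are made-up fillers for the five rows. Calling .tolist() turns the tensor elements into Python ints, which hash by value, and the resulting row numbers are wrapped back into a LongTensor before the lookup.

```python
import torch


class PromptEmbedding(torch.nn.Module):
    """Hypothetical module: one embedding row per stored token id."""

    def __init__(self, stored_ids, embedding_dim):
        super().__init__()
        # Plain Python ints as keys, e.g. {7: 0, 18: 1, 6: 2, ...}
        self.id_map = {tok: row for row, tok in enumerate(stored_ids)}
        self.embedding = torch.nn.Embedding(len(stored_ids), embedding_dim)

    def forward(self, prompt_token_ids, pids=None):
        # .tolist() yields Python ints, which hash by value and match the keys
        rows = [self.id_map[x] for x in prompt_token_ids.flatten().tolist()]
        # nn.Embedding needs an integer tensor; restore the input's shape
        rows = torch.tensor(rows, dtype=torch.long, device=prompt_token_ids.device)
        return self.embedding(rows.view(prompt_token_ids.shape))


# 7, 18 and 6 come from the question; 42 and 3 are made-up fillers
model = PromptEmbedding([7, 18, 6, 42, 3], embedding_dim=4)
ids = torch.tensor([18, 7, 6])
out = model(ids)
print(out.shape)  # torch.Size([3, 4])
```

The Python loop does one dict lookup per token, which is fine for short prompts; on a GPU, .tolist() also forces a copy back to the host.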

Jazz · December 5, 2021, 9:54pm

You can convert the embedding weights to a NumPy array with:

    np_array = self.embedding.weight.detach().cpu().numpy()

You also need to create a new tensor of indices, one per element of the input ids:

    indices = torch.arange(prompt_token_ids.numel())

That gives you the flexibility to iterate over everything. At the end, you can convert np_array back to a tensor and load it into the embedding.
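A rough sketch of the NumPy route, reusing the illustrative ids from the question (42 and 3 are made up). The stored rows can be gathered from the NumPy copy with integer-array indexing. Note that .detach() removes the tensor from the autograd graph, so vectors obtained this way carry no gradient back to the embedding weights. That is fine for inspecting values but not for training the embedding.

```python
import numpy as np
import torch

stored_ids = [7, 18, 6, 42, 3]  # illustrative; 42 and 3 are made up
embedding = torch.nn.Embedding(len(stored_ids), 4)

# Copy the weights out of PyTorch (detached, so no autograd history)
np_array = embedding.weight.detach().cpu().numpy()

# Row of each stored id in the embedding
id_to_row = {tok: row for row, tok in enumerate(stored_ids)}

input_ids = torch.tensor([18, 7, 6])
rows = np.array([id_to_row[x] for x in input_ids.tolist()])

# Integer-array indexing gathers one row per input id
vectors = torch.from_numpy(np_array[rows])

print(vectors.shape)          # torch.Size([3, 4])
print(vectors.requires_grad)  # False: gradients do not reach embedding.weight
```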

Ahmad_Pouramini · December 6, 2021, 7:43am

@Jazz Thanks.
After some research I arrived at this solution, which already works for my needs:

    prompt_token_ids = (prompt_token_ids.view(-1, 1) == self.input_ids).int().argmax(dim=1)

The code above compares each input token with the ids stored in self.input_ids and replaces it with that id's index in self.input_ids, i.e. its row in the embedding.
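A small worked example of the comparison trick, using a plain input_ids tensor in place of self.input_ids (the values are illustrative). Comparing an (N, 1) column against K stored ids broadcasts to an (N, K) boolean matrix, and argmax over each row picks the matching column. One caveat worth checking: a token that is not in input_ids yields an all-zero row, and argmax then returns 0, silently selecting the first embedding row.

```python
import torch

# Ids stored in the embedding, one per row (illustrative values)
input_ids = torch.tensor([7, 18, 6, 42, 3])
prompt_token_ids = torch.tensor([[18, 7], [6, 3]])

# (N, 1) == (K,) broadcasts to an (N, K) boolean matrix:
# entry [i, j] is True when token i equals stored id j
matches = prompt_token_ids.view(-1, 1) == input_ids

# argmax over each row returns the column of the (first) True entry
rows = matches.int().argmax(dim=1)
print(rows)  # tensor([1, 0, 2, 4])

# An unknown token gives an all-False row, for which argmax returns 0;
# guard against that instead of silently using row 0
assert matches.any(dim=1).all(), "some token ids are not stored in input_ids"

# view(-1, 1) flattened the input, so restore its original shape
rows = rows.view(prompt_token_ids.shape)
```

The comparison builds an N x K matrix, which is cheap for a handful of stored ids like this. With a very large set of stored ids, a lookup tensor indexed directly by token id would avoid that intermediate.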