I have a dataset that consists of pairs of embeddings and strings (sentences). The goal is to learn a mapping between them, so that I can generate the string from a given embedding, and vice versa. How should I approach this? Are there any code references?
If the strings and embeddings form a bijection (every string maps to exactly one embedding and vice versa), you can simply store the pairs in a lookup table and look up either direction.
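A minimal sketch of such a two-way lookup, assuming the embeddings are short lists of floats and that queries use exactly the stored values (the names `build_lookup`, `str_to_emb`, and `emb_to_str` and the sample pairs are made up for illustration):

```python
def build_lookup(pairs):
    """Build both directions of the mapping; raise if the pairs are not one-to-one."""
    str_to_emb = {}
    emb_to_str = {}
    for sentence, embedding in pairs:
        key = tuple(embedding)  # lists are unhashable, tuples can be dict keys
        # Reject a sentence seen with a different embedding, or the reverse.
        if str_to_emb.get(sentence, key) != key:
            raise ValueError(f"sentence maps to two embeddings: {sentence!r}")
        if emb_to_str.get(key, sentence) != sentence:
            raise ValueError(f"embedding maps to two sentences: {key!r}")
        str_to_emb[sentence] = key
        emb_to_str[key] = sentence
    return str_to_emb, emb_to_str


# Hypothetical toy data, not from a real model.
pairs = [("hello world", [0.1, 0.2]), ("goodbye", [0.3, -0.4])]
str_to_emb, emb_to_str = build_lookup(pairs)

print(emb_to_str[(0.1, 0.2)])
print(str_to_emb["goodbye"])
```

This only covers embeddings that appear in the dataset and match exactly; it says nothing about unseen embeddings, which is where the learning problem starts.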
Short of that, you are trying to learn the inverse of a function. Imagine the simplest case: the sentence embedding is a real vector generated by averaging the token embeddings. In the reverse task, given a sentence embedding S, you would need to find the tokens whose embeddings, when averaged, produce S. That seems really difficult unless the token embeddings have some specific properties.
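To make the difficulty concrete, here is a small sketch of the averaging case. The 2-dimensional token embeddings and the helper `sentence_embedding` are invented for illustration and do not come from any real model. Averaging discards word order, and different sets of tokens can land on the same vector, so a given S does not identify a unique sentence:

```python
# Hypothetical toy token embeddings.
token_embeddings = {
    "dog": (1.0, 0.0),
    "bites": (0.0, 1.0),
    "man": (-1.0, 0.0),
    "cat": (0.0, -1.0),
}


def sentence_embedding(tokens):
    """Average the token embeddings component by component."""
    vectors = [token_embeddings[t] for t in tokens]
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


# Same tokens in a different order give the same embedding.
a = sentence_embedding(["dog", "bites", "man"])
b = sentence_embedding(["man", "bites", "dog"])
print(a == b)

# Entirely different tokens can also collide.
c = sentence_embedding(["dog", "man"])
d = sentence_embedding(["bites", "cat"])
print(c == d)
```

Both comparisons come out equal in this toy setup, so any inverse would have to pick one of several valid answers.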
You could frame this as a multivariate regression problem, but I can't imagine the usual assumptions of that setting would be satisfied here.