Approximate nn.Embedding with Fully Connected Layers

Hi all,

I was wondering if anyone has tried approximating nn.Embedding (say for embedding words) with a set of fully connected layers or something like that?

My situation (in somewhat more detail) is as follows:
I have a pre-trained embedding, but at inference time I can only provide floating-point tensors as input to the Embedding (not one-hot vectors or integers), hence the need to approximate it with a neural network.

Also, if such an approximation is possible, what would be a good loss function for training the proxy network (L2, for example)?

Really appreciate your time/response.

Why is your training architecture different from inference? How do you validate that what you trained on generalises to inference if there are key differences in inputs?


Really appreciate your response.

The intent is to evaluate how well this transfers. Two inputs that are close should have embeddings that are close, and dissimilar inputs should have embeddings that are far apart. I was wondering whether someone has already tried this and found an approach that works well.

I think there is research in that direction using hash functions, for example MinHash:

The output is still discrete though.


So I just realized there’s possibly a simple fix for this. If one can represent the nn.Embedding operation as a matrix product, then this can be taken care of.

In particular, the embedding lookup can be thought of as the product of a one-hot vector and the matrix of all embeddings. This formulation also accepts inputs that are not one-hot, such as arbitrary floating-point weight vectors over the vocabulary.
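To make the matrix-product view concrete, here is a minimal sketch in PyTorch. It assumes a default `nn.Embedding` (no `padding_idx` or `max_norm`), and the randomly initialised `embedding`, the sizes, and the helper name `soft_embed` are all made up for illustration, standing in for the real pre-trained embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes; the random weights stand in for a pre-trained embedding.
vocab_size, dim = 10, 4
embedding = nn.Embedding(vocab_size, dim)


def soft_embed(weights: torch.Tensor, embedding: nn.Embedding) -> torch.Tensor:
    """Multiply float weights of shape (..., vocab_size) by the embedding matrix.

    Returns a tensor of shape (..., dim). With one-hot rows this reproduces
    the ordinary lookup; with other rows it returns a weighted sum of
    embedding vectors.
    """
    return weights @ embedding.weight


# One-hot input: the product matches the integer lookup.
idx = torch.tensor([1, 7, 3])
one_hot = F.one_hot(idx, num_classes=vocab_size).float()
lookup = embedding(idx)
product = soft_embed(one_hot, embedding)
matches_lookup = torch.allclose(lookup, product)

# Non one-hot input: an equal mix of rows 1 and 7.
soft = torch.zeros(vocab_size)
soft[1] = 0.5
soft[7] = 0.5
mixed = soft_embed(soft, embedding)
expected_mix = 0.5 * embedding.weight[1] + 0.5 * embedding.weight[7]
```

Because the product is an ordinary dense matmul, it is differentiable with respect to the float input, which may help if gradients need to flow back to that input. Whether a soft mix of embedding rows is meaningful for a given downstream model is a separate question that this sketch does not answer.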

An embedding is a lookup table, and a lookup table is a sparse matrix-vector multiply :wink:
