In my current project, I introduce auxiliary embeddings to distinguish functional words (e.g., conjunctions, determiners, pronouns and so on) from non-functional words. Specifically, I set the auxiliary embeddings of functional words to zero vectors and randomly initialize those of non-functional words. My goal is to fine-tune the latter during training and keep the former unchanged (i.e., always zeros).

Is this feasible with an nn.Embedding layer?


Two ways:

  1. This could probably be done by introducing an additional mask embedding with embedding_dim=1, where
    the embedding of a particular word is 1.0 if it should use aux embeddings and 0.0 if not.
    Set requires_grad to False on the mask embedding and do an element-wise multiplication between the
    aux embedding and the mask embedding.

  2. Calculate a mask (Boolean tensor) that determines, for each particular word, whether it should use the aux embedding,
    and use it to mask the input before embedding.
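
A minimal sketch of both ways, assuming PyTorch. The vocabulary size, the dimension, and the choice of ids 0-2 as functional words are made up for illustration. For way 2, "mask the input before embedding" is read here as redirecting functional-word ids to a padding row via padding_idx, which is only one possible interpretation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, AUX_DIM = 6, 4
# Hypothetical split: ids 0-2 are functional words, ids 3-5 are non-functional.
functional_ids = torch.tensor([0, 1, 2])

# Way 1: trainable aux table multiplied by a frozen 1-dimensional mask table.
aux = nn.Embedding(VOCAB_SIZE, AUX_DIM)
mask_weight = torch.ones(VOCAB_SIZE, 1)
mask_weight[functional_ids] = 0.0
# freeze=True sets requires_grad=False on the mask weights.
mask = nn.Embedding.from_pretrained(mask_weight, freeze=True)

def aux_lookup_masked(token_ids):
    # (..., AUX_DIM) * (..., 1) broadcasts, zeroing the vectors of functional words.
    return aux(token_ids) * mask(token_ids)

# Way 2 (one reading): send functional ids to a padding row before the lookup.
PAD_ID = VOCAB_SIZE  # extra row reserved for "no aux embedding"
aux_padded = nn.Embedding(VOCAB_SIZE + 1, AUX_DIM, padding_idx=PAD_ID)
is_functional = torch.zeros(VOCAB_SIZE, dtype=torch.bool)
is_functional[functional_ids] = True

def aux_lookup_padded(token_ids):
    masked_ids = token_ids.masked_fill(is_functional[token_ids], PAD_ID)
    return aux_padded(masked_ids)

tokens = torch.tensor([[0, 3, 1, 5]])
out_masked = aux_lookup_masked(tokens)
out_padded = aux_lookup_padded(tokens)

# Backpropagate a dummy loss to see which rows receive a gradient.
(out_masked.sum() + out_padded.sum()).backward()
print(aux.weight.grad[functional_ids])     # all zeros
print(aux_padded.weight.grad[PAD_ID])      # all zeros
```

Note that in way 1 the stored rows of the aux table for functional words are not zero themselves; only the looked-up output is, because the mask zeroes it. In the padding variant the zero row is actually stored and never updated.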

You can keep one separate Embedding layer for functional words and another for non-functional words. Seems like a cleaner solution, no?
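
A rough sketch of how the two-layer idea might look, assuming PyTorch. The class name SplitAuxEmbedding, the is_functional flag tensor, and the single shared zero row for all functional words are illustrative choices, not something specified in the thread.

```python
import torch
import torch.nn as nn

class SplitAuxEmbedding(nn.Module):
    """Hypothetical module: frozen zeros for functional words, trainable rows otherwise."""

    def __init__(self, is_functional, aux_dim):
        super().__init__()
        # is_functional: bool tensor of shape (vocab_size,), True for functional words.
        self.register_buffer("is_functional", is_functional)
        n_content = int((~is_functional).sum())
        # Map each vocabulary id to its row in the non-functional table.
        # Functional ids map to row 0, but that lookup is discarded in forward().
        local = torch.zeros_like(is_functional, dtype=torch.long)
        local[~is_functional] = torch.arange(n_content)
        self.register_buffer("local_index", local)
        # Trainable layer for non-functional words.
        self.content = nn.Embedding(n_content, aux_dim)
        # Frozen layer for functional words: one shared all-zero row.
        self.functional = nn.Embedding.from_pretrained(
            torch.zeros(1, aux_dim), freeze=True
        )

    def forward(self, token_ids):
        use_zero = self.is_functional[token_ids].unsqueeze(-1)
        content_vecs = self.content(self.local_index[token_ids])
        zero_vecs = self.functional(torch.zeros_like(token_ids))
        # torch.where routes gradients only to the selected branch.
        return torch.where(use_zero, zero_vecs, content_vecs)

# Usage with a made-up 5-word vocabulary where ids 0 and 1 are functional.
flags = torch.tensor([True, True, False, False, False])
layer = SplitAuxEmbedding(flags, aux_dim=3)
batch = torch.tensor([[0, 2, 1, 4]])
vectors = layer(batch)
vectors.sum().backward()
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # only the non-functional table is trainable
```

Compared with the mask approach, this stores no trainable rows for functional words at all, at the cost of an extra id-remapping step.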

