Dear all,

I have several word embeddings, and I want to take a weighted sum of them and use the result as a regular embedding. The catch is that I want to train those weights, with each word having its own weight.

To do that, I define a weight matrix and an embedding variable:

```python
self.embeds_stacked = torch.tensor(np.stack(embeddings, axis=2),
                                   dtype=torch.float).cuda()
self.embed_coeffs = torch.tensor(np.ones(shape=(max_features, len(embeddings))) / len(embeddings),
                                 dtype=torch.float, requires_grad=True).cuda()
self.embed_coeffs = nn.Parameter(self.embed_coeffs, requires_grad=True)
self.embedding = nn.Embedding(max_features, embed_size)
self.embedding.weight.requires_grad = False
```

Here "embeddings" is a list of embedding matrices and max_features is the number of words (all embeddings have the same dimensionality), so embeds_stacked has shape (max_features, embedding_size, K), where K = len(embeddings).

During the forward step I just use einsum to compute the weighted sum along the last axis and assign the result to the embedding variable's weight. After that I use it as a regular embedding:

```python
embeds_weighed = torch.einsum('ijk,ik->ij', self.embeds_stacked, self.embed_coeffs)
self.embedding.weight.data = embeds_weighed
h_embedding = self.embedding(x)
```
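In case it helps, here is a minimal, self-contained CPU repro of the setup above (random matrices stand in for my real pretrained embeddings, and the sizes are toy values I made up). It also shows the symptom I observe: the embedding output does not require grad, so nothing seems to reach embed_coeffs.

```python
import numpy as np
import torch
import torch.nn as nn

# toy sizes (assumed): vocabulary size, embedding dim, number of embeddings
max_features, embed_size, K = 100, 8, 3

# random matrices standing in for the real pretrained embeddings
embeddings = [np.random.rand(max_features, embed_size).astype(np.float32)
              for _ in range(K)]

# stack to (max_features, embed_size, K) and give each word K trainable weights
embeds_stacked = torch.tensor(np.stack(embeddings, axis=2), dtype=torch.float)
embed_coeffs = nn.Parameter(torch.full((max_features, K), 1.0 / K),
                            requires_grad=True)

embedding = nn.Embedding(max_features, embed_size)
embedding.weight.requires_grad = False

# forward step exactly as in the post
embeds_weighed = torch.einsum('ijk,ik->ij', embeds_stacked, embed_coeffs)
embedding.weight.data = embeds_weighed
x = torch.tensor([0, 1, 2])
h_embedding = embedding(x)

print(h_embedding.shape)          # torch.Size([3, 8])
print(h_embedding.requires_grad)  # False
```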

However, the embedding coefficients stay the same throughout training.

Is there any way to fix that?

Thanks in advance!