How should I weight my embeddings before aggregating them?

I’m currently building an implementation of node2vec in which each node can have metadata attributes, for example:

  • node: x
  • category: abc
  • brand: xyz

To embed a node, I’ve trained a separate embedding table for each feature (i.e., node, category, brand, each with a different embedding dimension). I then aggregate them with torch.mean, as in the example below:

# Look up each feature's embedding (one table per column of `nodes`)
emb_nodes = []
for i in range(nodes.shape[1]):
    emb_nodes.append(self.embeddings[i](nodes[:, i]))
# Unweighted mean across the feature embeddings
emb_agg = torch.mean(torch.stack(emb_nodes), dim=0)

However, this doesn’t work well in practice, and I think I need to weight how these features (i.e., node, category, brand) are aggregated for each node id.

I’ve looked into nn.EmbeddingBag, but it requires all embeddings to share the same embedding dimension; in my case they vary, because the cardinality of product is much larger than that of category and brand.
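
For context, here is the kind of workaround I’ve been considering: project each feature embedding to a shared dimension with a linear layer, then mix the projections with learnable per-feature weights. This is only a rough sketch, and the names (WeightedAggregator, feature_dims, out_dim) are placeholders I made up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedAggregator(nn.Module):
    # Projects variable-size feature embeddings to a shared dimension,
    # then mixes them with one learnable weight per feature.
    def __init__(self, feature_dims, out_dim):
        super().__init__()
        # One projection per feature (node, category, brand)
        self.projections = nn.ModuleList(
            [nn.Linear(d, out_dim) for d in feature_dims]
        )
        # One learnable scalar per feature; softmaxed in forward()
        self.feature_weights = nn.Parameter(torch.zeros(len(feature_dims)))

    def forward(self, feature_embs):
        # feature_embs: list of (batch, dim_i) tensors, one per feature
        projected = torch.stack(
            [proj(e) for proj, e in zip(self.projections, feature_embs)]
        )  # (num_features, batch, out_dim)
        w = F.softmax(self.feature_weights, dim=0)  # convex combination
        return (w[:, None, None] * projected).sum(dim=0)  # (batch, out_dim)

This learns one global weight per feature, though, not per-node weights.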

Does anyone have suggestions or sample code for how I could train such a weighting layer efficiently? Should I put the weights in another embedding layer?
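
To make that last question concrete, this is roughly what I mean by putting the weights in another embedding layer: a weight table with one row per node id, so every node learns its own mix of features. Again just a sketch with made-up names (PerNodeWeights, weight_table), and it assumes the feature embeddings have already been projected to a common dimension as in the sketch above:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PerNodeWeights(nn.Module):
    # Looks up per-node aggregation weights from an embedding table,
    # so every node id learns its own mix of (node, category, brand).
    def __init__(self, num_nodes, num_features):
        super().__init__()
        self.weight_table = nn.Embedding(num_nodes, num_features)

    def forward(self, node_ids, feature_embs):
        # node_ids: (batch,) ids; feature_embs: (batch, num_features, dim)
        w = F.softmax(self.weight_table(node_ids), dim=1)  # (batch, num_features)
        return (w.unsqueeze(-1) * feature_embs).sum(dim=1)  # (batch, dim)

Is this a reasonable direction, or is there a more standard and efficient way to train such weights?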