Add additional data such as word features to word embedding

#1

Hi everyone, I’m using pretrained word embeddings for an NMT task. My idea is to use word features such as named-entity tags to improve NMT quality. Is it possible to concatenate a one-hot vector (for the named-entity tag) to my pretrained word embedding and use the result in my NMT model?
For example:
Given sentence: My name is James .
NE annotated sentence: My|O name|O is|O James|PERSON
With James|PERSON, I would concatenate the one-hot vector of the PERSON tag (e.g. [1,0,0,0]) to the word embedding vector of “James” (e.g. [4,5,6]), so the result is [4,5,6,1,0,0,0].

(Chris) #2

Sure, you can concatenate vectors. For example, say you have

  • embed with embed.shape = (batch_size, seq_len, embed_dim)
  • custom with custom.shape = (batch_size, seq_len, custom_dim)

You can do:

X = torch.cat([embed, custom], 2)

Then X.shape = (batch_size, seq_len, embed_dim+custom_dim)
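To make this concrete with the example from post #1, here is a minimal sketch. The tag set, the `tag_to_id` mapping, and the embedding values are all made up for illustration; `F.one_hot` builds the `custom` tensor from the tag indices.

```python
import torch
import torch.nn.functional as F

# Hypothetical tag inventory; PERSON gets index 0 so its one-hot is [1, 0, 0, 0].
tag_to_id = {"PERSON": 0, "LOCATION": 1, "ORGANIZATION": 2, "O": 3}

# Stand-in for pretrained word embeddings: (batch_size=1, seq_len=4, embed_dim=3).
embed = torch.tensor([[[1., 2., 3.],    # My
                       [7., 8., 9.],    # name
                       [0., 1., 0.],    # is
                       [4., 5., 6.]]])  # James

# One tag per token, converted to integer indices of shape (batch_size, seq_len).
tags = [["O", "O", "O", "PERSON"]]
tag_ids = torch.tensor([[tag_to_id[t] for t in sent] for sent in tags])

# One-hot features: (batch_size, seq_len, custom_dim), cast to float to match embed.
custom = F.one_hot(tag_ids, num_classes=len(tag_to_id)).float()

# Concatenate along the feature dimension.
X = torch.cat([embed, custom], dim=2)
print(X.shape)          # torch.Size([1, 4, 7])
print(X[0, 3].tolist()) # [4.0, 5.0, 6.0, 1.0, 0.0, 0.0, 0.0]
```

Note that whatever consumes X (e.g. the encoder RNN) must then expect an input size of embed_dim+custom_dim rather than embed_dim.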

(Arunav Shandilya) #3

Yes, you can do that. Just add an extra embedding of a suitable dimension:

embedding = nn.Embedding(vocab_size, dim)
# word_ids: token indices of shape (batch_size, seq_len)
embedded = embedding(word_ids)  # embedded.shape = (batch_size, seq_len, dim)
# extended: the extra features, extended.shape = (batch_size, seq_len, extended_dim)

final = torch.cat([embedded, extended], 2)

final is then the tensor that carries the named-entity features as well. :)
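One possible reading of the "extra embedding" suggestion is to learn a small dense vector per tag instead of using a fixed one-hot. The sketch below assumes that reading; the class name `WordPlusTagEmbedding`, the sizes, and the random stand-in for the pretrained matrix are all hypothetical.

```python
import torch
import torch.nn as nn

class WordPlusTagEmbedding(nn.Module):
    """Concatenates a frozen pretrained word embedding with a learned tag embedding."""

    def __init__(self, pretrained, num_tags, tag_dim):
        super().__init__()
        # Pretrained word vectors, kept fixed during training.
        self.word = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # Trainable embedding for the named-entity tags.
        self.tag = nn.Embedding(num_tags, tag_dim)
        # Feature size the downstream encoder should expect.
        self.output_dim = pretrained.size(1) + tag_dim

    def forward(self, word_ids, tag_ids):
        # Both inputs: (batch_size, seq_len) index tensors.
        return torch.cat([self.word(word_ids), self.tag(tag_ids)], dim=2)

vocab_size, dim, num_tags, tag_dim = 10, 3, 4, 2
pretrained = torch.randn(vocab_size, dim)  # stand-in for real pretrained vectors
layer = WordPlusTagEmbedding(pretrained, num_tags, tag_dim)

word_ids = torch.tensor([[1, 2, 3, 4]])  # "My name is James" as made-up indices
tag_ids = torch.tensor([[3, 3, 3, 0]])   # O O O PERSON, with made-up tag indices
final = layer(word_ids, tag_ids)
print(final.shape)  # torch.Size([1, 4, 5])
```

Compared with a one-hot, the learned tag embedding lets the model decide how to represent each tag, at the cost of a few extra trainable parameters.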
