Combine several features

Hi,

How can I use several features in my net, e.g. current token, next token, current pos-tag, next pos-tag?

Do I need to embed the pos tags, or is just their index OK? I’m very confused about how to make this work.

Thanks!

You may need two embedding matrices, one for tokens and one for pos tags. The network looks like this:
input token index -> token embedding matrix -> token embedding
input pos-tag index -> pos-tag embedding matrix -> pos-tag embedding
Then concatenate each token embedding with its corresponding pos-tag embedding.


Thanks, I’ll try that!

Do you think you can give me a small example of how to do this? I’m getting myself into a mess :frowning:

Here is an example:

import torch

# prepare embeddings
VOCAB_SIZE = 100
POS_TAG_SIZE = 20
WORD_EMBED_DIM = 5
POS_TAG_DIM = 3
word_embeddings = torch.nn.Embedding(VOCAB_SIZE, WORD_EMBED_DIM)
pos_tag_embeddings = torch.nn.Embedding(POS_TAG_SIZE, POS_TAG_DIM)

# prepare training data (here I use some randomly generated LongTensors)
NUM_TRAIN_EXAMPLES = 1000
MAX_SEQ_LEN = 10
word_sequence = torch.randint(VOCAB_SIZE, (NUM_TRAIN_EXAMPLES, MAX_SEQ_LEN), dtype=torch.long)
pos_tag_sequence = torch.randint(POS_TAG_SIZE, (NUM_TRAIN_EXAMPLES, MAX_SEQ_LEN), dtype=torch.long)

# mock training of a batch
BATCH_SIZE = 8
word_batch = word_sequence[0 : 0 + BATCH_SIZE]
pos_tag_batch = pos_tag_sequence[0 : 0 + BATCH_SIZE]

word_embed_output = word_embeddings(word_batch)
pos_tag_embed_output = pos_tag_embeddings(pos_tag_batch)

print('word_embed_output shape: {}'.format(word_embed_output.size()))
print('pos_tag_embed_output shape: {}'.format(pos_tag_embed_output.size()))

# concatenate word_embed_output and pos_tag_embed_output along the feature dimension
final_embed = torch.cat((word_embed_output, pos_tag_embed_output), dim=-1)
print('final embed shape: {}'.format(final_embed.size()))
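
If you want to go one step further and feed the concatenated embeddings into a model, here is a minimal sketch continuing from the example above (the LSTM plus linear head, and the WordPosModel name, are just my own choices for illustration, not the only way to do it):

# a small tagger that owns both embedding matrices and consumes their
# concatenation; hidden_dim and num_classes are arbitrary choices here
class WordPosModel(torch.nn.Module):
    def __init__(self, vocab_size, pos_tag_size, word_dim, pos_dim,
                 hidden_dim, num_classes):
        super().__init__()
        self.word_embeddings = torch.nn.Embedding(vocab_size, word_dim)
        self.pos_tag_embeddings = torch.nn.Embedding(pos_tag_size, pos_dim)
        # the LSTM input size is the sum of the two embedding dims
        self.lstm = torch.nn.LSTM(word_dim + pos_dim, hidden_dim, batch_first=True)
        self.out = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, word_batch, pos_tag_batch):
        # both inputs are LongTensors of shape (batch, seq_len)
        word_embed = self.word_embeddings(word_batch)
        pos_embed = self.pos_tag_embeddings(pos_tag_batch)
        # concatenate along the feature dimension, as above
        final_embed = torch.cat((word_embed, pos_embed), dim=-1)
        lstm_out, _ = self.lstm(final_embed)
        return self.out(lstm_out)  # (batch, seq_len, num_classes)

model = WordPosModel(VOCAB_SIZE, POS_TAG_SIZE, WORD_EMBED_DIM, POS_TAG_DIM,
                     hidden_dim=16, num_classes=POS_TAG_SIZE)
scores = model(word_batch, pos_tag_batch)
print('scores shape: {}'.format(scores.size()))  # torch.Size([8, 10, 20])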

If you have some pretrained embeddings, you can wrap them in a tensor and load them into an embedding matrix using torch.nn.Embedding.from_pretrained().
Below is an example from the official documentation: https://pytorch.org/docs/stable/nn.html

>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embedding = nn.Embedding.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([1])
>>> embedding(input)
tensor([[ 4.0000,  5.1000,  6.3000]])
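
For instance, to plug this into the setup above, you could load pretrained word vectors while learning the pos-tag embeddings from scratch. A small sketch (the pretrained_weights tensor here is a random stand-in for vectors you would load from, e.g., a GloVe file):

# stand-in for real pretrained word vectors
pretrained_weights = torch.randn(VOCAB_SIZE, WORD_EMBED_DIM)

# freeze=True (the default) keeps the pretrained vectors fixed during
# training; pass freeze=False if you want to fine-tune them
word_embeddings = torch.nn.Embedding.from_pretrained(pretrained_weights, freeze=True)

# pos-tag embeddings stay randomly initialized and trainable
pos_tag_embeddings = torch.nn.Embedding(POS_TAG_SIZE, POS_TAG_DIM)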

Thank you so much!!!