Combine n-gram word embeddings

Hi, I would like to concatenate char, 2-gram and 3-gram word embeddings.

import torch
import torch.nn as nn

x_char = torch.randint(low=0, high=26, size=(3, 16), dtype=torch.long)
x_2_gram = torch.randint(low=0, high=60, size=(3, 8), dtype=torch.long)
x_3_gram = torch.randint(low=0, high=100, size=(3, 4), dtype=torch.long)

class NN(nn.Module):
    def __init__(self, one_gram_vocab_size, two_gram_vocab_size, three_gram_vocab_size, embedding_dim):
        super().__init__()
        self.one_gram_embedding = nn.Embedding(one_gram_vocab_size, embedding_dim)
        self.two_gram_embedding = nn.Embedding(two_gram_vocab_size, embedding_dim)
        self.three_gram_embedding = nn.Embedding(three_gram_vocab_size, embedding_dim)
    
    def forward(self, x_one_gram, x_two_gram, x_three_gram):
        one_gram_embedding = self.one_gram_embedding(x_one_gram)
        two_gram_embedding = self.two_gram_embedding(x_two_gram)
        three_gram_embedding = self.three_gram_embedding(x_three_gram)
        
        print(one_gram_embedding.shape, two_gram_embedding.shape, three_gram_embedding.shape)
        concat = torch.cat((one_gram_embedding, two_gram_embedding, three_gram_embedding), 0)
        return concat


model = NN(26, 60, 100, 5)
model(x_char, x_2_gram, x_3_gram)

Given the different input shapes, I don't see how I can concatenate the embedding outputs. Thanks!

Well, all three of your embedding layers have the same embedding_dim, say, 100. So each one-gram, bigram and trigram gets mapped to a 100-dim vector. That means you can concatenate these three vectors along either dimension, getting either a (1, 300) vector or a (3, 100) vector.
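For example (a minimal sketch, not your exact model; the vocabulary sizes and token indices below are just placeholders), embedding one token from each table shows both options:

import torch
import torch.nn as nn

one_gram_emb = nn.Embedding(26, 100)
two_gram_emb = nn.Embedding(60, 100)
three_gram_emb = nn.Embedding(100, 100)

# One token index per vocabulary, each of shape (1,)
c = one_gram_emb(torch.tensor([3]))     # (1, 100)
b = two_gram_emb(torch.tensor([17]))    # (1, 100)
t = three_gram_emb(torch.tensor([42]))  # (1, 100)

print(torch.cat((c, b, t), dim=0).shape)  # torch.Size([3, 100])
print(torch.cat((c, b, t), dim=1).shape)  # torch.Size([1, 300])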

Thanks! How can I create a (1, 300) vector in this setting? Since the tokenised n-grams have different shapes, I am getting an error:

Sizes of tensors must match except in dimension 2. Got 16 and 8 in dimension 1 (The offending index is 1)
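For reference, the error comes from the sequence dimension: with embedding_dim = 5 the three embedding outputs have shapes (3, 16, 5), (3, 8, 5) and (3, 4, 5), so torch.cat along the last dimension requires dimension 1 to match (16 vs 8 vs 4). One possible workaround (my own assumption, not something stated above) is to pool each output over its own sequence length first, so all three tensors share the same shape before concatenating:

import torch

# Shapes from the example above: (batch, seq_len, embedding_dim)
one_gram_embedding = torch.randn(3, 16, 5)
two_gram_embedding = torch.randn(3, 8, 5)
three_gram_embedding = torch.randn(3, 4, 5)

# Average over the (different) sequence lengths -> each becomes (3, 5)
pooled = [e.mean(dim=1) for e in (one_gram_embedding, two_gram_embedding, three_gram_embedding)]

# Now the shapes match and the vectors can be joined along the feature dimension
concat = torch.cat(pooled, dim=1)
print(concat.shape)  # torch.Size([3, 15]), i.e. (batch, 3 * embedding_dim)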