How to concatenate word embedding with one hot vector

Hi, I am new to PyTorch. I am trying to implement a text summarization model. As an input feature, I would like to concatenate a one-hot vector with the word embeddings. For example, my dataset looks like this:
0 Mr. Clinton is teaching Algebra to the students.
1 Monkeys are playing around in the garden.

I want to concatenate the words in each sentence with their one-hot vectors as input to the nn.Embedding module:
‘Mr’ + [1,0,0,0], ‘Clinton’ + [1,0,0,0]…
‘Monkeys’ + [0,1,0,0], ‘are’ + [0,1,0,0]…

How can I achieve this?

Any suggestions would be helpful. Thanks in advance.

nn.Embedding() takes the indices of words as input and gives their embeddings as output.
The question is not entirely clear. Here are two things to clarify:

  1. From your question, I understand that you want to concatenate each word’s embedding with its one-hot vector. Is that right? In that case, the output of the nn.Embedding() layer should be concatenated with the one-hot vector.

  2. In your example, you have appended both ‘Mr’ and ‘Clinton’ with [0,0,0,0]. Is it intentional to have the same index for these words?

First of all, thank you so much for replying.

Yes, I want to concatenate each word’s embedding with a one-hot vector. Basically, each sentence in the dataset is labeled with one value, which ranges from 0 (min) to 3 (max). First I convert these numbers into one-hot vectors, i.e.
0 = [1,0,0,0], 1 = [0,1,0,0], 2=[0,0,1,0] and 3 = [0,0,0,1]
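
For reference, a minimal sketch of that label-to-one-hot mapping in PyTorch. It assumes a recent PyTorch release for `torch.nn.functional.one_hot` (it is not available in older versions such as 0.3); indexing an identity matrix from `torch.eye` should give the same rows. The `labels` tensor simply mirrors the example dataset in this thread.

```python
import torch
import torch.nn.functional as F

# Sentence labels from the example dataset (one value in 0..3 per sentence).
labels = torch.tensor([0, 1, 3, 2])

# F.one_hot returns an integer tensor; cast to float so it can later be
# concatenated with floating-point embeddings.
one_hot = F.one_hot(labels, num_classes=4).float()

# Alternative: index the rows of a 4x4 identity matrix with the labels.
one_hot_eye = torch.eye(4)[labels]

print(one_hot)
```

Both variants produce one row per sentence, so each row still has to be repeated for every word of its sentence before concatenation.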

I know that the output of the nn.Embedding layer must be concatenated with the one-hot vector, but I do not understand how I can achieve this for every single word of a sentence.

For your second point: yes, it is intentional to concatenate the same index (that of the sentence) with each word in that line. So if the dataset has the following lines:

0 Mr. Clinton is teaching Algebra to the students.
1 Monkeys are playing around in the garden.
3 Oxygen is essential for life on earth.
2 Honesty is the best policy.
then the concatenation should be:
[‘Mr. Clinton’] + [1,0,0,0] , [‘is’] + [1,0,0,0] , [‘teaching’] + [1,0,0,0], … .
[‘Monkeys’] + [0,1,0,0] , [‘are’] + [0,1,0,0] , [‘playing’] + [0,1,0,0], …
[‘Oxygen’] + [0,0,0,1] , [‘is’] + [0,0,0,1] , [‘essential’] + [0,0,0,1], …
[‘Honesty’] + [0,0,1,0] , [‘is’] + [0,0,1,0] , [‘the’] + [0,0,1,0], …

(Sorry, I edited my previous post a bit; the one-hot vectors were not represented correctly.)

Looking forward to your help/suggestion.

You can use torch.cat to concatenate two tensors (the one-hot vector with the output of the nn.Embedding layer).
Something like the following:

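import torch
import torch.nn as nn

num_sentences = 4
num_unique_words = 20
# For brevity, the same indices serve as both sentence labels and word indices.
indices = torch.tensor([1,1,2,0,0,0,1])
one_hot_buffer = torch.eye(num_sentences)  # row i is the one-hot vector for label i
embeddings = nn.Embedding(num_unique_words, 5)

one_hot = one_hot_buffer[indices]  # shape (7, 4)
embed = embeddings(indices)        # shape (7, 5)
concatenated = torch.cat((embed, one_hot), dim=1)  # shape (7, 9)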
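
To attach the same sentence label to every word of a sentence, the word indices and the sentence labels probably need to be kept separate. Here is a rough sketch of one way to do that for a padded batch, assuming a recent PyTorch version. The vocabulary `word_to_ix`, the `<pad>` token, and the toy sentences are made up for illustration and are not from the original posts.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; index 0 is reserved for padding.
word_to_ix = {"<pad>": 0, "monkeys": 1, "are": 2, "playing": 3, "honesty": 4, "is": 5}
sentences = [["monkeys", "are", "playing"], ["honesty", "is"]]
labels = torch.tensor([1, 2])  # one label (0..3) per sentence
num_labels, embed_dim = 4, 5

# Pad the word indices into a (batch, max_len) tensor.
max_len = max(len(s) for s in sentences)
word_ids = torch.zeros(len(sentences), max_len, dtype=torch.long)
for i, s in enumerate(sentences):
    word_ids[i, :len(s)] = torch.tensor([word_to_ix[w] for w in s])

embedding = nn.Embedding(len(word_to_ix), embed_dim, padding_idx=0)
embed = embedding(word_ids)  # (batch, max_len, embed_dim)

# One one-hot row per sentence, repeated for every word position.
one_hot = torch.eye(num_labels)[labels]                 # (batch, num_labels)
one_hot = one_hot.unsqueeze(1).expand(-1, max_len, -1)  # (batch, max_len, num_labels)

# Concatenate along the feature dimension.
features = torch.cat((embed, one_hot), dim=2)  # (batch, max_len, embed_dim + num_labels)
print(features.shape)
```

Note that in this sketch the padded positions also receive the sentence's one-hot vector; whether that matters depends on how the downstream model masks padding.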

Thank you for your reply.
As suggested in your code, I am trying something like the following:

import torch
import torch.nn as nn
from torch import autograd
word_to_ix = {"hello": 0, "world": 1}
num_sentences = 4
num_unique_words = 1

embeddings = nn.Embedding(num_unique_words, 5)
indices = torch.LongTensor([word_to_ix["hello"]])
one_hot_buffer = torch.eye(num_sentences)   

one_hot = one_hot_buffer[indices]
embed = embeddings(autograd.Variable(indices))

#print(embed.data.type())
#print(one_hot.data.type())
concatenated = torch.cat((embed,one_hot), 1)
print(concatenated)

But at the time of concatenation, I am getting this error: 'RuntimeError: expected Variable as element 1 in argument 0, but got tuple'
It seems the output of nn.Embedding and the one-hot vector are of different data types.
What am I doing wrong, and how can I solve this? I am using PyTorch 0.3.
As suggested by some solutions, I tried converting the data type of one_hot by appending .float(), but that does not work either. Can someone please suggest a solution?
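
A likely cause, assuming PyTorch 0.3 behavior: there, nn.Embedding returns an autograd Variable, while indexing the result of torch.eye gives a plain tensor, and torch.cat cannot mix the two. Wrapping the one-hot tensor in a Variable before concatenating may resolve it. From PyTorch 0.4 onward, Variable and Tensor are merged, so the wrapper is harmless but unnecessary. The sketch below also sets num_unique_words to the vocabulary size, since a value of 1 would fail for the index of "world"; the `sentence_idx` value is a made-up example.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

word_to_ix = {"hello": 0, "world": 1}
num_sentences = 4
num_unique_words = len(word_to_ix)  # must cover every index in word_to_ix

embeddings = nn.Embedding(num_unique_words, 5)
word_idx = torch.LongTensor([word_to_ix["hello"]])
sentence_idx = torch.LongTensor([0])  # hypothetical: "hello" comes from sentence 0

one_hot = torch.eye(num_sentences)[sentence_idx]  # plain float tensor, shape (1, 4)
embed = embeddings(Variable(word_idx))            # shape (1, 5)

# Wrap the one-hot tensor so both inputs to torch.cat are Variables.
concatenated = torch.cat((embed, Variable(one_hot)), 1)
print(concatenated.size())
```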

This example doesn’t even work. It gives an error:

only integer tensors of a single element can be converted to an index

You are right. There were a couple of typos, which I have now corrected.
