Hi, I am new to PyTorch. I am trying to implement a text summarization model. As an input feature, I would like to concatenate a one-hot vector with word embeddings. For example, my dataset looks like this:
0 Mr. Clinton is teaching Algebra to the students.
1 Monkeys are playing around in the garden.
I want to concatenate the words in each sentence with their one-hot vectors as input to the nn.Embedding module: ‘Mr’ + [1,0,0,0], ‘Clinton’ + [1,0,0,0], … ‘Monkeys’ + [0,1,0,0], ‘are’ + [0,1,0,0], …
How can I achieve this?
Any suggestion would be helpful. Thanks in advance
nn.Embedding() takes the indices of words as input and gives their embeddings as output.
The question is not clear. Here are two things to clarify:
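To make the index-in, embedding-out behaviour concrete, here is a minimal sketch. The toy vocabulary, the embedding size of 5, and the variable names are made up for illustration, and it assumes a recent PyTorch (0.4 or later) where tensors no longer need to be wrapped in Variable:

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary: each word is mapped to an integer index.
vocab = {"<pad>": 0, "Monkeys": 1, "are": 2, "playing": 3}

# One learnable 5-dimensional vector per vocabulary entry (size chosen arbitrarily).
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=5)

sentence = ["Monkeys", "are", "playing"]

# nn.Embedding expects integer indices, not the word strings themselves.
word_ids = torch.tensor([vocab[w] for w in sentence])

# Each index selects the matching row of embedding.weight.
word_vecs = embedding(word_ids)  # shape: (3, 5)
```

So anything that should be joined with the words, such as a one-hot vector, has to be joined with `word_vecs` after the lookup, not with the words before it.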
From your question, I understand that you want to concatenate each word’s embedding with its one-hot vector. Is that right? In that case, the output of the nn.Embedding() layer should be concatenated with the one-hot vector.
In your example, you have appended both ‘Mr’ and ‘Clinton’ with [0,0,0,0]. Is it intentional to have the same index for these words?
Yes, I want to concatenate each word’s embedding with a one-hot vector. Basically, each sentence in the dataset is represented by one value, which ranges from 0 (min) to 3 (max). First I convert these numbers into one-hot vectors, i.e. 0 = [1,0,0,0], 1 = [0,1,0,0], 2 = [0,0,1,0] and 3 = [0,0,0,1].
I know that the output of the nn.Embedding layer must be concatenated with the one-hot vector, but I do not understand how I can achieve this for every single word of a sentence.
For your second point: yes, it is intentional to concatenate the same index of a particular sentence with each word in that line. So if the dataset has the following lines:
0 Mr. Clinton is teaching Algebra to the students.
1 Monkeys are playing around in the garden.
3 Oxygen is essential for life on earth.
2 Honesty is the best policy.
Then the concatenation should be:
[‘Mr. Clinton’] + [1,0,0,0], [‘is’] + [1,0,0,0], [‘teaching’] + [1,0,0,0], …
[‘Monkeys’] + [0,1,0,0], [‘are’] + [0,1,0,0], [‘playing’] + [0,1,0,0], …
[‘Oxygen’] + [0,0,0,1], [‘is’] + [0,0,0,1], [‘essential’] + [0,0,0,1], …
[‘Honesty’] + [0,0,1,0], [‘is’] + [0,0,1,0], [‘the’] + [0,0,1,0], …
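One possible way to get this per-word concatenation for a whole batch is sketched below. It is only an illustration under several assumptions: the word indices are invented, sentences are padded with index 0 to a common length, the sizes are arbitrary, and it uses `torch.nn.functional.one_hot`, which exists only in newer PyTorch releases (not in 0.3):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
vocab_size, embed_dim, num_classes = 20, 6, 4

embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)

# Four sentences as made-up word indices, padded with 0 to length 5.
word_ids = torch.tensor([
    [3, 4, 5, 6, 7],
    [8, 9, 10, 11, 0],
    [12, 4, 13, 14, 15],
    [16, 4, 17, 18, 19],
])

# One label per sentence, matching the 0, 1, 3, 2 order in the example above.
labels = torch.tensor([0, 1, 3, 2])

word_vecs = embedding(word_ids)                               # (4, 5, 6)

# Turn each label into a float one-hot row so it matches the embedding dtype.
one_hot = F.one_hot(labels, num_classes=num_classes).float()  # (4, 4)

# Repeat each sentence's one-hot vector once per word position.
seq_len = word_ids.size(1)
one_hot_per_word = one_hot.unsqueeze(1).expand(-1, seq_len, -1)  # (4, 5, 4)

# Concatenate along the feature dimension: 6 embedding dims + 4 one-hot dims.
features = torch.cat([word_vecs, one_hot_per_word], dim=2)       # (4, 5, 10)
```

Here `features[i, j]` holds the embedding of word `j` in sentence `i` followed by that sentence's one-hot vector, so every word in a sentence carries the same four trailing values. Whatever layer consumes `features` then needs an input size of `embed_dim + num_classes`.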
(Sorry, I edited my previous post a bit; the one-hot vectors were not represented correctly.)
But at the time of concatenation, I am getting this error: ‘RuntimeError: expected Variable as element 1 in argument 0, but got tuple’
I think this is because the output of nn.Embedding and the one-hot vector are of different data types.
What am I doing wrong, and how can I solve this? Looking forward to your suggestions. I am using PyTorch 0.3.
As suggested by some solutions, I tried converting the data type of one_hot by appending .float(), but that does not work either. Can someone please suggest a solution?
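The failing code is not shown in the thread, so the following is only a guess at the cause. The message says the second item passed to torch.cat is a tuple, which suggests the one-hot vector is still a plain Python tuple rather than a tensor; a tuple also has no .float() method, which might be why that fix did not help. A minimal sketch, assuming a recent PyTorch (0.4 or later) and made-up sizes:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 5)          # hypothetical vocabulary and embedding sizes
word_vec = embedding(torch.tensor([2]))  # shape (1, 5), dtype float32

one_hot_tuple = (1, 0, 0, 0)             # plain Python tuple, not a tensor

# torch.cat only accepts tensors, so mixing in a tuple raises an error.
try:
    torch.cat([word_vec, one_hot_tuple], dim=1)
    cat_failed = False
except (TypeError, RuntimeError):
    cat_failed = True

# Convert the tuple to a float tensor with a leading dimension of 1,
# so that both inputs are 2-D and differ only along dim=1.
one_hot = torch.tensor(one_hot_tuple, dtype=torch.float).unsqueeze(0)  # (1, 4)

features = torch.cat([word_vec, one_hot], dim=1)                       # (1, 9)
```

On PyTorch 0.3 specifically, the converted tensor would most likely also have to be wrapped in `torch.autograd.Variable` before concatenating, since the embedding output is a Variable there.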