Let’s say I have a data field named movie_genre for each sample movie, whose values are selected from the following genres:
Action
Adventure
Animation
Comedy
...
Each movie might have multiple genres:
mid genres
1 Action | Adventure
2 Animation
3 Comedy | Adventure | Action
This means that each movie’s genres form a variable-length list.
If I use one-hot vectors to encode the genres, Action can be encoded as (1, 0, 0, 0), Adventure as (0, 1, 0, 0), and so on.
So the movie with mid 1 can be encoded as (1, 1, 0, 0), the movie with mid 2 as (0, 0, 1, 0), and so on.
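To make the encoding concrete, here is a minimal sketch of how those vectors could be built in plain Python. It assumes a vocabulary of only the four genres named above (the real list is longer), and the names GENRES, GENRE_TO_INDEX, and multi_hot are made up for illustration.

```python
# Assumed vocabulary: only the four genres named in the question.
GENRES = ["Action", "Adventure", "Animation", "Comedy"]
GENRE_TO_INDEX = {name: i for i, name in enumerate(GENRES)}


def multi_hot(genre_string):
    """Turn a string like 'Action | Adventure' into a 0/1 list over GENRES."""
    vec = [0] * len(GENRES)
    for name in genre_string.split("|"):
        vec[GENRE_TO_INDEX[name.strip()]] = 1
    return vec


# The three sample movies from the table, keyed by mid.
movies = {
    1: "Action | Adventure",
    2: "Animation",
    3: "Comedy | Adventure | Action",
}
encoded = {mid: multi_hot(genres) for mid, genres in movies.items()}
```

Every vector has the same length regardless of how many genres a movie has, which is what makes this encoding convenient, but it is not the input format nn.Embedding expects.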
However, the PyTorch embedding layer nn.Embedding takes a tensor containing indices as input, not a one-hot vector. So how should I encode the data so that it can be fed into the embedding layer?
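For illustration, here is a rough sketch of one possible way to pass variable-length genre lists to nn.Embedding: convert each movie to a list of indices, pad the lists to a common length, and average the non-padding embeddings. This is only one option under stated assumptions, not necessarily the best one. Reserving index 0 for padding, the embedding size of 8, and the helper name to_indices are all choices made for this example.

```python
import torch
import torch.nn as nn

# Assumption: index 0 is reserved for padding, so real genres start at 1.
GENRES = ["Action", "Adventure", "Animation", "Comedy"]
GENRE_TO_INDEX = {name: i + 1 for i, name in enumerate(GENRES)}


def to_indices(genre_string):
    """Turn 'Action | Adventure' into a list of genre indices."""
    return [GENRE_TO_INDEX[g.strip()] for g in genre_string.split("|")]


movie_genres = ["Action | Adventure", "Animation", "Comedy | Adventure | Action"]
index_lists = [to_indices(g) for g in movie_genres]

# Pad every list with 0 up to the longest list so they stack into one tensor.
max_len = max(len(ix) for ix in index_lists)
padded = torch.tensor([ix + [0] * (max_len - len(ix)) for ix in index_lists])

# padding_idx=0 keeps the padding row at zero and excludes it from gradients.
embedding = nn.Embedding(len(GENRES) + 1, embedding_dim=8, padding_idx=0)
embedded = embedding(padded)  # shape: (num_movies, max_len, 8)

# Average only over the real (non-padding) genres of each movie.
mask = (padded != 0).unsqueeze(-1).float()
movie_vectors = (embedded * mask).sum(dim=1) / mask.sum(dim=1)
```

Here movie_vectors holds one fixed-size vector per movie. nn.EmbeddingBag is another built-in layer that pools the embeddings of variable-length index bags and may be worth a look as well.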
Here’s the related link on Stack Overflow