Hey folks, I was just trying to understand PyTorch embedding layers. I am creating a time-series prediction model using an LSTM, but I also have some categorical information that I want to include in the model: some temporal variables, such as the year, month, week number, and day, as well as some spatial variables, including US state and county number.
Now, I was wondering whether I have to create separate embedding layers for each categorical column, or whether I can create a single embedding to cover all of the categorical columns. More specifically, do I need to create a separate embedding for the year, then a separate one for the month, and then one each for the week and day? In that case I would need to concatenate all of those outputs to pass them to a fully connected layer or something.
Or can I just keep the (year, month, week number, and day) as a matrix that I input into the embedding layer? In other words, does the PyTorch implementation of the embedding layer handle multiple columns, represented by a single output embedding matrix?
Hopefully my question is clear, but please let me know if I need to clarify anything. I just wanted to understand how to best use these embeddings for categorical features. Thanks.
You could feed the data as a single tensor to the nn.Embedding layer.
However, I would use separate embeddings, since your inputs have completely different ranges. nn.Embedding layers accept inputs containing values in [0, num_embeddings - 1].
If your year data contains values in e.g. [1988, 2020], you would waste a lot of embedding vectors, since most indices would never be used.
On the other hand, if you normalize the data (subtract the min. value) but still share a single embedding, you would have overlapping indices for all data attributes, i.e. the year, week, day etc. would all index the same embedding vectors.
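A minimal sketch of the separate-embeddings approach. The cardinalities, embedding dimension, and normalization offsets (e.g. `year - 1988`) are made-up placeholders for illustration, not values from the original posts:

```python
import torch
import torch.nn as nn

# Illustrative cardinalities: years 1988-2020 -> 33 values, etc.
num_years, num_months, num_weeks, num_days = 33, 12, 53, 31
emb_dim = 8  # arbitrary embedding size for the sketch

# One embedding per categorical attribute, each sized to its own range
year_emb = nn.Embedding(num_years, emb_dim)
month_emb = nn.Embedding(num_months, emb_dim)
week_emb = nn.Embedding(num_weeks, emb_dim)
day_emb = nn.Embedding(num_days, emb_dim)

batch_size = 4
# Indices are assumed already normalized to start at 0
# (e.g. year - 1988, month - 1, day - 1)
year = torch.randint(0, num_years, (batch_size,))
month = torch.randint(0, num_months, (batch_size,))
week = torch.randint(0, num_weeks, (batch_size,))
day = torch.randint(0, num_days, (batch_size,))

# Look up each attribute separately and concatenate the vectors,
# ready to feed into a fully connected layer (or the LSTM input)
x = torch.cat(
    [year_emb(year), month_emb(month), week_emb(week), day_emb(day)], dim=1
)
print(x.shape)  # torch.Size([4, 32])
```

Each attribute then gets its own lookup table sized to its own range, so no vectors are wasted and no indices collide.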
@ptrblck thanks so much for the response. Yes, this makes sense; I was thinking along these lines as well. I suppose if I want to combine some categorical variables, I could take the Cartesian product of the columns and generate one embedding index per combination. But I can see how keeping the embeddings separate makes sense most of the time. Thanks again.
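For completeness, the Cartesian-product idea can be sketched like this, using made-up state/county cardinalities (the index arithmetic is the standard way to flatten a pair of categories into a single id; the specific numbers are assumptions):

```python
import torch
import torch.nn as nn

# Illustrative cardinalities for the two spatial attributes
num_states, num_counties = 50, 254

# One embedding over the Cartesian product of (state, county)
pair_emb = nn.Embedding(num_states * num_counties, 8)

state = torch.tensor([0, 4, 42])
county = torch.tensor([10, 0, 200])

# Flatten each (state, county) pair into a unique index
pair_index = state * num_counties + county
vectors = pair_emb(pair_index)
print(vectors.shape)  # torch.Size([3, 8])
```

Note the trade-off: the table grows multiplicatively (50 × 254 rows here), and rare combinations get few gradient updates, which is another reason separate embeddings are usually the safer default.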