What is the best way to deal with categorical variables?

morphism · February 6, 2021, 10:23am

Hello, how are you?

I am working on Deep Learning Project whose data has lots of categorical variables.
Some of them are just binary variables (0 or 1)

In Deep Learning, May I ask:

What is the best way to handle those categorical variables?
Can Deep Learning handle those data without any transformation?

And which technique or transformation should I use?

Thank you in advance and have a nice weekend

ptrblck · February 7, 2021, 6:43am

If your inputs contains categorical variables, you might consider using e.g. an nn.Embedding layer, which would transform the sparse input into a dense output using a trainable matrix.
I’m unsure what the alternatives would be and if passing these values to the model might even work in your case.

morphism · February 7, 2021, 11:06am

Hello,
Thank you for your answer.
So there is no ‘best’ way to handle categorical data in deep learning?

I will apply the nn.Embedding technique as you recommend.
Have a nice day and see you again.