Dear PyTorch community,
I am currently working with sparse/categorical data, on which I am training auto-encoder-like models.
For that purpose, an embedding layer at the input of the encoder seems well suited.
The approach I considered is to pass my categorical data through an embedding layer to obtain float data, say in the range [-1, 1], and then encode and decode back within [-1, 1].
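To make the setup concrete, here is a minimal sketch of that input stage, assuming a hypothetical vocabulary size and embedding dimension. The `max_norm=1.0` argument keeps each embedding vector inside the unit ball, which roughly matches the desired [-1, 1] range, and `padding_idx=0` keeps the padding row at zero:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 7822 categories plus a padding index at 0.
vocab_size = 7822
embed_dim = 32

# max_norm=1.0 renormalizes (in place) any accessed embedding row whose
# norm exceeds 1, so every output component lies in [-1, 1];
# padding_idx=0 keeps index 0 mapped to the zero vector.
embedding = nn.Embedding(vocab_size + 1, embed_dim, padding_idx=0, max_norm=1.0)

tokens = torch.tensor([[3, 17, 0, 0]])  # a batch of padded categorical IDs
x = embedding(tokens)                    # shape (1, 4, embed_dim)
```

Note that `max_norm` modifies the weight tensor in place during the forward pass, which is worth keeping in mind if the embedding is trainable.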
At that point, I have a few questions for which I can hardly find discussions, tutorials or guidelines… Maybe I am taking the wrong approach, so please correct me if there is a more suitable way to handle this case.
#1a – the reconstruction loss could be computed on the [-1, 1] float representation (e.g. an L1 loss)
#2a – the reconstruction loss could be computed on the categorical representation, in which case I need to invert the embedding transformation to reconstruct the categorical data from the [-1, 1] decoder output… Is there a direct way to perform such a reverse transformation?
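As far as I know there is no built-in inverse of `nn.Embedding`, but one common workaround is a nearest-neighbor lookup against the embedding matrix: each decoded vector is mapped back to the ID of its closest embedding row. A minimal sketch, with hypothetical sizes and an illustrative helper name:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 8
embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)

def invert_embedding(decoded, weight):
    # decoded: (batch, embed_dim); weight: (vocab_size, embed_dim)
    # torch.cdist gives pairwise Euclidean distances, shape (batch, vocab_size)
    dists = torch.cdist(decoded, weight)
    # each decoded vector maps to the ID of its nearest embedding row
    return dists.argmin(dim=1)

ids = torch.tensor([5, 42, 0])
decoded = embedding(ids)  # stand-in for the decoder output
recovered = invert_embedding(decoded, embedding.weight.detach())
```

With exact embedding vectors this recovers the original IDs; with noisy decoder outputs it returns the closest category, which is exactly the approximate inversion described above.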
This inverse transformation would also be needed for #1a, since once my model is trained I would still need to reconstruct categorical data from it.
#3a – or I could define the decoder output to have the size of the dictionary and produce a log-softmax prediction to be used with a classification loss. To recover the categorical data, I would then take the categorical value corresponding to the bin with the highest predicted probability. In my case, the dictionary size can go up to 7822, which I am afraid could make learning quite complicated. Or does it seem reasonable?
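For what it's worth, option #3a could be sketched as below, with hypothetical sizes. One detail: `nn.CrossEntropyLoss` expects raw logits (it applies log-softmax internally), so an explicit log-softmax layer is only needed if the loss is `nn.NLLLoss`:

```python
import torch
import torch.nn as nn

vocab_size, latent_dim = 7822, 64  # hypothetical dictionary and latent sizes

# final decoder layer projecting to dictionary size
decoder_head = nn.Linear(latent_dim, vocab_size)
# ignore_index=0 skips the padding class in the loss
criterion = nn.CrossEntropyLoss(ignore_index=0)

latent = torch.randn(16, latent_dim)            # stand-in decoder features
targets = torch.randint(1, vocab_size, (16,))   # true categorical IDs

logits = decoder_head(latent)  # shape (16, 7822)
loss = criterion(logits, targets)
loss.backward()

# at inference time, the reconstructed category is the highest-scoring bin
predicted = logits.argmax(dim=1)
```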
About optimizing such a model: the nn.Embedding documentation points out that it should be used with a sparse optimizer (in case the embedding is trainable). I also set padding_idx=0 so that zeros are kept zero in the model, which is what I want. In this setting, what would be the best practice, please?
#1b – not training the embedding weights
#2b – setting a separate sparse optimizer for the embedding and a traditional optimizer for the auto-encoder
#3b – optimizing everything together with a sparse optimizer
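In case it helps the discussion, here is how I understand option #2b would look, as a sketch with hypothetical layer sizes: the embedding is created with `sparse=True` and gets its own `torch.optim.SparseAdam`, while the dense layers use a regular `Adam`. With `padding_idx=0`, the padding row receives no gradient and stays at zero:

```python
import torch
import torch.nn as nn

# sparse=True makes the embedding produce sparse gradients
embedding = nn.Embedding(1000, 16, padding_idx=0, sparse=True)
encoder = nn.Linear(16, 8)   # stand-in encoder
decoder = nn.Linear(8, 16)   # stand-in decoder

# SparseAdam handles the sparse embedding gradients;
# a traditional Adam handles the dense auto-encoder parameters.
sparse_opt = torch.optim.SparseAdam(embedding.parameters(), lr=1e-3)
dense_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

tokens = torch.randint(1, 1000, (4, 10))       # padded IDs start at 1 here
out = decoder(encoder(embedding(tokens)))      # shape (4, 10, 16)
loss = out.pow(2).mean()                       # placeholder loss

sparse_opt.zero_grad()
dense_opt.zero_grad()
loss.backward()
sparse_opt.step()
dense_opt.step()
```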
Thank you very much for your recommendations!