Optimizing and inverting nn.Embedding

Dear PyTorch community,

I am currently working with sparse/categorical data, on which I am training auto-encoder-like models.

For that purpose, an embedding layer at the input of the encoder seems well suited.
The approach I am considering is to pass my categorical data through an embedding layer to get float data in, say, the range [-1, 1], then encode and decode back to [-1, 1].
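
To make the setup concrete, here is roughly what I have in mind. This is only a sketch: all sizes are made up, and I squash the embeddings with tanh just to force them into [-1, 1].

```python
import torch
import torch.nn as nn

# Rough sketch (all dimensions and module sizes are made up):
# category indices -> embedding (squashed to [-1, 1]) -> encoder -> decoder -> [-1, 1] output
num_categories, emb_dim, seq_len, latent_dim = 7822, 32, 16, 8

embedding = nn.Embedding(num_categories, emb_dim, padding_idx=0)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * emb_dim, latent_dim), nn.Tanh())
decoder = nn.Sequential(nn.Linear(latent_dim, seq_len * emb_dim), nn.Tanh())  # outputs in [-1, 1]

x = torch.randint(0, num_categories, (4, seq_len))   # a batch of categorical indices
emb = torch.tanh(embedding(x))                       # squash the embeddings into [-1, 1]
recon = decoder(encoder(emb)).view(4, seq_len, emb_dim)
loss = nn.functional.l1_loss(recon, emb.detach())    # e.g. an L1 reconstruction loss in float space
```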

At this point, I have a few questions for which I can hardly find discussions, tutorials or guidelines… Maybe I am taking the wrong approach; please correct me if there is a more suitable way to handle this case.

#1a – the reconstruction loss could be computed on the [-1, 1] float data representation (e.g. an L1 loss)
#2a – the reconstruction loss could be computed on the categorical data representation; then I need to invert the embedding transformation to reconstruct the categorical data from the [-1, 1] decoder output… Is there a direct way to perform such an inverse transformation?
This inverse transformation would also be needed for #1a, since once my model is trained I would still need to reconstruct categorical data from it.
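
The only workaround I can think of so far is a nearest-neighbour lookup against the embedding weight matrix, roughly like this (a sketch with hypothetical shapes, not an exact inverse):

```python
import torch
import torch.nn as nn

num_categories, emb_dim = 7822, 32                 # made-up sizes
embedding = nn.Embedding(num_categories, emb_dim, padding_idx=0)

decoded = torch.rand(10, emb_dim) * 2 - 1          # stand-in for decoder outputs in [-1, 1]
table = torch.tanh(embedding.weight)               # the categories' embeddings, squashed like the input side
dists = torch.cdist(decoded, table)                # (10, num_categories) pairwise L2 distances
recovered = dists.argmin(dim=1)                    # nearest embedding row = recovered category index
```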

#3a – or I could define the decoder output to be the size of the dictionary and output a log-softmax prediction to be used with a classification loss. To recover the categorical data, I would then take the categorical value corresponding to the bin with the highest predicted probability in the dictionary. In my case, the dictionary size can go up to 7822, which I am afraid could make the learning quite complicated? Or does that seem reasonable?
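
Roughly what I imagine for #3a (again a sketch with made-up sizes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_categories, latent_dim, batch = 7822, 8, 4          # made-up sizes

decoder_head = nn.Linear(latent_dim, num_categories)    # one logit per dictionary entry
latent = torch.randn(batch, latent_dim)                 # stand-in for the encoder output
log_probs = F.log_softmax(decoder_head(latent), dim=1)  # (batch, num_categories)
target = torch.randint(0, num_categories, (batch,))     # the true category indices
loss = F.nll_loss(log_probs, target)                    # classification loss over the dictionary

recovered = log_probs.argmax(dim=1)                     # bin with the highest probability at inference time
```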

About optimizing such a model, the nn.Embedding documentation points out that it should be used with an optimizer that supports sparse gradients (in case the embedding is trainable). Also, I set padding_idx=0 so that the padding entries are kept at zero in the model, which is what I want. In such a setting, what would be best practice, please?

#1b – not training the embedding weights
#2b – setting up a separate sparse optimizer for the embedding and a traditional optimizer for the auto-encoder (something like the sketch after this list)
#3b – optimizing everything together with a sparse optimizer
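
For #2b, I imagine a two-optimizer setup along these lines (a sketch with hypothetical modules; the nn.Embedding is created with sparse=True so that optim.SparseAdam receives sparse gradients):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(7822, 32, padding_idx=0, sparse=True)   # sparse gradients for the weight
autoencoder = nn.Sequential(nn.Linear(32, 8), nn.Tanh(), nn.Linear(8, 32), nn.Tanh())

sparse_opt = torch.optim.SparseAdam(embedding.parameters(), lr=1e-3)  # handles the sparse embedding grads
dense_opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)       # regular optimizer for the rest

x = torch.randint(0, 7822, (4,))
emb = torch.tanh(embedding(x))
loss = nn.functional.l1_loss(autoencoder(emb), emb.detach())

sparse_opt.zero_grad()
dense_opt.zero_grad()
loss.backward()
sparse_opt.step()
dense_opt.step()
```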

Thank you very much for your recommendations!


Up!

Does anyone have recommendations on the right way to use nn.Embedding in this case?

thanks


By the way, if someone is passing by here: there might be a typo in the torch.nn.functional.nll_loss documentation.

In the case of a K-dimensional loss, shouldn't the target shape be (N, d1, d2, …, dK) instead of (N, C, d1, d2, …, dK) (which only applies to the input shape)?
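
A quick shape check of what I mean (with K = 2):

```python
import torch
import torch.nn.functional as F

N, C, d1, d2 = 4, 10, 8, 8
log_probs = F.log_softmax(torch.randn(N, C, d1, d2), dim=1)  # input: (N, C, d1, d2)
target = torch.randint(0, C, (N, d1, d2))                    # target: (N, d1, d2), no C dimension
loss = F.nll_loss(log_probs, target)                         # works; a (N, C, d1, ..., dK) target would not
```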