Jointly training of weights+input embeddings in a neural-network based model

I’m wondering if I can do the followoing thing:

Basically I want to build a deep learning model which takes a pre-computed set of features which already encodes some knowledge on the target, and concatenates it along a learnable vector of parameters and train a neural network-based architecture which then jointly trains these embeddings concatenated to the fixed feature vector and returns a prediction.

I just have a perplexity: should I pretrain the embeddings before and using them after by concatenating with features that I already have, or can I train the embeddings jointly with the overall model?

To summarize, ny idea is the following: I want to use both the information provided by tabular features that I already have, plus I want to learn the best representations directly from data.

@Federico_Ottomano

  • If you train the parameters jointly along with the overall model, your embeddings will reflect the patterns that help distinguish the input based on the target

  • If you do it separately ( i am guessing autoencoder), then the embeddings will reflect the input features only

You can try a third option
Have a autoencoder setup for the input data and use the encoder output as an input where you concatenate it along with the other features and have the prediction output. So your model will jointly train the autoencoder and the classifier

Many thanks for your suggestion! But what should I consider as input data for the autoencoder? Should I just choose some random representation ? What about the final loss also? Will it be a sum of the reconstruction loss + target loss?

@Federico_Ottomano If you dont have any prior data, then you should use the first option. Third option is only when you have additional features which you cannot use in the model

Ok, but I’m not quite figuring out how can I ‘train’ without having some fixed representation to feed the neural network… I mean in such a case, should I use nn.Embedding(num_samples, size) . Could you provide a little example of code please?

Also, I guess that the problem in using the first option, is that we won’t allow any ‘inductive’ ability on the model, as the predictions will be constrained to the specific trainable embeddings we have provided at the beginning, i.e. if I provide a new input, we won’t know which features to assign it. Is that correct?