Automated Feature Engineering Using PyTorch


I want to implement something similar to this using PyTorch:

I don’t know how to combine models the way the article does with TensorFlow, so I was thinking of this approach:

My idea is to select, let’s say, 50 features and then build 50 models to create the extra features. Each model would have 49 of the 50 features as input features and would try to predict the one left. I would save the values of the last intermediate layer of each model and these values would be the engineered features used in my final model.

The flow would be something like this:
1.- Build a model for each feature to predict that feature using the rest of the features as input, and store the last intermediate layer of each model.
2.- Merge the intermediate layer values with the initial dataset (train_data). The result would contain the initial features plus the engineered ones.
3.- Train a final model on all those features to predict the target.
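
For reference, here is a minimal sketch of the flow above as I currently picture it. It uses random placeholder data and 5 features instead of 50; `FeatureNet`, `other_columns`, and all sizes and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: 5 features stand in for the 50 in the post; binary target.
n_rows, n_feats, hidden = 256, 5, 8
X = torch.randn(n_rows, n_feats)
y = (X.sum(dim=1) > 0).float()


class FeatureNet(nn.Module):
    """Hypothetical helper: predicts one feature from the remaining ones."""

    def __init__(self, n_inputs, hidden):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)  # last intermediate layer
        return self.head(h), h


def other_columns(X, i):
    """Return all columns of X except column i."""
    return torch.cat([X[:, :i], X[:, i + 1:]], dim=1)


# Step 1: one model per feature, each trained on its own.
engineered = []
for i in range(n_feats):
    net = FeatureNet(n_feats - 1, hidden)
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    inputs, target = other_columns(X, i), X[:, i:i + 1]
    for _ in range(20):
        opt.zero_grad()
        pred, _ = net(inputs)
        nn.functional.mse_loss(pred, target).backward()
        opt.step()
    with torch.no_grad():
        engineered.append(net(inputs)[1])  # keep the hidden activations

# Step 2: join the original features with the engineered ones.
X_aug = torch.cat([X] + engineered, dim=1)

# Step 3: train the final model on the augmented features.
final = nn.Sequential(nn.Linear(X_aug.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(final.parameters(), lr=1e-2)
for _ in range(20):
    opt.zero_grad()
    logits = final(X_aug).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    loss.backward()
    opt.step()
```

Note that in this version the feature models are frozen by the time the final model trains, so the target loss never influences the engineered features.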

But the article says:
The trick is making sure that the feature networks train with the final model rather than a separate process.

My approach would definitely not do that, so I guess I’m missing something. Why is this important?

The article also says:
Because we have several auxiliary outputs, we need to tell TensorFlow how much weight to give each one in determining how to adjust the model to improve accuracy. I personally like to give 50% weight to the auxiliary predictions (total) and 50% to the target prediction. Some might find it strange to give any weight to the auxiliary predictions since they are discarded at the loss calculation step. The problem is, if we do not give them any weight, the model will mostly ignore them, preventing it from learning useful features.
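
As far as I can tell, the setup described in that quote could be expressed in PyTorch as a single `nn.Module` with one auxiliary head per feature and a weighted sum of losses. This is only a sketch under my own assumptions, not the article's code: `JointModel`, the layer sizes, the MSE auxiliary loss, and the placeholder data are all hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: 5 features stand in for the 50 in the post; binary target.
n_rows, n_feats, hidden = 256, 5, 8
X = torch.randn(n_rows, n_feats)
y = (X.sum(dim=1) > 0).float()


class JointModel(nn.Module):
    """Hypothetical model: feature sub-networks and final model in one graph."""

    def __init__(self, n_feats, hidden):
        super().__init__()
        self.n_feats = n_feats
        # One small body and one auxiliary head per feature.
        self.bodies = nn.ModuleList(
            [nn.Sequential(nn.Linear(n_feats - 1, hidden), nn.ReLU())
             for _ in range(n_feats)]
        )
        self.aux_heads = nn.ModuleList(
            [nn.Linear(hidden, 1) for _ in range(n_feats)]
        )
        # The final model sees the raw features plus every hidden layer.
        self.final = nn.Sequential(
            nn.Linear(n_feats + n_feats * hidden, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, x):
        hiddens, aux_preds = [], []
        for i in range(self.n_feats):
            inputs = torch.cat([x[:, :i], x[:, i + 1:]], dim=1)  # drop feature i
            h = self.bodies[i](inputs)
            hiddens.append(h)
            aux_preds.append(self.aux_heads[i](h))  # predicts feature i
        logits = self.final(torch.cat([x] + hiddens, dim=1)).squeeze(1)
        return logits, torch.cat(aux_preds, dim=1)


model = JointModel(n_feats, hidden)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
target_weight, aux_weight = 0.5, 0.5  # the 50/50 split from the quote

for _ in range(30):
    opt.zero_grad()
    logits, aux = model(X)
    target_loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    # Mean over all features, so aux_weight is shared across the auxiliary outputs.
    aux_loss = nn.functional.mse_loss(aux, X)
    loss = target_weight * target_loss + aux_weight * aux_loss
    loss.backward()
    opt.step()
```

Here the target loss backpropagates into the feature sub-networks as well, which is the part my separate-process flow does not do.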

Again, I wouldn’t be doing anything like that, so I’m wondering whether the approach I have in mind makes sense at all.

Does my approach sound reasonable, or am I missing something?
Is there a way to combine models in PyTorch the way the article does with TensorFlow?

Thanks in advance!