How to set up a regression problem

Hello,

I am looking into how to properly set up a regression problem. My concern is with the targets: for each example in my input, I have 3x5 numerical outputs.

Example 1: output_1 - 5, 6, 7, 53, 0.5
output_2 - 5, 0.3, 7, 53, 0.5
output_3 - 5, 6.2, 7, 53, 0.5

And so on. Basically, this is multi-target regression if I am not mistaken. In the end, I want to predict only the final 5 values based on the 3 outputs available for each input example. I would like some guidance on how to process such data and what kind of architecture would be best suited to begin with.

Thank you

Could you explain how a single input sample creates the 3 outputs containing the 5 predictions?
Are these 3 predictions created by different models for the same input?

For example, take a beach house as an instance, with some qualities as a feature vector x. The goal is to predict this house’s price, customer rating, or some other real values. But I have three such entries for one house. So let’s imagine the price, customer rating, etc. are different across 3 different websites. Does this make sense?

Thanks for the explanation. In that case I think you could treat this output as 15 independent predictions.
I assume you would like to predict the 5 attributes from all 3 websites, not the mean/median etc.
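A minimal sketch of that suggestion, assuming the flattened-output setup (all sizes here, such as the input width and hidden size, are placeholder assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: each input has `num_features` features and a 3x5 target.
num_features, batch_size = 10, 4
x = torch.randn(batch_size, num_features)
y = torch.randn(batch_size, 3, 5)          # 3 sets of 5 attributes per example

model = nn.Sequential(
    nn.Linear(num_features, 64),
    nn.ReLU(),
    nn.Linear(64, 15),                     # 15 = 3 * 5 independent outputs
)

criterion = nn.MSELoss()
out = model(x)                             # shape [batch_size, 15]
loss = criterion(out, y.view(batch_size, -1))  # flatten targets to match
loss.backward()
```

The model simply emits all 15 values at once, and MSELoss treats each of them as an independent regression target.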


Yes, you’re correct. Thank you!

I might be misunderstanding the problem, but it seems to make more sense to separate those three reviews into three different inputs, so that your final layer only has to predict 5 attributes — unless the problem is specifically to predict three sets of those attributes. Otherwise, I would guess that your problem is “for a given house, predict these five attributes”. If that is the case, you probably want to separate the different scores into different inputs.
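If you did want to go that route, one way to sketch the split (shapes here are placeholder assumptions) is to repeat each input once per review and flatten the targets accordingly:

```python
import torch

# Hypothetical data: N houses, each with a feature vector and 3 sets of
# 5 target attributes (e.g. from 3 websites).
N, F = 100, 20
x = torch.randn(N, F)
y = torch.randn(N, 3, 5)

# Repeat each input 3 times and flatten the targets, so the model only
# has to predict 5 attributes per (house, website) pair.
x_sep = x.repeat_interleave(3, dim=0)   # shape [3N, F]
y_sep = y.reshape(N * 3, 5)             # shape [3N, 5]
```

`repeat_interleave` keeps the rows aligned: the three copies of house `i` line up with its three target rows.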


I see how my question might have been confusing; I did not phrase it well. Per example, I would like to predict 3 sets of 5 attributes.

Then you should indeed go with the suggestion by @ptrblck!


Hello,

I managed to set up a simple linear network with 2 hidden layers. The input data has 123 examples and 450 features, and I am predicting 12 output values (regression using MSELoss with Adam). A couple of questions: my goal is to let the neural network perform feature selection/engineering, and then I want to generate a new dataset based on this. For example, after learning some weights, if I pass a new dataset through those weights without activations, are those my new examples based on the learned representation of my network?
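A sketch of the setup described above (450 inputs, 2 hidden layers, 12 regression outputs with MSELoss and Adam); the hidden sizes, learning rate, and epoch count are assumptions, not the poster's actual values:

```python
import torch
import torch.nn as nn

# Assumed architecture: 450 input features -> 2 hidden layers -> 12 outputs.
model = nn.Sequential(
    nn.Linear(450, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 12),
)

x = torch.randn(123, 450)   # 123 examples, 450 features
y = torch.randn(123, 12)    # 12 regression targets per example

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(5):      # short demo loop
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```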

The activations will be the output of each layer, so unless I misunderstand the sentence, you won’t be able to pass the input through the model without creating activations.

If you are trying to create new “features”, you could use the penultimate activations and try to train a new classifier on them. This would be similar to a fine-tuning approach, where you would freeze all but the last layer and train only that layer.
However, if you store these activations separately you are also free to use other ML models such as SVMs, RandomForest etc.
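One way this could look in practice, assuming the 2-hidden-layer setup from earlier (the sizes and the RandomForest hyperparameters are placeholder assumptions):

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

# Assumed 2-hidden-layer model; splitting off the last linear layer leaves
# a stack that produces the penultimate activations.
model = nn.Sequential(
    nn.Linear(450, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 12),
)
feature_extractor = model[:-1]   # everything up to the penultimate activation

x = torch.randn(123, 450)
y = torch.randn(123, 12)

with torch.no_grad():
    feats = feature_extractor(x).numpy()   # shape [123, 64]

# Train a classic ML model on the learned representation instead.
rf = RandomForestRegressor(n_estimators=50).fit(feats, y.numpy())
```

Storing `feats` to disk would let you try SVMs, gradient boosting, etc. on the same representation without rerunning the network.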

I’m a bit confused about this sentence, as feature selection is used in some ML approaches to select input features. E.g. a random forest classifier can be used to remove highly correlated or noisy features based on the entropy of the splits, etc.
However, it seems you would like to use feature selection or engineering “inside the model”?
Could you explain your use case a bit?


Hey @ptrblck, I appreciate your feedback! Yes, that’s what I ended up using. My question is: since I am using ReLU as my non-linear activation, in the penultimate layer I obviously end up with some 0 values. Would using a LeakyReLU be a better approach?

What I meant in regards to your second part was that I have ~500 features and would like to do some sort of feature engineering. I know a NN technically does that by learning a new representation. But what I meant was: is there a specific way to achieve that, or does it just depend on what my problem is and what I want?

You would have to check the model performance by switching the last non-linearity.
Note that zero outputs don’t necessarily mean a “bad” behavior. If your model is able to successfully predict the targets, this activation (with zeros) seems to be a good representation such that the last linear layer can act as the classifier and compute the outputs.
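Swapping the non-linearity for such a comparison could look like this (the layer sizes are placeholder assumptions; the point is only that the penultimate activation is pluggable):

```python
import torch.nn as nn

def make_model(act=nn.ReLU):
    # Same architecture, swappable penultimate non-linearity.
    return nn.Sequential(
        nn.Linear(450, 64),
        nn.ReLU(),
        nn.Linear(64, 64),
        act(),              # the non-linearity being compared
        nn.Linear(64, 12),
    )

relu_model = make_model(nn.ReLU)
leaky_model = make_model(nn.LeakyReLU)
# Train both and compare validation loss to decide which works better.
```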

Are these 500 features the input features or are you creating them in the model as an intermediate activation?
Unfortunately, I don’t know what the current method would be to perform feature selection in a neural network. If I remember it correctly, I’ve read some papers which claim that e.g. the parameter or gradient magnitude does not correlate to the “importance” of these units. However, I cannot find the reference at the moment.


I’d also be interested to see what the current best way is to do feature selection over a given input feature set. One could try out all possible combinations, but that is not always feasible.

You would have to check the model performance by switching the last non-linearity.
Note that zero outputs don’t necessarily mean a “bad” behavior. If your model is able to successfully predict the targets, this activation (with zeros) seems to be a good representation such that the last linear layer can act as the classifier and compute the outputs.

Performance is good. And yes, having a 0 doesn’t mean it’s bad, but if I were to extract the penultimate layer activations, I could simply drop the 0-valued columns and thus reduce the dimensions even further.
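Note that this only works for columns that are zero across *every* sample; a sketch of that dimension reduction (the activation matrix here is a made-up example):

```python
import numpy as np

# Hypothetical penultimate activations; with ReLU, some columns may end up
# all-zero for the whole dataset.
acts = np.array([[0.0, 1.2, 0.0, 3.4],
                 [0.0, 0.7, 0.0, 0.1],
                 [0.0, 2.5, 0.0, 0.0]])

# Keep only columns that are non-zero for at least one sample.
nonzero_cols = acts.any(axis=0)
reduced = acts[:, nonzero_cols]   # columns 0 and 2 are dropped
```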

Are these 500 features the input features or are you creating them in the model as an intermediate activation?
Unfortunately, I don’t know what the current method would be to perform feature selection in a neural network. If I remember it correctly, I’ve read some papers which claim that e.g. the parameter or gradient magnitude does not correlate to the “importance” of these units. However, I cannot find the reference at the moment.

Yes, those are the input features. I see. The parameter reference seems interesting.

Thank you!

Unfortunately, that is not the case. I am currently exploring simple DNNs and Variational Autoencoders. So far, the results seem to be similar.