Predicting a combination of targets with a regression


I have around 40 features and 4 target descriptors. I’ve trained models against each target individually, and I know there is a relationship. Even a simple linear regression does reasonably well.

I’m aware that each target descriptor has an effect on the features, so there should be some function of all 4 targets that captures their cumulative effect and is more predictable than any one target is individually.

It seems to me that I could discover this with a simple linear regression, but I’m not clear how I would train.

The only approach I can imagine is a training loop within a training loop: the outer loop fits a linear regression over the 4 targets to produce a single combined target, and the inner loop trains a model on the 40 features against that combined target to see how well it can be predicted. The loss from that prediction is then used in the outer training loop.
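For concreteness, the nested procedure could be sketched roughly as below. This is only an illustration under several assumptions: it uses NumPy, the outer "training" is replaced by a plain random search over weight vectors, the inner model is an ordinary least-squares fit scored by in-sample R², and the names `r2_of_combination` and `search_combination` are hypothetical. The combined target is rescaled to unit variance so that shrinking all weights to zero is not a trivially perfect answer.

```python
import numpy as np


def r2_of_combination(X, Y, w):
    """Inner step: fit least squares from X to the combined target Y @ w, return R^2."""
    t = Y @ w
    t = (t - t.mean()) / t.std()                 # fix the scale of the combined target
    A = np.column_stack([X, np.ones(len(X))])    # features plus an intercept column
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    resid = t - A @ coef
    return 1.0 - resid.var() / t.var()


def search_combination(X, Y, n_candidates=2000, seed=0):
    """Outer step: try random unit-norm weight vectors, keep the most predictable one."""
    rng = np.random.default_rng(seed)
    best_w, best_r2 = None, -np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=Y.shape[1])
        w /= np.linalg.norm(w)
        r2 = r2_of_combination(X, Y, w)
        if r2 > best_r2:
            best_w, best_r2 = w, r2
    return best_w, best_r2
```

With `X` of shape `(n_samples, 40)` and `Y` of shape `(n_samples, 4)`, `search_combination(X, Y)` returns a weight vector and its score. The score here is in-sample, so a held-out split would be needed for an honest estimate, and targets on very different scales should probably be standardised first. Finding the linear combination of targets that is most linearly predictable from the features is also the problem canonical correlation analysis addresses, so a closed-form route may be worth checking before writing any search loop.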

I’m not sure I’ve explained that very well, and regardless, there may well be a much simpler solution.

Advice appreciated.

if you use a single model with 4 outputs, the output layer implicitly models covariance (i.e. it produces correlated predictions)
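a minimal sketch of such a single model, assuming PyTorch (the thread does not name a framework) and random stand-in data; the layer sizes, learning rate and step count are arbitrary choices for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 40 features and 4 targets, as in the question.
X = torch.randn(256, 40)
Y = X @ torch.randn(40, 4) + 0.1 * torch.randn(256, 4)

# One network with a 4-unit output layer: all targets are predicted
# from the same hidden features.
model = nn.Sequential(
    nn.Linear(40, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # mean over samples and over the 4 targets

initial_loss = loss_fn(model(X), Y).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(X), Y).item()
```

the single `loss.backward()` call pushes gradients from all 4 targets into the same hidden layer, which is where the sharing happens.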

but that’s a model with fully shared parameters/features in the hidden layers, which may be good or bad depending on how well this matches the true process. when this is undesirable, it is possible to design models with some non-shared parameters and/or layers, aka multi-task learning
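one possible layout for a partially shared design, again assuming PyTorch; the class name `SharedTrunkMultiHead` and all layer sizes are hypothetical, and how much to share is a modelling choice this sketch does not settle:

```python
import torch
from torch import nn


class SharedTrunkMultiHead(nn.Module):
    """Shared first layer, then a separate small head per target."""

    def __init__(self, n_features=40, n_targets=4, trunk_dim=32, head_dim=16):
        super().__init__()
        # Parameters shared by every target.
        self.trunk = nn.Sequential(nn.Linear(n_features, trunk_dim), nn.ReLU())
        # Parameters owned by a single target each.
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(trunk_dim, head_dim),
                nn.ReLU(),
                nn.Linear(head_dim, 1),
            )
            for _ in range(n_targets)
        ])

    def forward(self, x):
        shared = self.trunk(x)
        # Each head yields one column; concatenate into (batch, n_targets).
        return torch.cat([head(shared) for head in self.heads], dim=1)
```

the output has the same shape as that of a plain 4-output network, so it can be trained with the same loss; only the trunk receives gradients from all targets.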

what you’ve described is an extreme non-sharing design; its primary downside is O(num_outputs) time and memory costs