Training multiple models in a class hierarchy

Hi there!

I’m wondering if it’s possible to train models to classify documents within a class hierarchy. The problem is a multi-label text classification task. I’d like to first predict the higher levels of the hierarchy, then the lower levels, for example topic -> subtopic -> class. The first classifier should therefore attempt the top level, and based on that prediction, sub-models should then classify the subtopic, and so on…

Please could someone advise on how this could be implemented at a high level? Thank you.

Would you like to train these models end-to-end somehow or separately, i.e. first the “topic model”, then the “subtopic model” etc.?
In the latter case, you would need to be careful about your data splitting, since you are basically training a multi-staged model.
Usually you would split your data into a training, validation, and test set.
The training set is, as its name suggests, used to train the model, while you would use the validation set to tune hyperparameters, perform early stopping, etc.
The test set should be used just at the end (I doubt everybody uses it that way, but let’s assume we are not cheating :wink: ).
Now once you have trained and validated your “topic” model, the training and validation data are dirty, as your model has already seen both sets. To train your subtopic model, you would need real unseen predictions from your topic model. The training data would be quite useless at this point, since most likely your topic model has learned it quite well. The validation data was also used to tune the topic model, which makes the predictions on this set probably a bit better than on new unseen data.
So we would need another train and validation dataset for our subtopic model.
Also, we would have to repeat the same step for the class model.

Basically you would have to create a train and validation dataset for each stage you are training.
If you take care of these, you should be fine.
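The splitting described above could be sketched like this (a minimal example — the dataset size, split sizes, and stage names are arbitrary assumptions, and in practice you would index into your actual dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
indices = rng.permutation(n_samples)

# Hold out a single test set, used only at the very end.
test_idx, rest = indices[:200], indices[200:]

# Give each stage (topic, subtopic, class) its own disjoint
# train/val split, so later stages are trained and validated
# on predictions for data the earlier models have never seen.
stages = {}
chunks = np.array_split(rest, 3)
for name, chunk in zip(["topic", "subtopic", "class"], chunks):
    n_val = len(chunk) // 5
    stages[name] = {"val": chunk[:n_val], "train": chunk[n_val:]}
```

Each stage then trains only on its own `train` indices and is tuned on its own `val` indices, which keeps every later stage honest about what “unseen data” means.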

That being said, how would you like to combine these models? Do you want to use the activations from the penultimate layer and feed them to the next model? If so, I think it could be worth a try to just use a single model and train three separate “heads”, one for each output, which would make the overall use case a bit simpler.
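A minimal sketch of the single-model-with-three-heads idea (the layer sizes, class counts, and names here are made up for illustration; a real text model would use an embedding or transformer encoder instead of the toy MLP):

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    """Shared encoder with one classification head per hierarchy level."""
    def __init__(self, in_dim=128, hidden=256,
                 n_topics=5, n_subtopics=20, n_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # One head per level of the hierarchy.
        self.topic_head = nn.Linear(hidden, n_topics)
        self.subtopic_head = nn.Linear(hidden, n_subtopics)
        self.class_head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        feat = self.encoder(x)  # shared penultimate representation
        return (self.topic_head(feat),
                self.subtopic_head(feat),
                self.class_head(feat))

model = HierarchicalClassifier()
x = torch.randn(4, 128)  # a batch of 4 dummy feature vectors
topic_logits, subtopic_logits, class_logits = model(x)
```

All three heads read from the same shared features, so information useful for the topic prediction is also available to the subtopic and class heads.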


Thanks ptrblck, that’s really helpful, especially around training/test splitting — did not consider this!!

Think I’d rather train one big model. Something I should possibly have mentioned is that the hierarchy has order and the end result/performance does rely on accuracy at higher levels.

I guess the part that is confusing me for the implementation is: should the ‘sub’ models require an input from the higher-level model, or be used to filter somehow? Additionally, do I need to do anything special to the model loss, given that each level has separate targets?

Thanks again!

I would guess that providing some kind of information from the higher level models to the sub models might increase the performance. If you are creating a big model with different outputs, you might calculate the losses separately, add them together, and call backward on the final loss.
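The summed-loss approach could look like this (a sketch continuing the three-heads idea; the logits and targets here are random stand-ins for the outputs of a real model and real labels):

```python
import torch
import torch.nn as nn

batch = 4
# Stand-ins for the outputs of the three heads.
topic_logits = torch.randn(batch, 5, requires_grad=True)
subtopic_logits = torch.randn(batch, 20, requires_grad=True)
class_logits = torch.randn(batch, 100, requires_grad=True)

# One target per hierarchy level.
topic_target = torch.randint(0, 5, (batch,))
subtopic_target = torch.randint(0, 20, (batch,))
class_target = torch.randint(0, 100, (batch,))

criterion = nn.CrossEntropyLoss()
# Compute each level's loss separately, then sum.
loss = (criterion(topic_logits, topic_target)
        + criterion(subtopic_logits, subtopic_target)
        + criterion(class_logits, class_target))
loss.backward()  # gradients flow back through all three heads
```

Since autograd tracks the sum, a single `backward()` call propagates gradients from all three losses into the shared encoder. You could also weight the terms (e.g. give the topic loss a larger factor) if the higher levels matter more, as mentioned above.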