Hierarchical document classification implementation

I’m trying to train a document classifier that has a large number of categories, arranged in a hierarchy.

Is it possible to build a single network that first classifies the higher-level categories (i.e. the parent classes) and then makes the final prediction among their subcategories? What would that look like? For instance, in the example below, could one classify levels 1 and 2 with a sigmoid activation function, and then decide between 1.1 and 1.2 with a softmax activation function?

[Image: example class hierarchy with levels 1 and 2 and subclasses 1.1 and 1.2]

It would be really helpful to have a simple example of how this could be implemented in PyTorch. Thank you in advance for any thoughts.
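
To make the question concrete, here is a minimal sketch of the kind of network I have in mind. It is not a tested solution. Everything in it is an assumption for illustration: the name `HierarchicalClassifier`, the layer sizes, the fixed-size document vectors as input, and a hierarchy with two parents (1, 2) and four children (1.1, 1.2, 2.1, 2.2). I also assume each document belongs to exactly one parent, so the sketch uses cross-entropy (softmax) at both levels; if a document could belong to several parents at once, the parent head would presumably use sigmoid with `nn.BCEWithLogitsLoss` instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed hierarchy: children 0,1 (1.1, 1.2) belong to parent 0 (level 1);
# children 2,3 (2.1, 2.2) belong to parent 1 (level 2).
child_to_parent = torch.tensor([0, 0, 1, 1])


class HierarchicalClassifier(nn.Module):
    """Shared encoder with one output head per hierarchy level (hypothetical)."""

    def __init__(self, input_dim=300, hidden_dim=128, num_parents=2, num_children=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.parent_head = nn.Linear(hidden_dim, num_parents)
        # The child head sees the shared features plus the parent logits.
        self.child_head = nn.Linear(hidden_dim + num_parents, num_children)

    def forward(self, x):
        h = self.encoder(x)
        parent_logits = self.parent_head(h)
        child_logits = self.child_head(torch.cat([h, parent_logits], dim=1))
        return parent_logits, child_logits


def predict(model, x):
    """Pick a parent, then softmax only over that parent's children."""
    with torch.no_grad():
        parent_logits, child_logits = model(x)
        parent_pred = parent_logits.argmax(dim=1)
        # True where a child does NOT belong to the predicted parent.
        mask = child_to_parent.unsqueeze(0) != parent_pred.unsqueeze(1)
        child_probs = torch.softmax(child_logits.masked_fill(mask, float("-inf")), dim=1)
    return parent_pred, child_probs.argmax(dim=1), child_probs


# One training step on random stand-in data (not real documents).
model = HierarchicalClassifier()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 300)
parent_y = torch.randint(0, 2, (8,))
child_y = parent_y * 2 + torch.randint(0, 2, (8,))  # child consistent with parent

parent_logits, child_logits = model(x)
loss = criterion(parent_logits, parent_y) + criterion(child_logits, child_y)
loss.backward()

parent_pred, child_pred, child_probs = predict(model, x)
```

The two losses are simply summed here; weighting them differently, or masking the child logits during training as well, are variations I am unsure about.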