I have two super-classes, let’s say A and B, and then two sub-classes under class B, call them C and D.

How should I train the model to appropriately classify the three leaf classes (A, C, and D)? That is, how should I combine the losses for the super- and sub-classes?

And how should I do inference afterward? That is, how do I evaluate the probability distribution used for the classification?

Let me assume that your actual use case has three “leaf” classes.

Let’s say that A is “electro-sharks,” B is “mammals,” C is “cats,” and
D is “dogs.”

Just train this as a three-class classifier (with CrossEntropyLoss)
for just the leaf classes. I don’t see any likely benefit in trying to
account for the fact that “cats” and “dogs” are logically part of the
“mammals” super-class.
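For concreteness, here is a minimal sketch of that plain three-class setup. The `SimpleCNN` architecture and the dummy batch are just placeholders standing in for your own model and data pipeline:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Placeholder model with three leaf-class outputs."""
    def __init__(self, num_classes=3):  # electro-sharks, cats, dogs
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))  # raw logits -- no softmax here

model = SimpleCNN()
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# one training step on a dummy batch
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 3, (8,))  # 0 = electro-sharks, 1 = cats, 2 = dogs
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that `CrossEntropyLoss` takes the raw logits directly; it applies `log_softmax` internally, so the model should not end with a softmax layer.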

I’m not sure what you’re asking. I would say for inference, take argmax() of the output of your model to get the predicted integer
class label (of the three leaf classes) of the sample in question (or
pass that output through softmax() to get predicted probabilities
of the three classes).
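In code, that looks like the following (the `logits` tensor here is a made-up stand-in for your model’s output on a batch of two samples):

```python
import torch

# raw model output for two samples over three leaf classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 3.0]])

pred = logits.argmax(dim=1)            # predicted integer class labels
probs = torch.softmax(logits, dim=1)   # predicted probabilities; rows sum to 1
```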

The reason I want to apply hierarchical classification is to increase my overall accuracy. I want the model to first classify between my two super-classes, which are pretty similar and have only one major visual difference, and then classify among the sub-classes under class B. So maybe I want to compare conditional probabilities instead of a probability evenly distributed over all three classes. Or, say I have more than two sub-classes, how does that work?
I will clarify my second question. If I train on both my super- and sub-classes, I will have two sets of probability distributions, so how should I use them to decide which class to pick?

In principle, the fact that two of your classes are grouped into a
single super-class is information about the structure of your problem
that could be provided directly to your model so that it wouldn’t have
to “learn” that part of the structure. And in principle, this could make
your model work better.

But I think that in practice, training directly with the three leaf classes
will work better than trying to introduce the super-class somehow. I
would say that the burden is on you to show that using the super-class
actually helps.

But you haven’t told us anything about your actual use case, so it’s
hard to say.

Let me assume that you are asking the following:

You predict the “electro-sharks” super-class (that only has one
sub-class) with probability p and the “mammals” super-class with
probability 1 - p. Within the “mammals” super-class you predict
“cats” with probability q and “dogs” with probability 1 - q. Then
you have predicted “electro-sharks” with probability p, “cats” with
probability (1 - p) * q, and “dogs” with probability (1 - p) * (1 - q).
To get the prediction for a single specific class, just take the class that
has the largest predicted probability.
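That arithmetic can be sketched as follows, assuming (my assumption, not something you’ve specified) that your model produces two heads of logits: `super_logits` over (electro-sharks, mammals) and `sub_logits` over (cats, dogs):

```python
import torch

# made-up logits for a single sample
super_logits = torch.tensor([[1.0, 2.0]])   # (electro-sharks, mammals)
sub_logits = torch.tensor([[0.3, -0.2]])    # (cats, dogs)

p = torch.softmax(super_logits, dim=1)[:, 0]  # P(electro-sharks)
q = torch.softmax(sub_logits, dim=1)[:, 0]    # P(cats | mammals)

# leaf probabilities in the order (electro-sharks, cats, dogs)
leaf_probs = torch.stack([p, (1 - p) * q, (1 - p) * (1 - q)], dim=1)
pred = leaf_probs.argmax(dim=1)  # class with the largest predicted probability
```

Because `p` and `q` each come from a softmax, the three leaf probabilities automatically sum to one.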

Hi Frank,
How should I train the model to properly produce the probability distribution over the two sub-classes as q and (1 - q)?
Should the end of the model have two nn.Linear layers, each with its output size set to 2? For example, features = self.cnnBlock(input_sequence), and then I return super_class_output = self.fc_super(features) and sub_class_out = self.fc_sub(features) in my forward function, where self.fc_super = nn.Linear(input_channels, num_superclasses) and self.fc_sub = nn.Linear(input_channels, num_subclasses).
If so, how should I combine the losses? Should I pass the super-class and sub-class labels separately to evaluate the two losses and then combine them? And how should I label each data file? For example, should I label an electro-shark’s sub-class as electro-sharks?
Also, since a batch might contain electro-shark samples, how do I handle those when differentiating between cats and dogs? Should I first remove the samples labeled as electro-sharks, evaluate the sub-class loss on the rest, and then combine it with the super-class loss?
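To make the question concrete, the masking idea could be sketched like this. The label encoding, with -1 marking “no sub-class” for electro-shark samples, is just one possible convention, and the logits are random stand-ins for the two-head outputs described above:

```python
import torch
import torch.nn as nn

# stand-in outputs of the two heads for a batch of 8 samples
super_logits = torch.randn(8, 2)  # (electro-sharks, mammals)
sub_logits = torch.randn(8, 2)    # (cats, dogs)

# super_labels: 0 = electro-sharks, 1 = mammals
# sub_labels:   0 = cats, 1 = dogs, -1 = not applicable (electro-sharks)
super_labels = torch.tensor([0, 1, 1, 0, 1, 1, 1, 0])
sub_labels = torch.tensor([-1, 0, 1, -1, 0, 0, 1, -1])

criterion = nn.CrossEntropyLoss()

loss_super = criterion(super_logits, super_labels)

# compute the sub-class loss only on mammal samples
mask = sub_labels >= 0
loss_sub = criterion(sub_logits[mask], sub_labels[mask])

# equal weighting of the two terms is a design choice, not a requirement
loss = loss_super + loss_sub
```

Instead of masking by hand, `nn.CrossEntropyLoss(ignore_index=-1)` could be used for the sub-class loss, which skips the -1-labeled samples directly.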