Backprop for hierarchical multilabel classification (~6k labels)

Hi there,

I am working on a multilabel classification problem: I want to predict defects from process parameters, and later use the model for optimization and root cause analysis with SHAP.

The number of labels is huge, but they are structured hierarchically:
For each of the 6 views (Left, Right, Top…) I have 10x10 positions (x, y coordinates), and each position can have up to 10 defect types. That gives 6 x 10 x 10 x 10 = 6,000 labels.

I am modeling this as a tree structure that branches out from the latent space:
latent space → views → positions → defect type
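
To make the label layout concrete, here is a minimal sketch of how a path through the tree could map to one position in a flat label vector and back. The sizes come from the description above; the function names and the ordering of the levels are only assumptions for illustration.

```python
# Assumed sizes: 6 views, a 10x10 grid of positions, 10 defect types per position.
N_VIEWS, GRID_X, GRID_Y, N_DEFECTS = 6, 10, 10, 10
N_LABELS = N_VIEWS * GRID_X * GRID_Y * N_DEFECTS  # total number of leaf labels


def flat_index(view, x, y, defect):
    """Map a (view, x, y, defect) path in the tree to one flat label index."""
    return ((view * GRID_X + x) * GRID_Y + y) * N_DEFECTS + defect


def tree_path(index):
    """Inverse of flat_index: recover (view, x, y, defect) from a flat index."""
    index, defect = divmod(index, N_DEFECTS)
    index, y = divmod(index, GRID_Y)
    view, x = divmod(index, GRID_X)
    return view, x, y, defect
```

With this layout, every leaf of the tree corresponds to exactly one entry of the flat target vector, so the same targets can feed either a flat or a tree-shaped output.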

My question is: what is the best approach for backpropagation?

  • A single BCE loss over all ~6k labels
  • A separate BCE loss and backward pass for each leaf logit (defect type)
  • Another approach? :slightly_smiling_face:
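
To make the first two options concrete, here is a toy sketch in plain Python with no framework. The linear head, the made-up numbers, and all function names are assumptions for illustration, not my actual model. It compares the gradient of one summed BCE loss with the gradients accumulated label by label. Since the gradient of a sum is the sum of the gradients, the two should agree on shared parameters, apart from a constant factor if the loss is averaged instead of summed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(z, y):
    """Binary cross-entropy for one logit z and one target y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def logit(row, h):
    """Toy leaf head: a linear map from the shared latent vector h to one logit."""
    return sum(w * x for w, x in zip(row, h))


def summed_loss(weights, h, targets):
    """Option 1: one BCE loss summed over every label."""
    return sum(bce(logit(row, h), y) for row, y in zip(weights, targets))


def grad_of_summed_loss(weights, h, targets, eps=1e-6):
    """Gradient of the summed loss w.r.t. h, by central finite differences."""
    grad = []
    for j in range(len(h)):
        up = h[:j] + [h[j] + eps] + h[j + 1:]
        down = h[:j] + [h[j] - eps] + h[j + 1:]
        diff = summed_loss(weights, up, targets) - summed_loss(weights, down, targets)
        grad.append(diff / (2 * eps))
    return grad


def accumulated_per_label_grads(weights, h, targets):
    """Option 2: backprop each label's BCE separately and accumulate into h."""
    grad = [0.0] * len(h)
    for row, y in zip(weights, targets):
        dz = sigmoid(logit(row, h)) - y  # d BCE / d logit for sigmoid + BCE
        for j, w in enumerate(row):
            grad[j] += dz * w  # chain rule through the linear head
    return grad


# Made-up example: a 3-dimensional latent vector and 4 labels.
h = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6], [-0.7, 0.8, 0.9], [1.0, -1.1, 1.2]]
targets = [1, 0, 0, 1]

g_summed = grad_of_summed_loss(weights, h, targets)
g_accumulated = accumulated_per_label_grads(weights, h, targets)
print(max(abs(a - b) for a, b in zip(g_summed, g_accumulated)))
```

If the same holds for the real model, the first option would produce the same update as the second with a single backward pass, so the difference between them would mainly be speed and any per-label weighting.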

I have found very little information on this topic, so I would appreciate any input or reference materials.

Thanks! :ninja: