I am working on a multilabel classification problem in which I want to predict defects from process parameters, so that I can later do optimization and root cause analysis with SHAP.
The number of labels is huge, although they are structured hierarchically:
For each of the 6 views (Left, Right, Top…) I have 10x10 positions (x, y coordinates), and each position can have up to 10 defect types.
I am modeling this as a tree structure that branches out from the latent space:
latent space → views → positions → defect type
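To make the label layout concrete, here is a minimal sketch of how the hierarchy maps onto one flat label vector. It assumes every position has exactly 10 defect slots and that views are numbered 0 to 5; the names `flat_index` and `unflatten` are only illustrative.

```python
# Sizes taken from the description above: 6 views, 10x10 positions, 10 defect types.
N_VIEWS, N_X, N_Y, N_DEFECTS = 6, 10, 10, 10
N_LABELS = N_VIEWS * N_X * N_Y * N_DEFECTS  # total number of end logits


def flat_index(view, x, y, defect):
    """Map a (view, x, y, defect) leaf of the tree to its position in the flat label vector."""
    return ((view * N_X + x) * N_Y + y) * N_DEFECTS + defect


def unflatten(index):
    """Inverse of flat_index: recover (view, x, y, defect) from a flat label index."""
    index, defect = divmod(index, N_DEFECTS)
    index, y = divmod(index, N_Y)
    view, x = divmod(index, N_X)
    return view, x, y, defect


print(N_LABELS)
print(unflatten(flat_index(2, 3, 4, 5)))
```

With this layout, the tree output and a flat 6k-label vector are the same data, just reshaped.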
My question is: what is the best approach for backpropagation?
- A single BCE loss over all 6k labels
- A separate BCE loss and backward pass for each end logit (defect type)
- Another approach?
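As far as I can tell, the first two options should give the same gradient on shared parameters when the per-logit losses are averaged with equal weights, because gradients add linearly. Below is a minimal toy check of that, not my real model: a single shared weight `w` feeds every end logit, and `features` and `targets` are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logit(z, y):
    """Numerically stable binary cross-entropy for a raw logit z and target y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# Toy stand-in for the shared trunk: every end logit is z_i = w * a_i.
w = 0.7
features = [0.5, -1.2, 2.0, 0.3]  # hypothetical inputs a_i, one per end logit
targets = [1.0, 0.0, 1.0, 0.0]    # hypothetical defect labels y_i
n = len(features)


def flat_loss(weight):
    """Option 1: one mean BCE over all labels."""
    return sum(bce_with_logit(weight * a, y) for a, y in zip(features, targets)) / n


# Gradient of the flat loss w.r.t. w, estimated by central finite differences.
eps = 1e-6
grad_flat = (flat_loss(w + eps) - flat_loss(w - eps)) / (2 * eps)

# Option 2: backprop each end logit on its own and accumulate into w.
# For BCE on a logit, dL/dz = sigmoid(z) - y, and dz/dw = a_i.
grad_per_logit = 0.0
for a, y in zip(features, targets):
    grad_per_logit += (sigmoid(w * a) - y) * a / n

print(grad_flat, grad_per_logit)
```

If this carries over to the full network, the two options would differ only in how the individual losses are weighted and in compute cost, not in the gradient itself.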
I have found very little information on this topic, so I would appreciate any input or reference material.