Loss function for Hierarchical Multi-label classification

I am looking to try different loss functions for a hierarchical multi-label classification problem. So far, I have been training different models or submodels (e.g., a simple MLP branch inside a bigger model) that deal with different levels of the classification hierarchy, each yielding a binary vector. I have been using BCEWithLogitsLoss for each of them and summing all the losses in the model before backpropagating.
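To make the setup concrete, here is a minimal sketch of what I mean, not my actual model. The class name `HierarchicalClassifier`, the layer sizes, and the two-level hierarchy with 3 and 7 labels are all made up for illustration; the targets are random multi-hot vectors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class HierarchicalClassifier(nn.Module):
    """Shared trunk with one linear head per hierarchy level (hypothetical layout)."""

    def __init__(self, in_features, hidden, labels_per_level):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        # One head per level; each head outputs one logit per label at that level.
        self.heads = nn.ModuleList([nn.Linear(hidden, n) for n in labels_per_level])

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]


labels_per_level = [3, 7]  # assumed sizes: 3 coarse labels, 7 fine labels
model = HierarchicalClassifier(in_features=16, hidden=32, labels_per_level=labels_per_level)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(4, 16)  # dummy batch of 4 samples
# Multi-hot float targets, one tensor per level (random, for illustration only).
targets = [torch.randint(0, 2, (4, n)).float() for n in labels_per_level]

logits = model(x)
# Sum the per-level losses, then backpropagate once.
total_loss = sum(criterion(out, tgt) for out, tgt in zip(logits, targets))
total_loss.backward()
```

Each level contributes equally to the total here; weighting the levels differently would just mean multiplying each term by a coefficient before summing.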

I am considering trying other losses like MultiLabelSoftMarginLoss and MultiLabelMarginLoss. What other loss functions are worth trying? Hamming loss, perhaps, or a variation of it?
Is it better to sum all the losses and backpropagate once, or to backpropagate each loss separately?
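For reference, the two backpropagation options I am comparing can be sketched as below. This is a toy example under assumed shapes (a shared linear trunk and two hypothetical heads, `head_a` and `head_b`), not my real model. It relies on the fact that `.backward()` accumulates into `.grad`, and that `retain_graph=True` is needed on the first call when the losses share part of the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy shared trunk with two heads (names and sizes are illustrative assumptions).
trunk = nn.Linear(8, 8)
head_a = nn.Linear(8, 3)
head_b = nn.Linear(8, 5)
params = list(trunk.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(4, 8)
y_a = torch.randint(0, 2, (4, 3)).float()
y_b = torch.randint(0, 2, (4, 5)).float()


def compute_losses():
    """Run a forward pass and return one loss per head."""
    h = torch.relu(trunk(x))
    return criterion(head_a(h), y_a), criterion(head_b(h), y_b)


def reset_grads():
    for p in params:
        p.grad = None


# Option 1: sum the losses and backpropagate once.
reset_grads()
loss_a, loss_b = compute_losses()
(loss_a + loss_b).backward()
grads_summed = [p.grad.clone() for p in params]

# Option 2: backpropagate each loss separately; gradients accumulate in .grad.
reset_grads()
loss_a, loss_b = compute_losses()
loss_a.backward(retain_graph=True)  # keep the shared graph alive for the second call
loss_b.backward()
grads_separate = [p.grad.clone() for p in params]

# Compare the accumulated gradients of the two options.
same = all(torch.allclose(g1, g2, atol=1e-6) for g1, g2 in zip(grads_summed, grads_separate))
print(same)
```

As far as I understand, as long as no optimizer step or gradient reset happens between the separate backward calls, both options should give the same gradients up to floating-point rounding, so the difference would mainly be in compute and memory cost.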

Thanks in advance!