What kind of loss to use with multi-hot encoded multiclass ordinal features?

I have several ordinal features, each with levels in the range 1-5 and an implicit ordering between levels, and decided to multi-hot encode them so that each level gets an activation like

{
1: [0, 0, 0, 0]
2: [1, 0, 0, 0]
...
5: [1, 1, 1, 1]
}

And then concatenate the encodings of all the features, so the resulting target vector might look like
[0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
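
That example corresponds to levels 1, 3 and 5 for three features. As a minimal sketch of how such targets could be built (the helper names are just for illustration; three features with 5 levels each are assumed):

import torch

def encode_level(level, num_levels=5):
    # Cumulative ("thermometer") encoding: level k -> the first k-1 bits are 1.
    return [1.0 if i < level - 1 else 0.0 for i in range(num_levels - 1)]

def encode_features(levels, num_levels=5):
    # Concatenate the per-feature encodings into one target vector.
    bits = []
    for lvl in levels:
        bits.extend(encode_level(lvl, num_levels))
    return torch.tensor(bits)

print(encode_features([1, 3, 5]))
# tensor([0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1.])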

What is a good loss function to use here? Is MultiLabelMarginLoss sufficient? I'm not sure a one-vs-all approach gives the right penalization, since several bits belong to the same ordinal feature.

BCE (binary cross-entropy) loss would be better than MultiLabelMarginLoss here. With this cumulative encoding, each output bit is an independent binary target ("is the level greater than k?"), which is exactly what BCE models per output; BCEWithLogitsLoss additionally applies the sigmoid for you.
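
A hedged sketch of how that could look with nn.BCEWithLogitsLoss (the random logits stand in for raw model outputs, and the 0.5 decoding threshold is an illustrative choice):

import torch
import torch.nn as nn

num_features, num_levels = 3, 5
out_dim = num_features * (num_levels - 1)          # 3 * 4 = 12 output bits

# Target from the example above: levels 1, 3 and 5.
target = torch.tensor([[0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 1., 1.]])
logits = torch.randn(1, out_dim, requires_grad=True)   # stand-in for model outputs

# BCEWithLogitsLoss applies a sigmoid to each output independently, so every
# bit of the cumulative encoding is treated as its own binary "level > k" target.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)
loss.backward()

# To read predictions back as ordinal levels, count bits above 0.5 per feature.
probs = torch.sigmoid(logits).view(-1, num_features, num_levels - 1)
pred_levels = (probs > 0.5).sum(dim=-1) + 1        # values in 1..5

Decoding by counting bits above the threshold keeps each prediction a valid level in 1-5 even when the individual bit probabilities aren't perfectly monotone.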