I need some advice on the following. I have a text classification problem where I’m training for 2 labels (let’s call them A and B). The classes for A and B differ: A can be 0-4 and B can be 0-8 (A could be cast to 0-8, I guess).

How would I go about this problem? I can use a one-hot encoding for each label, but then I don’t know how to calculate the loss.

If I understand you correctly, you have two classification tasks (so it would be multi-task, not multilabel, in the usual lingo).

In that case, you can just output 14 logits (5 for A plus 9 for B) and then group them for the loss if you want.
More concretely, you could output a batch x 14 tensor of raw scores (so no final activation) and organize your labels into two batch-sized tensors of class indices, one for A and one for B.
Then compute the loss as F.cross_entropy(output[:, :5], target_a) + F.cross_entropy(output[:, 5:], target_b), so that the first 5 logits are scored against A and the remaining 9 against B.
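
To make the slicing concrete, here is a minimal sketch of that idea. The model class, the feature size, and the random inputs are all made up for illustration; the only parts that come from the discussion above are the 14 logits and the sum of two cross-entropy terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_A, NUM_B = 5, 9  # A takes values 0-4, B takes values 0-8


class TwoTaskClassifier(nn.Module):
    """Hypothetical stand-in model: one linear layer emitting all 14 logits."""

    def __init__(self, in_features):
        super().__init__()
        self.out = nn.Linear(in_features, NUM_A + NUM_B)

    def forward(self, x):
        # Raw scores only; F.cross_entropy applies log-softmax itself.
        return self.out(x)


def multitask_loss(output, target_a, target_b):
    # First 5 columns are the logits for A, the remaining 9 are for B.
    logits_a = output[:, :NUM_A]
    logits_b = output[:, NUM_A:]
    return F.cross_entropy(logits_a, target_a) + F.cross_entropy(logits_b, target_b)


torch.manual_seed(0)
batch, in_features = 4, 16                       # assumed sizes
model = TwoTaskClassifier(in_features)
features = torch.randn(batch, in_features)       # placeholder for text features
target_a = torch.randint(0, NUM_A, (batch,))     # class indices, not one-hot
target_b = torch.randint(0, NUM_B, (batch,))

output = model(features)                         # shape: (batch, 14)
loss = multitask_loss(output, target_a, target_b)
loss.backward()
```

Note that F.cross_entropy takes integer class indices as targets, so no one-hot encoding is needed. If one task should count more than the other, you could weight the two terms before summing them.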