Multi-label classification target padding

I have a problem with target classes from 0 to 6, but the number of targets per sample is not fixed. For example, sample x_k may have the targets [0,1,5,3] while sample x_j has [0,2].

My first idea was to pad all label lists with -100 to the maximum label length (35) and use them with nn.BCEWithLogitsLoss. Then I read that BCEWithLogitsLoss would treat such targets as if there were 35 different classes.

What is the best approach in this case? Thanks.

Assuming each sample can have zero, one, or multiple active classes, you could let the model return logits in the shape [batch_size, nb_classes], where each value along the class dimension is the logit for the corresponding class index.
The target could then be multi-hot encoded, where a zero indicates an inactive class and a one an active class. For the posted examples the targets would thus be:

# [0, 1, 5, 3]
target = torch.tensor([[1, 1, 0, 1, 0, 1, 0]]).float()

# [0, 2]
target = torch.tensor([[1, 0, 1, 0, 0, 0, 0]]).float()
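A minimal sketch of building such multi-hot targets directly from the variable-length label lists, so no padding is needed. nb_classes = 7 follows from the classes 0 to 6 in the question; the random logits tensor is only a stand-in for a real model output (an assumption):

```python
import torch
import torch.nn as nn

nb_classes = 7  # classes 0 to 6

# Variable-length label lists from the question (x_k and x_j)
labels = [[0, 1, 5, 3], [0, 2]]

# Multi-hot target of shape [batch_size, nb_classes]:
# a 1 marks an active class, a 0 an inactive one
target = torch.zeros(len(labels), nb_classes)
for i, idx in enumerate(labels):
    target[i, idx] = 1.0  # advanced indexing sets all listed classes at once

# Stand-in for the model output: one logit per class (assumption)
logits = torch.randn(len(labels), nb_classes)

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)
print(target)  # rows match the two targets shown above
```

Since every target row has exactly nb_classes entries, samples with different numbers of labels fit into the same batch. Note that repeated indices collapse to a single 1, so this encoding cannot express how often, or in which order, a class occurs.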

My problem is closer to sequential decoding: each target can contain repeated labels, like [5,2,1,0,1]. I padded the targets and used nn.CrossEntropyLoss() with the ignore_index=-100 parameter, but there is still no improvement…
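A minimal sketch of the padding step described above, assuming the targets are stored as a list of 1D LongTensors (the two sequences are taken from the examples in this thread). Here torch.nn.utils.rnn.pad_sequence pads each batch to its longest sequence with -100, instead of to a fixed length of 35:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Target sequences of different lengths; labels may repeat
seqs = [torch.tensor([5, 2, 1, 0, 1]), torch.tensor([0, 2])]

# Pad to the longest sequence in the batch -> shape [batch_size, seq_len]
target = pad_sequence(seqs, batch_first=True, padding_value=-100)

# Positions equal to ignore_index are skipped by the loss
# (-100 is also the default value of ignore_index)
criterion = nn.CrossEntropyLoss(ignore_index=-100)
print(target)
```

The padded positions then contribute neither to the loss nor to the gradients, but the padded target only lines up with the model output if the model also produces seq_len time steps per sample.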

Thanks for your answer.

Thanks for the update.
If I understand the use case correctly, you might then be working on multi-class sequence classification, i.e. each time step has only a single label?
In this case, you could use nn.CrossEntropyLoss with a model output in the shape [batch_size, nb_classes, seq_len] and a target in the shape [batch_size, seq_len] containing the class indices in the range [0, nb_classes-1].
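A minimal sketch of this setup, reusing the repeated-label sequence [5,2,1,0,1] and a shorter sequence padded with -100; the random logits are a stand-in for the model output (an assumption):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, seq_len = 2, 7, 5

# Stand-in model output: one logit per class at every time step (assumption)
logits = torch.randn(batch_size, nb_classes, seq_len, requires_grad=True)

# Class indices in [0, nb_classes-1]; -100 marks padded time steps
target = torch.tensor([[5, 2, 1, 0, 1],
                       [0, 2, -100, -100, -100]])

criterion = nn.CrossEntropyLoss(ignore_index=-100)
loss = criterion(logits, target)  # mean over the 7 non-padded time steps
loss.backward()
```

If the model emits [batch_size, seq_len, nb_classes] instead (e.g. a batch_first RNN followed by a linear layer), call logits.permute(0, 2, 1) before passing it to the loss, since nn.CrossEntropyLoss expects the class dimension at dim 1.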