I’m still struggling to understand your use case, so I’m not sure this will help, but to multi-hot encode a dataset where each sample has a list of integer class labels, such as:
Sample 0: [3, 7, 4]
Sample 1: [2, 4]
Sample 2: [9, 1, 6, 2]
Sample 3: [8, 9]
you’d transform it into something like:
Label:     0  1  2  3  4  5  6  7  8  9  ...  N-1
           |  |  |  |  |  |  |  |  |  |       |
           v  v  v  v  v  v  v  v  v  v       v
Sample 0: [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, ..., 0]
Sample 1: [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, ..., 0]
Sample 2: [0, 1, 1, 0, 0, 0, 1, 0, 0, 1, ..., 0]
Sample 3: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ..., 0]
where N is the total number of classes. Once you’ve transformed the labels, it’s pretty straightforward to put together a simple CNN trained with nn.BCEWithLogitsLoss (see the sketch at the end of this post). This is a common strategy for basic CNNs and is the basis of most “multi-label CNN” tutorials, but note that it does NOT preserve label order or provide bounding boxes for the detected entities/labeled objects. If you’re looking for that more advanced functionality, I’d recommend looking at:
but that would take you out of the torch framework altogether. Hopefully, someone more knowledgeable can comment and point you in the right direction for using torch. If not, here’s a link to a post that discusses multi-label classification problems in general. You might find some inspiration there…
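For reference, here’s a minimal sketch of the multi-hot encoding and BCEWithLogitsLoss setup described above. The helper name `multi_hot`, the toy CNN, and the input sizes are just placeholders I picked for illustration, not a specific architecture recommendation:

```python
import torch
import torch.nn as nn

def multi_hot(label_lists, num_classes):
    """Turn a list of integer-label lists into a (batch, num_classes) float tensor."""
    targets = torch.zeros(len(label_lists), num_classes)
    for row, labels in enumerate(label_lists):
        targets[row, labels] = 1.0  # set a 1 at every class index present in this sample
    return targets

# The labels from the example above, assuming N = 10 classes (indices 0-9)
label_lists = [[3, 7, 4], [2, 4], [9, 1, 6, 2], [8, 9]]
targets = multi_hot(label_lists, num_classes=10)
print(targets)
# tensor([[0., 0., 0., 1., 1., 0., 0., 1., 0., 0.],
#         [0., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
#         [0., 1., 1., 0., 0., 0., 1., 0., 0., 1.],
#         [0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])

# A toy CNN head: the final layer outputs num_classes raw logits (no sigmoid),
# because BCEWithLogitsLoss applies the sigmoid internally.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 32, 32)   # dummy batch of 4 RGB images
logits = model(images)               # shape: (4, 10)
loss = criterion(logits, targets)    # multi-label BCE over all classes at once
loss.backward()
```

Each output unit is treated as an independent binary classifier, which is why the targets are the multi-hot vectors rather than a single class index.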