You say in your title “multiclass+multilabel,” but you say in your post
itself “where each pixel may be the one of [0,1,2,3,4].”
To clarify the terminology:
Yes, I would call this “multiclass” because you have five different
classes (one background plus four non-background). (If you had
only two classes (“yes” vs. “no” or “0” vs. “1”), I would call it binary,
in contrast to multiclass.)
However, “multilabel” conventionally means that each “thing”
(sample, pixel, whatever) can be assigned to more than one
class at a time – that is, it can be “labelled” with multiple labels.
I read your phrase “where each pixel may be the one of [0,1,2,3,4]”
as meaning that each pixel will be (labelled with) exactly one of
[0, 1, 2, 3, 4]. In this case I would call your problem a conventional
(that is, single-label) multiclass classification problem. For this you
would want to use CrossEntropyLoss (or its functional form,
cross_entropy()). Your “outputs” (predictions) and targets will have
the shapes you give. The C dimension (where C = 5) of your outputs
holds the raw scores (logits) predicted for the pixel in question to be
in each of your five classes. Your targets will be categorical class
labels (integers in [0, 1, 2, 3, 4]), and not one-hot encoded.
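As a minimal sketch of the single-label case, assuming PyTorch and using made-up sizes and random tensors in place of a real model and dataset (the names B, C, H, W, outputs, targets and pred are illustrative only):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only; C = 5 matches the five classes discussed above.
B, C, H, W = 2, 5, 4, 4

# Stand-in for the model's predictions: raw scores (logits), shape [B, C, H, W].
outputs = torch.randn(B, C, H, W)

# Stand-in for the ground truth: one integer class label per pixel,
# shape [B, H, W], dtype long, values in [0, 4]. No one-hot encoding.
targets = torch.randint(0, C, (B, H, W))

# cross_entropy applies log-softmax over the C dimension internally,
# so the logits are passed in directly.
loss = F.cross_entropy(outputs, targets)

# The predicted class for each pixel is the index of the largest logit.
pred = outputs.argmax(dim=1)  # shape [B, H, W]
```

Note that the targets have no C dimension at all here; the class index itself carries that information.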
On the other hand, if you really mean “multilabel,” then I would read
your phrase as meaning that each pixel can be labelled with any
number of the labels [0, 1, 2, 3, 4], including none of them.
In this case you would indeed want to use BCEWithLogitsLoss,
but now your targets would be the same shape as your outputs, that
is, [B, C, H, W]. The C dimension of your targets would not be one-hot
encoded
(one value of 1, the other four 0), because more than one label, or
none, could be active at the same time. (So your C dimension might
have value (0, 1, 0, 1, 0) meaning that the pixel in question is
both in class “1” and “3” at the same time.)
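As a minimal sketch of the multilabel case, assuming PyTorch and a per-class binary cross-entropy on logits (binary_cross_entropy_with_logits(), the functional form of BCEWithLogitsLoss). The sizes and random tensors are made up for illustration, and the 0.5 threshold is an arbitrary choice:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only; C = 5 matches the five labels discussed above.
B, C, H, W = 2, 5, 4, 4

# Stand-in for the model's predictions: one logit per label per pixel.
outputs = torch.randn(B, C, H, W)

# Multilabel targets: same shape as outputs, floating point, each entry 0 or 1.
# Any number of labels (including none) may be active for a given pixel.
targets = torch.randint(0, 2, (B, C, H, W)).float()

# Example from the text: this pixel carries both label "1" and label "3".
targets[0, :, 0, 0] = torch.tensor([0.0, 1.0, 0.0, 1.0, 0.0])

# Each label is treated as its own independent yes/no problem.
loss = F.binary_cross_entropy_with_logits(outputs, targets)

# Per-label decisions: threshold each sigmoid probability separately
# (0.5 is an assumed threshold, not a requirement).
pred = torch.sigmoid(outputs) > 0.5  # bool, shape [B, C, H, W]
```

Unlike the single-label case, there is no argmax over C, because the labels do not compete with one another.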
If I’m right that you don’t mean “multilabel”, then no, one-hot
encoding is not the only way, and yes, the CrossEntropy loss
supports your kind of targets directly.
I think that you don’t mean “multilabel,” that the shapes of your
outputs and targets are correct, and that your targets are categorical
class labels (as is consistent with the shape you give of [B, H, W])
and hence not one-hot encoded, but that you will want to use
CrossEntropyLoss (which really means “categorical cross-entropy
with logits”) as your loss function.