image segmentation with cross-entropy loss

I am a new user of PyTorch.
I'd like to use the cross-entropy loss function.

number of classes=2
output.shape=[4,2,224,224]
output_min=tensor(-1.9295)
output_max=tensor(2.6400)

number of channels=3
target.shape=[4,3,224,224]
targets_max=tensor(-2.1008)
targets_min=tensor(-2.1179)

How do I evaluate:
loss = criterion(output, target)?
Thanks.

Hello Neo!

As an aside, for a two-class classification problem, you will be
better off treating this explicitly as a binary problem, rather than
as a two-class instance of the more general multi-class problem.
To do so you would use BCEWithLogitsLoss (“Binary Cross
Entropy”), rather than the multi-class CrossEntropyLoss.
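To make the binary setup concrete, here is a minimal sketch, with shapes guessed from your post (one logit channel per pixel, and a float target of the same shape with values 0.0 and 1.0):

```python
import torch
import torch.nn as nn

# Hypothetical shapes for a binary segmentation task:
# a single channel of raw logits per pixel, and a float
# target of the same shape holding 0.0 / 1.0 labels.
batch_size, height, width = 4, 224, 224

logits = torch.randn(batch_size, 1, height, width)                 # raw scores, no sigmoid
target = torch.randint(0, 2, (batch_size, 1, height, width)).float()

criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally
loss = criterion(logits, target)
```

Note that BCEWithLogitsLoss takes the raw logits directly and applies the sigmoid internally (in a numerically stable way), so you should not put a sigmoid at the end of your model.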

But you can certainly treat this as a general multi-class problem,
and I will answer your question in this context.

So your outputs are raw-score logits, rather than probabilities
that lie between 0.0 and 1.0. Good.

I assume that your first dimension, 4, is your batch size. This is
fine.

It probably does not make sense to have a channels dimension in
your target. (If you think it does, you should further explain your
use case.) In any event, as it stands, this target shape won’t match
your output shape.

If your output shape is [nBatch, nClass, height, width],
then (for CrossEntropyLoss) your target shape must be
[nBatch, height, width], with no nClass dimension.

This is wrong (for CrossEntropyLoss). Your target values must
be integer (long) class labels that run from 0 to nClass - 1,
so in your two-class case, that take on the values 0 and 1.
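A minimal sketch of the shape and dtype that CrossEntropyLoss expects, using the dimensions from your post:

```python
import torch
import torch.nn as nn

# Shapes from the question: nBatch = 4, nClass = 2, 224 x 224 images.
nBatch, nClass, H, W = 4, 2, 224, 224

output = torch.randn(nBatch, nClass, H, W)            # raw logits, [nBatch, nClass, H, W]
target = torch.randint(0, nClass, (nBatch, H, W))     # long class labels, [nBatch, H, W]

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss.shape, target.dtype)   # torch.Size([]) torch.int64
```

The target has no class dimension and no channel dimension, and its values are integer (int64) labels in [0, nClass - 1].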

If you could explain a little more where target comes from, and
what the numbers actually mean, we can help sort this out.

Best.

K. Frank

I’m sorry for the delay, I had some problems,
and I’m sorry for my English.

my model is:

number of classes=2 or 3 or 10

And the output dimension of the model is [No x Co x Ho x Wo]
where,

No -> is the batch size (same as Ni)
Co -> is the number of classes that the dataset has!
Ho -> the height of the image (which is the same as Hi in almost all cases)
Wo -> the width of the image (which is the same as Wi in almost all cases)

number of channels=1 or 3

the target dimension is [Ni x Ci x Hi x Wi]
where,

Ni -> the batch size
Ci -> the number of channels (which is 3 or 1)
Hi -> the height of the image
Wi -> the width of the image 

and thanks for your reply

I apply train transformation into image and mask:

train_transform = et.ExtCompose([
    # et.ExtResize(size=opts.crop_size),
    et.ExtRandomScale((0.5, 2.0)),
    et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size), pad_if_needed=True),
    et.ExtRandomHorizontalFlip(),
    et.ExtToTensor(),
    et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
])

The mask is RGB or grayscale, with value 0 for background, 1 for class 1, and 2 for class 2.

Hello Neo!

Ho and Wo must be the same as Hi and Wi in all cases, not
just in “almost all” cases.

This might be what you have, but it simply won’t work.
CrossEntropyLoss requires that, for a model output of shape
[No, Co, Ho, Wo], the target have shape [No, Ho, Wo]
(and that the values of the target are integer class labels that
run from 0 to Co - 1).

If your target has this extra “channel” dimension (Ci), it won’t
work (and Hi and Wi must match Ho and Wo, as well). (Just
to be clear, you can’t have the Ci dimension at all, even if
Ci = 1.)
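As an illustration (assuming your target already holds the integer class labels, just with an extra channel dimension), squeezing out a singleton Ci, or keeping a single channel of an RGB mask whose three channels all carry the same label, gives the shape CrossEntropyLoss wants:

```python
import torch

# Hypothetical target with a singleton channel dimension: [N, 1, H, W].
target_1ch = torch.randint(0, 3, (4, 1, 224, 224))
target = target_1ch.squeeze(1).long()         # -> [N, H, W], int64 labels

# If the mask is RGB but every channel stores the same label,
# keeping one channel works the same way:
target_rgb = target_1ch.expand(-1, 3, -1, -1)            # [N, 3, H, W]
target_from_rgb = target_rgb[:, 0, :, :].long()          # [N, H, W]

print(target.shape)                           # torch.Size([4, 224, 224])
print(torch.equal(target, target_from_rgb))   # True
```

(Also note that the normalization in your transform should be applied to the image only, not to the mask; normalizing the mask is what produces float values like -2.1008 instead of integer labels.)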

Good luck.

K. Frank

thanks Frank for your reply
I’m going to look for another way

Hey :hugs: sorry to disturb you, just wanted to confirm -

  1. the raw logits are supposed to be “one-hot encoded” along the class dimension - say, a sample shape of (1, 6, 256, 256) if one has multi-class classification w/ 6 labels
  2. then the target to the loss function has to be the non-one-hot-encoded, true integer labels in their pristine form.

I am confused why PyTorch doesn’t do this implicitly :thinking: Though, was my understanding correct?
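If it helps, the two points above can be checked directly: with integer labels, CrossEntropyLoss computes exactly the cross entropy you would get by one-hot encoding the target yourself (a small sketch, with arbitrary shapes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 6 classes, one small sample (spatial size kept small for the check).
logits = torch.randn(1, 6, 8, 8)              # one raw score per class per pixel
target = torch.randint(0, 6, (1, 8, 8))       # plain integer labels, not one-hot

ce = nn.CrossEntropyLoss()(logits, target)

# The same computation written out with an explicit one-hot target:
one_hot = F.one_hot(target, num_classes=6).permute(0, 3, 1, 2).float()
manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(ce, manual))   # True
```

So the one-hot encoding is there implicitly; passing integer labels just lets the loss index the correct logit directly instead of multiplying by a mostly-zero tensor.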