Is it possible to compute the multi-class cross-entropy loss by successively applying the nn.BCELoss() implementation?

This is what I have tried.

# Implementation of the multi-class cross-entropy loss as a sum of per-class BCE terms
def SegLossFn(predictions, targets):
    _, c, _, _ = predictions.size()
    loss = 0
    m = nn.Sigmoid()
    loss_fn = nn.BCELoss()
    # BCE -> MCE by summing the BCE loss over each class channel
    for i in range(c):
        loss += loss_fn(m(predictions[0][i]), Variable(targets[i][0]).cuda())
    return loss

For multi-class classification you would usually just use nn.CrossEntropyLoss, and I don't think you'll end up with the same result, as you are calling torch.sigmoid on each prediction.
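As a minimal sketch of that suggestion (all shapes and names here are made up for illustration): for segmentation, nn.CrossEntropyLoss expects raw logits of shape [N, C, H, W] and per-pixel class-index targets of shape [N, H, W], with no softmax applied beforehand.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: batch of 2, 4 classes, 8x8 maps
n, c, h, w = 2, 4, 8, 8
logits = torch.randn(n, c, h, w)          # raw model outputs, no softmax applied
targets = torch.randint(0, c, (n, h, w))  # per-pixel class indices, [N, H, W]

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)         # scalar loss averaged over all pixels
```

Note there is no loop over classes and no sigmoid: the loss applies log-softmax over the channel dimension internally.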

For multi-label classification, you might use nn.BCELoss with one-hot encoded targets and won't need a for loop.
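A sketch of the loop-free multi-label version (shapes are made up): BCELoss is applied to the whole [N, C, H, W] tensor at once, and nn.BCEWithLogitsLoss is the numerically safer variant that takes the raw logits directly.

```python
import torch
import torch.nn as nn

n, c, h, w = 2, 4, 8, 8
logits = torch.randn(n, c, h, w)
# multi-hot targets: each pixel may belong to several classes at once
targets = torch.randint(0, 2, (n, c, h, w)).float()

loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)
# equivalent result, but numerically more stable: pass the raw logits
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, targets)
```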

Could you explain your use case a bit, as I'm currently not sure I understand it properly?

The reason for not using nn.CrossEntropyLoss was that my predicted output is of size [N, C, H, W] and the target is also [N, C, H, W]. The [H, W] map in channel C of the target is 0 or 1 based on the presence or absence of class C. Is there a workaround?
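One possible workaround, sketched with made-up shapes: if each pixel belongs to exactly one class, the one-hot [N, C, H, W] target can be collapsed back to class indices with argmax along the channel dimension, which is the format nn.CrossEntropyLoss expects.

```python
import torch
import torch.nn as nn

n, c, h, w = 2, 4, 8, 8
logits = torch.randn(n, c, h, w)

# Build a one-hot target [N, C, H, W] from random class indices, for illustration
idx = torch.randint(0, c, (n, h, w))
onehot = torch.zeros(n, c, h, w).scatter_(1, idx.unsqueeze(1), 1.0)

target_idx = onehot.argmax(dim=1)  # back to [N, H, W] class indices
loss = nn.CrossEntropyLoss()(logits, target_idx)
```

This only works for mutually exclusive classes; if several channels can be 1 at the same pixel, the problem is multi-label and BCE is the right tool.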

I'm not sure there is a so-called "multiclass cross entropy".
But doing binary classification for each class makes sense, so I think what you have tried is correct.

It's usually called multi-category cross entropy, but yes, CrossEntropyLoss is essentially that. Just be careful: CrossEntropyLoss takes the logits as inputs (before softmax), while BCELoss takes probabilities as input (after the logistic sigmoid).

I don't think you confused anything, because both multi-label cross entropy and binary cross entropy work for dealing with multi-class problems. The difference though is that in binary cross entropy, mathematically, you assume that the classes are independent.

I think this particular implementation has a problem, because when I do loss.backward(), improvement is seen only in the last class. The other channels just contain the complement of the last class.

You could try to apply a softmax as the last activation function in your model and use nn.MSELoss directly on these outputs and the one-hot encoded targets.
Alternatively, you could also use a single output unit, which could be used to predict the class "index" as a floating point number, and calculate the loss using the targets containing class indices.

However, note that nn.MSELoss is not the usual loss function you would use for a classification use case.
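The softmax + nn.MSELoss suggestion above could be sketched like this (class count and shapes are made up; shown on plain class vectors rather than segmentation maps for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, c = 8, 4
logits = torch.randn(n, c)
probs = torch.softmax(logits, dim=1)                      # last activation: softmax
onehot = F.one_hot(torch.randint(0, c, (n,)), c).float()  # one-hot encoded targets

loss = nn.MSELoss()(probs, onehot)  # scalar by default (reduction='mean')
```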

Thanks!
If I apply softmax and one-hot encoding, the final loss is a vector with length equal to the class count, and in that case I can't call backward() because it works on a single value:

def weighted_mse_loss(input, target, weights):
    out = (input - target) ** 2
    out = out * weights.expand_as(out)
    loss = out.sum(0)
    return loss
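The loss stays a vector because out.sum(0) only reduces over dimension 0; summing or averaging over every dimension yields the single scalar that backward() needs. A sketch of that fix, with made-up shapes and a hypothetical per-class weight vector:

```python
import torch

def weighted_mse_loss(input, target, weights):
    out = (input - target) ** 2
    out = out * weights.expand_as(out)
    return out.mean()  # reduce over every dimension -> a single 0-dim scalar

pred = torch.randn(8, 4, requires_grad=True)
target = torch.rand(8, 4)
weights = torch.ones(4)  # per-class weights, broadcast over the batch
loss = weighted_mse_loss(pred, target, weights)
loss.backward()          # works now that loss is a scalar
```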