BCELoss for MultiClass problem

Is it possible to calculate the multi-class cross-entropy loss by successively applying the nn.BCELoss() implementation?

This is what I have tried.

import torch
import torch.nn as nn
from torch.autograd import Variable

# Implementation of multi-class cross entropy as a sum of per-class BCE losses
def SegLossFn(predictions, targets):
    # predictions: [N, C, H, W] raw model outputs
    _, c, _, _ = predictions.size()
    loss = 0
    m = nn.Sigmoid()
    loss_fn = nn.BCELoss()
    # BCE -> MCE by adding the BCE of each class channel
    for i in range(c):
        loss += loss_fn(m(predictions[0][i]), Variable(targets[i][0]).cuda())

    return loss

For multi-class classification you would usually just use nn.CrossEntropyLoss, and I don’t think you’ll end up with the same result, as you are calling torch.sigmoid on each prediction.

For multi-label classification, you could use nn.BCELoss with one-hot (multi-hot) encoded targets and wouldn’t need a for loop.
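As a rough sketch of that idea (the shapes and the random target below are made up for illustration), the whole [N, C, H, W] prediction can be passed to nn.BCELoss in a single call against a multi-hot target:

import torch
import torch.nn as nn

N, C, H, W = 1, 3, 4, 4
logits = torch.randn(N, C, H, W, requires_grad=True)   # raw model outputs
targets = torch.randint(0, 2, (N, C, H, W)).float()    # multi-hot: each channel is 0/1

criterion = nn.BCELoss()
# a single call covers every class channel, so no Python loop is needed
loss = criterion(torch.sigmoid(logits), targets)
loss.backward()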

Could you explain your use case a bit, as I’m currently not sure I understand it properly?


My question is about the addition operation and how it is handled. What happens to the grad_fn? How do the losses get added?

The losses will get accumulated and your loss tensor will get grad_fn=<AddBackward0> as its grad_fn.
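To see this concretely, here is a small self-contained sketch (the tensors are dummies) showing that summing several BCE terms produces a tensor whose grad_fn is AddBackward0:

import torch
import torch.nn as nn

loss_fn = nn.BCELoss()
logits = torch.randn(4, 3, requires_grad=True)
targets = torch.randint(0, 2, (4, 3)).float()

loss = 0
for i in range(3):
    # each per-class loss is added onto the running total
    loss = loss + loss_fn(torch.sigmoid(logits[:, i]), targets[:, i])

print(loss.grad_fn)   # <AddBackward0 object at ...>
loss.backward()       # gradients flow back through every summed term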


The reason for not using nn.CrossEntropyLoss was that my predicted output is of size [N, C, H, W] and the target is also [N, C, H, W]. Each [H, W] map in channel C of the target is 0 or 1 based on the presence or absence of class C. Is there a workaround?

nn.CrossEntropyLoss expects a torch.LongTensor containing the class indices without the channel dimension. In your case, you could simply use:

targets = torch.argmax(targets, 1)

to create your target tensor.
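For example, assuming a one-hot encoded target of shape [N, C, H, W] (the shapes below are just placeholders), a rough sketch would be:

import torch
import torch.nn as nn

N, C, H, W = 2, 3, 4, 4
logits = torch.randn(N, C, H, W, requires_grad=True)   # raw model outputs
one_hot_targets = torch.zeros(N, C, H, W)
one_hot_targets[:, 0] = 1                              # e.g. every pixel belongs to class 0

# collapse the channel dimension into class indices: [N, H, W]
targets = torch.argmax(one_hot_targets, 1)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)                      # logits go in directly (no softmax)
loss.backward()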


I’m not sure there is such a thing as a “multi-class cross entropy”.
But doing a binary classification for each class makes sense, so I think what you have tried is correct.

It’s usually called multi-category cross entropy, but yeah, CrossEntropyLoss is essentially that. Just be careful: CrossEntropyLoss takes the logits as inputs (before softmax), while BCELoss takes the probabilities as input (after the logistic sigmoid).
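A quick sketch of that difference with dummy tensors (shapes are arbitrary):

import torch
import torch.nn as nn

logits = torch.randn(8, 5, requires_grad=True)           # raw scores, no activation applied

# CrossEntropyLoss: feed logits directly, targets are class indices
ce = nn.CrossEntropyLoss()
ce_loss = ce(logits, torch.randint(0, 5, (8,)))

# BCELoss: feed probabilities (after sigmoid), targets are 0/1 per class
bce = nn.BCELoss()
bce_loss = bce(torch.sigmoid(logits), torch.randint(0, 2, (8, 5)).float())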


Thanks, I think I confused multi-class with multi-label, where multiple BCE terms are used like that.

I don’t think you confused anything, because both multi-class cross entropy and binary cross entropy can deal with multi-class problems. The difference, though, is that with binary cross entropy you mathematically assume that the classes are independent.

I think this particular implementation has some problem, because when I call loss.backward(), improvement is only seen in the last class. The other channels just contain the complement of the last class.

For example:
Expected Output

[[1, 0],             [[0, 1],           [[0, 0],
 [0, 0]]              [0, 0]]            [1, 1]] 

Got Output

[[1, 1],             [[1, 1],           [[0, 0],
 [0, 0]]              [0, 0]]            [1, 1]] 

How can I use MSELoss for multi-class classification?

You could try to apply a softmax as the last activation function in your model and use nn.MSELoss directly on these outputs and the one-hot encoded targets.
Alternatively, you could also use a single output unit, which could be used to predict the class “index” as a floating point number and calculate the loss using the targets containing class indices.

However, note that nn.MSELoss is not the usual loss function you would use for a classification use case.
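A minimal sketch of the first option, with made-up shapes, could look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

N, C = 4, 3
logits = torch.randn(N, C, requires_grad=True)
class_idx = torch.randint(0, C, (N,))

probs = F.softmax(logits, dim=1)                 # softmax as the last activation
one_hot = F.one_hot(class_idx, num_classes=C).float()

criterion = nn.MSELoss()
loss = criterion(probs, one_hot)                 # scalar loss by default (mean reduction)
loss.backward()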

Thanks.
If I apply softmax and one-hot encoding, the final loss is a vector with length equal to the number of classes, and in this case I can’t call backward() because it only works on a single value:

def weighted_mse_loss(input, target, weights):
    # per-element squared error, scaled by per-class weights
    out = (input - target) ** 2
    out = out * weights.expand_as(out)
    # summing over dim 0 leaves a per-class loss vector, not a scalar
    loss = out.sum(0)
    return loss

You could either pass the gradients in the same shape as the output (torch.ones_like(loss) would be the default) or reduce the loss to a scalar.

By default the loss functions will calculate the loss average and will return a scalar.
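Both options side by side, in a small made-up example:

import torch

pred = torch.randn(4, 3, requires_grad=True)
target = torch.randint(0, 2, (4, 3)).float()
weights = torch.tensor([1.0, 2.0, 0.5])

# per-class loss vector, like the weighted_mse_loss above returns
loss = ((pred - target) ** 2 * weights).sum(0)      # shape: [3]

# option 1: pass gradients with the same shape as the non-scalar loss
loss.backward(torch.ones_like(loss), retain_graph=True)

# option 2: reduce the loss to a scalar first
loss.mean().backward()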