Does NLLLoss handle Log-Softmax and Softmax in the same way?

In the usual case of binary classification, I used NLLLoss and passed in 2 class weights (one for the positive class and one for the negative class). So now, in multi-label classification with 4 classes, would it be 8 class weights that have to be passed in with BCEWithLogitsLoss? And how do I pass the class weights? Please help. Will it be passed as 'nn.BCEWithLogitsLoss(weight=[1,2,3,4,5,6,7,8])', where each number corresponds to the class weight of either the corresponding positive or negative label?
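(For context, this is roughly what I did in the binary case; the weight values below are made up for illustration:)

import torch
import torch.nn as nn
import torch.nn.functional as F

class_weights = torch.tensor([1.0, 3.0])        # index 0 = negative class, index 1 = positive class
criterion = nn.NLLLoss(weight=class_weights)

logits = torch.randn(8, 2)                      # model outputs for 8 samples, 2 classes
log_probs = F.log_softmax(logits, dim=1)        # NLLLoss expects log-probabilities
targets = torch.randint(0, 2, (8,))             # class indices: 0 (neg) or 1 (pos)
loss = criterion(log_probs, targets)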

If you are dealing with a multi-label classification, i.e. zero, one, or more targets can be active for each sample, then you could pass pos_weight as a tensor with a length equal to the number of classes.
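For example, something along these lines (the weight values here are just placeholders):

import torch
import torch.nn as nn

num_classes = 4
pos_weight = torch.tensor([1.0, 2.0, 1.0, 3.0])            # one entry per class
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_classes)                       # raw model outputs
targets = torch.randint(0, 2, (8, num_classes)).float()    # multi-hot targets
loss = criterion(logits, targets)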


So in the case of number of classes = 4, where each class takes one of 2 labels, {pos, neg},
it would be
nn.BCEWithLogitsLoss(weight=[a, b, c, d]), where a = pos weight of class 1, b = pos weight of class 2, c = pos weight of class 3, d = pos weight of class 4, and the neg weights need not be specified. Have I understood it correctly? Please correct me if I am wrong. Thanks!

Generally yes, but you should use the pos_weight argument instead of weight (which expects a tensor of shape [batch_size]).
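For example (a minimal sketch, reusing your a, b, c, d names with made-up values; note that pos_weight needs to be a tensor, not a Python list):

import torch
import torch.nn as nn

a, b, c, d = 1.0, 2.0, 3.0, 4.0   # made-up positive weights for classes 1-4
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([a, b, c, d]))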


I used the pos_weight argument, but I'm getting the following error: "The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1".
When I remove the pos_weight argument, the code works fine. I don't know what I'm doing wrong. Please help.

The following is the code:
import torch
import torch.nn as nn

class_weights = [2, 3, 1, 4]  # 4 labels, each label can be either positive (1) or negative (0)
weights = torch.tensor(class_weights, dtype=torch.float)
cross_entropy = nn.BCEWithLogitsLoss(pos_weight=weights)

I’m not sure what might be causing this error, since this code snippet works fine:

batch_size = 10
nb_classes = 4
x = torch.randn(batch_size, nb_classes, requires_grad=True)   # logits, shape [batch_size, nb_classes]
y = torch.randint(0, 2, (batch_size, nb_classes)).float()     # multi-hot targets

loss = cross_entropy(x, y)   # cross_entropy as defined above, with pos_weight of length 4
loss.backward()
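Just as a guess (purely illustrative): that exact message shows up when the length of pos_weight doesn't match the model's output width, e.g. a model producing only 2 logits per sample while pos_weight has length 4:

x2 = torch.randn(batch_size, 2, requires_grad=True)      # only 2 outputs per sample
y2 = torch.randint(0, 2, (batch_size, 2)).float()
# cross_entropy above still has a pos_weight of length 4, so this raises:
# "The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1"
# loss = cross_entropy(x2, y2)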

Since we've defined weights only for the positive label, how does the method recognize which label is the positive label and which one is the negative label?

As shown in the docs, pos_weight is defined as number_of_negatives / number_of_positives and is applied to the positive class, so a negative weighting is not necessary.
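A rough sketch of how you might compute it per class from a multi-hot target matrix (assuming every class has at least one positive sample):

import torch

targets = torch.randint(0, 2, (100, 4)).float()   # [num_samples, num_classes] multi-hot targets
num_pos = targets.sum(dim=0)                      # positives per class
num_neg = targets.size(0) - num_pos               # negatives per class
pos_weight = num_neg / num_pos                    # number_of_negatives / number_of_positives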


Sorry, I think I didn't frame my question properly. I have 2 labels: 0 and 1. Let's say that the weight of label 0 is 0.6 and the weight of label 1 is 0.4. Now, I assign pos_weight = 0.4, so my question is: how does the method recognize that 1 is the positive label and assign a weight of 0.4 to it? What if it thought that 0 was the positive label?

The positive label is defined as 1, the negative as 0.
You could either weight the positive and negative loss terms with separate weights:

-(pos_weight * y * log(sigmoid(x)) + neg_weight * (1 - y) * log(1 - sigmoid(x)))

or divide by the neg_weight and use:

-(pos_weight/neg_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)))

The latter approach is used in PyTorch. Don't let the naming of my variables confuse you: PyTorch's pos_weight corresponds to this combined factor on the positive term, and you would often just pass num_negatives / num_positives as its value.
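A quick numerical check of that claim (a sketch; the pos_weight value is arbitrary):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(6, 1)                       # logits
y = torch.randint(0, 2, (6, 1)).float()     # targets
pw = torch.tensor([3.0])                    # e.g. num_negatives / num_positives

loss_builtin = nn.BCEWithLogitsLoss(pos_weight=pw)(x, y)

# manual version of the second formulation, with the usual leading minus and mean reduction
p = torch.sigmoid(x)
loss_manual = -(pw * y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

print(torch.allclose(loss_builtin, loss_manual, atol=1e-6))  # expected: True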
