MultiLabelMarginLoss

Hi everyone,
I seem to have problems making this loss function work.
I feed it the output of the network and the labels from the dataset, and after the first epoch the loss is reported as 0, but the accuracy is still low.
Do you know where the problem might be?

Could you post the code you are using to calculate the accuracy as well as an example tensor for your model output and the corresponding target, please?

Sure, here they are:

tensor([[-0.4108,  2.3717, -1.2828,  ...,  0.1713,  1.2362, -1.3406],
        [-0.2165,  1.2525, -0.5553,  ...,  0.5631,  0.7271, -0.7836],
        [-0.3744,  1.2202, -0.3709,  ...,  0.3243,  0.6756, -0.9807],
        ...,
        [-0.5486,  1.7626, -0.7162,  ...,  0.2115,  0.9654, -1.0366],
        [-0.7164,  1.5161, -0.2281,  ...,  0.5199,  0.7857, -1.1157],
        [-0.3701,  1.3838, -0.5642,  ...,  0.3960,  0.9117, -0.7888]],
       device='cuda:0', grad_fn=<AddmmBackward>)

torch.Size([64, 100])
tensor([86, 67, 17, 30, 90, 60, 51, 84, 59, 30, 50, 21, 34, 45, 38, 80, 37, 81,
         5, 39, 41, 20, 21,  5, 23, 49,  1, 23, 70, 42, 26, 86, 28, 14, 65, 38,
        53, 80, 98, 79, 29, 41,  6, 77, 23, 16, 85, 59, 46, 14, 73, 43, 43, 39,
        72, 18, 61, 63,  1, 38, 67, 86, 84, 51], device='cuda:0')
torch.Size([64])

Actually, now it gives an error:

-> 2606     return torch._C._nn.multilabel_margin_loss(input, target, reduction_enum)
   2607 
   2608 

RuntimeError: inconsistent target size: [64] for input of size: [64, 100]

It did not give that error when I used one-hot encoded labels, but after reading the implementation of the loss, one-hot labels do not seem to make sense.
Thank you for your help

Based on the docs, the output and target should have the same shape.
If I understand it correctly, each row of the target should contain all labels for the current sample, where a -1 target marks the end of all valid labels.

E.g., if your first sample contains the valid labels 3 and 0, the example target from the docs would be:

>>> # for target y, only consider labels 3 and 0, not after label -1
>>> y = torch.LongTensor([[3, 0, -1, 1]])
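
For completeness, a minimal sketch of how the full call would look (made-up values, assuming torch and torch.nn as nn are imported):

>>> loss_fn = nn.MultiLabelMarginLoss()
>>> x = torch.randn(1, 4)                  # one sample, 4 classes
>>> y = torch.LongTensor([[3, 0, -1, 1]])  # labels 3 and 0 are valid; everything after the -1 is ignored
>>> loss = loss_fn(x, y)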

Are you sure your use case is a multi-label classification (each sample can belong to more than a single class) and not a multi-class classification (each sample belongs to one class only)?


You are right, I got confused. I need a multi-class classification loss, not a multi-label one. Is there an implementation of a multi-class hinge loss that I can use? Is MultiMarginLoss the one I should use?

I think nn.MultiMarginLoss would be the suitable criterion:

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y

Based on the shape information it should also work for your current output and target shapes.
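
Something like this minimal sketch (shapes taken from your posted tensors, data is random) should run without the size error:

>>> criterion = nn.MultiMarginLoss()
>>> output = torch.randn(64, 100)          # model output: batch of 64 samples, 100 classes
>>> target = torch.randint(0, 100, (64,))  # one class index per sample
>>> loss = criterion(output, target)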

Let me know if it works for you.


Thank you very much; as soon as I try it I will let you know.

It works. But when I tried implementing a squared version of this hinge loss, the loss goes to 0 right after the first time it is computed, even though the training accuracy is very low.
Here is the class I implemented:

import torch.nn as nn

class SqHinge(nn.Module):
    def __init__(self):
        super(SqHinge, self).__init__()
        # create the criterion once instead of on every forward pass
        self.criterion = nn.MultiMarginLoss()

    def forward(self, y_pred, y_true):
        # standard multi-class hinge loss, then squared
        l1 = self.criterion(y_pred, y_true)
        l2 = l1 * l1
        return l2

I tried using torch.pow, but the result was the same
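That variant looked roughly like this, replacing the multiplication above:

l2 = torch.pow(l1, 2)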

The squared loss should be 0 only if the l1 loss is also 0, shouldn't it?
For small values < 1 it would reduce the loss, but it shouldn't output a perfect 0 loss.
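As a quick sanity check (a sketch with random data, not your exact setup), you could confirm that squaring a non-zero loss stays non-zero:

criterion = nn.MultiMarginLoss()
l1 = criterion(torch.randn(64, 100), torch.randint(0, 100, (64,)))
print(l1, l1 * l1)  # both should be clearly larger than 0 for random predictions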

That is what I thought, but it goes to 0 after the first batch. I still need to confirm the exact moment it goes to 0; maybe that will give better insight into the problem. Do you see anything wrong with the implementation above?