# How to interpret and get classification accuracy from outputs with MarginRankingLoss

When dealing with a binary classification problem, we can train the model with either logistic loss or hinge loss, implemented in PyTorch as `CrossEntropyLoss()` or `MarginRankingLoss()` respectively, if I understand correctly.

In both cases we can use the same network, say a single FC layer that projects the inputs to outputs `y_est` of shape `(batch_size, 2)`, where the 2 comes from the two classes.

For `CrossEntropyLoss()`, we simply `argmax` the outputs per sample in a batch to get our predictions: indices 0 and 1 correspond to class labels 0 and 1, respectively.
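For reference, a minimal sketch of that setup (the layer sizes and batch here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)            # single FC layer: 10 features -> 2 classes
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 10)              # a toy batch of 8 samples
labels = torch.randint(0, 2, (8,))  # class labels in {0, 1}

y_est = model(x)                    # shape (batch_size, 2)
loss = criterion(y_est, labels)

preds = y_est.argmax(dim=1)         # index 0 -> class 0, index 1 -> class 1
accuracy = (preds == labels).float().mean()
```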

On the other hand, do we do the same for `MarginRankingLoss()`? Its `target` argument (y in the formula below) should be either 1 or -1 (instead of 0 and 1). Can we still `argmax` and map indices 0 and 1 to classes -1 and 1, respectively?

loss(x, y) = max(0, -y * (x1 - x2) + margin)

I am also wondering whether it is correct to regard `x1` and `x2` as `y_est[:, 0]` and `y_est[:, 1]`.
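For concreteness, here is a sketch of the wiring I have in mind (all names illustrative). If I am reading the formula right, with x1 = `y_est[:, 0]` and x2 = `y_est[:, 1]` a target of +1 pushes column 0 above column 1, so `argmax` index 0 would correspond to class +1 and index 1 to class -1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)
criterion = nn.MarginRankingLoss(margin=1.0)

x = torch.randn(8, 10)
target = torch.randint(0, 2, (8,)) * 2 - 1   # labels in {-1, +1}

y_est = model(x)                             # shape (batch_size, 2)
x1, x2 = y_est[:, 0], y_est[:, 1]
# mean(max(0, -target * (x1 - x2) + 1)):
# target = +1 pushes x1 above x2; target = -1 pushes x2 above x1
loss = criterion(x1, x2, target.float())

# With this wiring, argmax index 0 maps to +1 and index 1 to -1
preds = torch.where(y_est.argmax(dim=1) == 0,
                    torch.tensor(1), torch.tensor(-1))
accuracy = (preds == target).float().mean()
```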

Hi, in my understanding, `x1` and `x2` correspond to the distances of a positive (same-class) pair and a negative pair. Before you apply `MarginRankingLoss()`, you have to compute those distances from the batch.

For example, suppose we have X1c, X2c, X3, X4, where X1c and X2c belong to class c and the others do not.

• In the binary classification phase, we apply the usual classification loss to X1~X4.
• In the ranking phase, we first compute the distances Dp = d(f(X1c), f(X2c)) and Dn = d(f(X1c), f(X4)), where f() returns the model output and d() returns the distance between two feature vectors.
Then the loss computation would be as follows:
loss = max(0, Dp - Dn + m)

In your case, if you fix x1 as Dp and x2 as Dn, `y` should always be `-1`, so that the loss reduces to max(0, Dp - Dn + m).
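The ranking phase above could be sketched like this (a toy f() and Euclidean distance for d(), both assumptions for illustration). Note that with x1 = Dp and x2 = Dn, the target has to be -1 for the loss to reduce to max(0, Dp - Dn + margin):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
f = nn.Linear(10, 4)                 # stand-in for the model f()
criterion = nn.MarginRankingLoss(margin=0.5)

x1c = torch.randn(1, 10)             # X1c, X2c: same class c
x2c = torch.randn(1, 10)
x4 = torch.randn(1, 10)              # X4: a different class

# d(): Euclidean distance between feature vectors
dp = F.pairwise_distance(f(x1c), f(x2c))   # positive-pair distance Dp
dn = F.pairwise_distance(f(x1c), f(x4))    # negative-pair distance Dn

# max(0, -y * (Dp - Dn) + margin) with y = -1 becomes
# max(0, Dp - Dn + margin): the negative pair should be farther
# apart than the positive pair by at least the margin
y = -torch.ones_like(dp)
loss = criterion(dp, dn, y)
```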

Your description sounds more like a hinge loss for triplets.
What I was referring to does not involve pairs at all; apologies if I didn't make that clear.

In particular, I used `MarginRankingLoss()` as a hinge loss for binary classification, as answered here. So I am wondering how to reconcile the formula from the OP (eq. 1; I believe the label there is y ∈ {-1, 1} rather than t) with the one from the docs (eq. 2):

max(0, 1 - t * y), where t ∈ {-1, 1}   (1)
max(0, -y * (x1 - x2) + 1)             (2)

so I set x1 and x2 to `y_est[:, 0]` and `y_est[:, 1]`, respectively. I also tried setting x2 to zeros, but I am not sure about that.
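One way to check the correspondence numerically (toy scores, names illustrative): with margin = 1 and x2 fixed to zeros, eq. 2 reduces term by term to eq. 1, with x1 playing the role of the score, so the "x2 = zeros" idea does recover the classic hinge loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
s = torch.randn(8)                      # per-sample scores (a single model output)
t = (torch.randint(0, 2, (8,)) * 2 - 1).float()  # labels in {-1, +1}

# Eq. (1): classic hinge loss max(0, 1 - t * s)
hinge = torch.clamp(1 - t * s, min=0).mean()

# Eq. (2) with margin = 1 and x2 = zeros: max(0, -t * (s - 0) + 1)
criterion = nn.MarginRankingLoss(margin=1.0)
ranking = criterion(s, torch.zeros_like(s), t)

# The two losses coincide exactly
assert torch.allclose(hinge, ranking)
```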