NT-Xent loss vs. other InfoNCE variants

joohyunglee · August 18, 2022, 2:41am

This question may not quite fit this thread.
It looks like the NT-Xent loss from SimCLR differs from the losses in other papers (e.g. MoCo, CPC, and the instance discrimination paper) in that NT-Xent excludes the positive sample from its denominator. So the NT-Xent loss would have 2N-2 terms in its denominator, whereas the other InfoNCE variants have 2N-1 terms. Is this correct? If so, why does NT-Xent exclude the positive sample from its denominator? Thank you in advance.

tom · August 18, 2022, 8:08am

Thank you for mentioning this.

Note that in SimCLR the positive pair (i, j) is not excluded from the denominator; only the (i, i) pairing is, which leaves 2N-1 terms. So effectively, the sampling model is to draw the second index from all indices except i.
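
To make the term count concrete, here is a minimal sketch of the per-anchor loss in plain Python. The function name `nt_xent_for_anchor`, the toy similarity matrix, and the temperature value are assumptions made for illustration, not code from the SimCLR repository.

```python
import math

def nt_xent_for_anchor(sim, i, j, temperature=0.5):
    """Per-anchor loss for anchor i with positive j.

    sim is a 2N x 2N matrix of pairwise similarities (hypothetical input).
    Returns the loss and the number of terms in the denominator.
    """
    # Denominator: every index k except the anchor itself (k != i).
    # The positive j stays in, so there are 2N - 1 terms.
    denom_indices = [k for k in range(len(sim)) if k != i]
    denom = sum(math.exp(sim[i][k] / temperature) for k in denom_indices)
    numer = math.exp(sim[i][j] / temperature)
    return -math.log(numer / denom), len(denom_indices)

# Toy example with N = 4, i.e. 2N = 8 views, and all similarities equal.
sim = [[0.0] * 8 for _ in range(8)]
loss, n_terms = nt_xent_for_anchor(sim, i=0, j=1)
print(n_terms)  # 7, i.e. 2N - 1
```

With all similarities equal, the loss reduces to log(7), which is what a uniform guess among the 7 remaining candidates would give.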

My reading of the loss in the CPC paper (and it is referenced by MoCo) is that they do not consider the index i at all and then draw the index j implicitly excluding i (because they just say “one positive sample”).

So for all I know, from the formulas it might be just a different presentation of the same loss.
With these details, it is not uncommon to see subtle differences between the paper and the implementation, so it might be good to look at those, too.
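
As a quick check of the "same loss, different presentation" reading, the sketch below compares the two formulations on one row of scores. It assumes both use the same temperature-scaled similarity scores; the function names and the numbers are made up for illustration.

```python
import math

def info_nce(pos_score, neg_scores):
    """One positive among a candidate set: -log softmax of the positive."""
    scores = [pos_score] + list(neg_scores)
    log_denom = math.log(sum(math.exp(s) for s in scores))
    return log_denom - pos_score

def nt_xent_anchor(scores_row, i, j):
    """SimCLR-style presentation: sum over all indices except the anchor i."""
    denom = sum(math.exp(s) for k, s in enumerate(scores_row) if k != i)
    return -math.log(math.exp(scores_row[j]) / denom)

# Hypothetical scores of anchor 0 against all 6 views (index 0 is itself).
row = [2.0, 1.4, 0.3, -0.2, 0.5, 0.1]
i, j = 0, 1
negatives = [s for k, s in enumerate(row) if k not in (i, j)]

a = info_nce(row[j], negatives)
b = nt_xent_anchor(row, i, j)
print(abs(a - b) < 1e-9)  # True
```

Under this assumption the two agree, because "one positive plus the negatives" and "all indices except i" describe the same candidate set. Whether a given implementation actually builds the set this way is a separate matter.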

Best regards

Thomas

mxahan · August 18, 2022, 2:22pm

A simplified table may help clarify this. Let s1 denote sample 1 and t(s1) its augmented version.

sample	Augmentation
s1	t(s1)
s2	t(s2)
s3	t(s3)
s4	t(s4)

InfoNCE treats s1 and t(s1) as the positive pair (numerator) and uses the other 7 of the 8 views, i.e. everything except s1 itself, in the denominator.

SimCLR applies the same per-anchor loss with s2, t(s2), and… as the anchor in turn, and combines them in a single loss calculation (we see the sum of them).
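
The table above can be turned into a small sketch of that sum. It assumes the 8 views are ordered [s1, t(s1), s2, t(s2), ...] so that the partner of index k is k ^ 1, and it averages over the 2N anchors; the similarity values and function names are invented for illustration.

```python
import math

def anchor_loss(sim, i, j, temperature=0.5):
    """Loss for anchor i with positive j; the denominator skips only k == i."""
    denom = sum(math.exp(sim[i][k] / temperature)
                for k in range(len(sim)) if k != i)
    return -math.log(math.exp(sim[i][j] / temperature) / denom)

def simclr_total_loss(sim, temperature=0.5):
    """Average the per-anchor loss over every view used as the anchor."""
    n_views = len(sim)  # 2N
    # Assumed layout: views 2m and 2m + 1 form a positive pair, so the
    # partner of k is k ^ 1 (flip the lowest bit).
    losses = [anchor_loss(sim, k, k ^ 1, temperature) for k in range(n_views)]
    return sum(losses) / n_views

# Toy similarities: 1.0 with itself, 0.8 with the partner, 0.1 otherwise.
sim = [[1.0 if a == b else (0.8 if a ^ 1 == b else 0.1) for b in range(8)]
       for a in range(8)]

single = anchor_loss(sim, 0, 1)  # only s1 as the anchor
total = simclr_total_loss(sim)   # every view takes a turn as the anchor
```

In this symmetric toy case every anchor sees the same scores, so `total` equals `single`; with real embeddings the per-anchor terms differ and the average is what gets optimized.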

Hope that clarifies a bit.