To minimize the loss, the numerator should increase while the denominator decreases. Since the denominator contains the sim terms for the negative pairs, those terms have to decrease. Cosine similarity ranges from -1 to 1, so it seems that the closer the negatives' cosine similarity gets to -1, the lower the loss becomes.
However, during training with this loss, I noticed that the similarity terms in the denominator converge to 0 instead of -1 (specifically, sim(zi, zk) in the figure above).
Is there a mathematical explanation for this unexpected behavior?
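For concreteness, here is a minimal sketch of the kind of loss I mean, an InfoNCE-style formulation: -log of exp(sim(zi, zj)/tau) over the sum of exp(sim(zi, zk)/tau). The function name and batch layout are my own, not from any particular implementation:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z, z_pos, tau=0.05):
    """InfoNCE-style contrastive loss (sketch).

    z, z_pos: (N, D) batches of latents. Row i of z_pos is the positive
    for row i of z; all other rows act as in-batch negatives.
    """
    z = F.normalize(z, dim=-1)
    z_pos = F.normalize(z_pos, dim=-1)
    # sim[i, k] = cosine similarity between z_i and z_pos_k, scaled by 1/tau
    sim = z @ z_pos.T / tau
    # cross_entropy with target i computes
    # -log( exp(sim[i, i]) / sum_k exp(sim[i, k]) )
    labels = torch.arange(z.size(0))
    return F.cross_entropy(sim, labels)
```

With this formulation, driving the loss down pushes the diagonal (positive) similarities up and the off-diagonal (negative) similarities down, which is why I expected the negatives to approach -1.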
Without knowing the details of how you obtain your latents z and what code you use to compute their cosine similarity, it's hard to tell what exactly causes this behavior.
A wild guess would be that you are actually monitoring the similarity after applying the exponential. If your model converges such that 0 < tau << 1 and sim(zi, zk) is approximately -1, then exp(sim(zi, zk)/tau) ≈ 0 would be the expected outcome.
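To put a number on that guess (the tau value here is just an example of a small temperature):

```python
import math

tau = 0.05       # example small temperature, 0 < tau << 1
sim_neg = -1.0   # cosine similarity of a well-separated negative pair

# exp(-1 / 0.05) = exp(-20), which is on the order of 1e-9 --
# indistinguishable from 0 on a typical training-metric plot
print(math.exp(sim_neg / tau))
```

So if the logged quantity is exp(sim/tau) rather than sim itself, a value of "0" is exactly what a fully converged negative pair would look like.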
Actually, mine is SimCSE, and tau is a hyperparameter (=0.05).
(The latent z is obtained by passing the input through the encoder.)
Sim (cosine similarity) ranges from -1 to 1.
The problem is that sim(zi, zk) converges to 0, while sim(zi, zj), the positive pair, goes to 1 as expected.
According to the loss formula, I would expect sim(zi, zk) to get close to -1.
I hope that makes sense. Any opinions you can share would be very helpful.
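For reference, the similarities I report are raw cosine values, logged before any exponential or temperature scaling. A simplified sketch of that monitoring (variable and function names are mine, not my actual code):

```python
import torch
import torch.nn.functional as F

def log_raw_similarities(zi, zj, zk):
    """Return mean raw cosine similarities for a batch.

    zi, zj: (N, D) encoder latents forming positive pairs.
    zk: (N, D) encoder latents acting as negatives for zi.
    Both values lie in [-1, 1]; no exp() or 1/tau is applied.
    """
    pos_sim = F.cosine_similarity(zi, zj, dim=-1).mean()
    neg_sim = F.cosine_similarity(zi, zk, dim=-1).mean()
    return pos_sim.item(), neg_sim.item()
```

Measured this way, pos_sim goes to 1 during training while neg_sim sits near 0 rather than -1, which is the behavior I am asking about.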