Hi, I'm confused about optimizing a loss objective with SGD.
loss = 1 - F.cosine_similarity(x, y) and loss = -F.cosine_similarity(x, y): do both of these maximize the similarity when minimized with SGD? Intuitively, (1 - CosineSim) seems more suitable to me, but I have seen (-CosineSim) used as a contrastive self-supervised learning objective. So how do these two behave differently?
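Here is a minimal sketch (the tensors x and y are made up) of why the two formulations behave the same under SGD: the constant 1 contributes no gradient, so both losses produce identical updates and differ only by an offset in the reported loss value.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, requires_grad=True)
y = torch.randn(4, 8)

# Constant offsets vanish under differentiation, so the two losses
# have exactly the same gradient with respect to x.
loss_a = (1 - F.cosine_similarity(x, y)).mean()
loss_b = (-F.cosine_similarity(x, y)).mean()

grad_a, = torch.autograd.grad(loss_a, x)
grad_b, = torch.autograd.grad(loss_b, x)

print(torch.allclose(grad_a, grad_b))  # True: identical SGD updates
```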
What's the difference between loss = -F.cross_entropy(logits, label) and loss = F.cross_entropy(-logits, label)? Do both of these represent maximizing the cross entropy?
-F.cross_entropy(logits, label) is not bounded below, so minimizing it will not converge: cross entropy itself is bounded below by 0 but unbounded above, and the optimizer can drive its negation toward -inf.
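A quick illustration with made-up two-class logits: the more confidently wrong the prediction, the larger the cross entropy, so its negation decreases without bound.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0])
for scale in (1.0, 10.0, 100.0):
    # Confidently wrong logits: almost all mass on class 1, label is 0.
    logits = torch.tensor([[-scale, scale]])
    print((-F.cross_entropy(logits, labels)).item())  # heads toward -inf
```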
-F.cross_entropy(logits, label) is not equal to F.cross_entropy(-logits, label).
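A concrete check with arbitrary numbers:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])
label = torch.tensor([0])

print((-F.cross_entropy(logits, label)).item())  # ~ -0.24 (negated non-negative loss)
print(F.cross_entropy(-logits, label).item())    # ~  3.24 (a different non-negative loss)
```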
F.cross_entropy is equivalent to the combination of LogSoftmax and NLLLoss, so the logits first go through LogSoftmax.
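You can verify that equivalence directly:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])

a = F.cross_entropy(logits, labels)
b = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(a, b))  # True
```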
And for F.cross_entropy(-logits, labels), think of it this way: minimizing it yields an estimator with minimum cross entropy between softmax(-logits) and the labels.
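Here is a hedged sketch (a single free logits tensor stands in for a model's output): this objective is bounded below by 0 and converges, and it does so by pushing the true-class logit down, so the original cross entropy F.cross_entropy(logits, labels) ends up large.

```python
import torch
import torch.nn.functional as F

# Illustrative only: optimize the logits directly instead of a model.
logits = torch.zeros(1, 3, requires_grad=True)
labels = torch.tensor([0])
opt = torch.optim.SGD([logits], lr=1.0)

for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(-logits, labels)  # fits softmax(-logits) to labels
    loss.backward()
    opt.step()

print(loss.item())                             # small: this objective converges
print(F.cross_entropy(logits, labels).item())  # large: the original CE blew up
```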
So no, neither expression is a well-posed way to maximize the cross entropy: minimizing -F.cross_entropy(logits, label) does try to maximize it but diverges, while minimizing F.cross_entropy(-logits, label) optimizes a different, well-posed objective altogether.