About cosine similarity: how to choose the loss function and the network (I have two plans)

Sorry, I have no clue and I don’t know where to find a solution. I’m using two networks to construct two embeddings, and I have a binary target that indicates whether embeddingA and embeddingB “match” or not (1 or -1). The dataset looks like this:

embA0 embB0 1.0
embA1 embB1 -1.0
embA2 embB2 1.0

I hope to use cosine similarity to get classification results, but I am confused about which loss function to choose. The two networks that generate the embeddings are trained separately. So far I can think of two options:

Plan 1:

Construct a third network that passes embeddingA and embeddingB to nn.CosineSimilarity() to calculate the final result (a similarity score in [-1, 1]), and then select a two-class loss function.

(Sorry, I don’t know which loss function to choose.)

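import torch.nn as nn

class cos_Similarity(nn.Module):
    def __init__(self):
        super().__init__()
        self.cos = nn.CosineSimilarity(dim=1)  # compare along the feature dimension

    def forward(self, a, b):
        # a, b: (batch, dim) embeddings -> (batch,) scores in [-1, 1]
        return self.cos(a, b)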
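To make Plan 1 concrete, here is a minimal sketch of one training loop. Everything in it is an assumption for illustration: `enc_a` and `enc_b` are hypothetical stand-ins for the two embedding networks, the data is random, and `nn.MSELoss` against the ±1 targets is only one possible loss choice, not a recommendation from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two embedding networks. There is no ReLU on
# the output, so the cosine similarity can cover the full [-1, 1] range.
enc_a = nn.Linear(8, 4)
enc_b = nn.Linear(8, 4)
cos = nn.CosineSimilarity(dim=1)

# Made-up batch: 16 pairs of raw inputs with targets in {-1, +1}.
x_a = torch.randn(16, 8)
x_b = torch.randn(16, 8)
target = torch.randint(0, 2, (16,)).float() * 2 - 1

# Assumed loss: regress the cosine score directly onto the +1 / -1 target.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(
    list(enc_a.parameters()) + list(enc_b.parameters()), lr=0.1
)

losses = []
for _ in range(50):
    optimizer.zero_grad()
    score = cos(enc_a(x_a), enc_b(x_b))  # shape (16,), values in [-1, 1]
    loss = loss_fn(score, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Because the score and the target live on the same [-1, 1] scale here, no extra rescaling is needed. If the targets were 0/1 instead, the score would have to be mapped to [0, 1] first, for example with `(score + 1) / 2`.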


Plan 2: Use the two embeddings as the outputs, then use nn.CosineEmbeddingLoss() as the loss function. When I calculate the accuracy, I use nn.CosineSimilarity() to output the result (a similarity score in [-1, 1]).


loss_function = torch.nn.CosineEmbeddingLoss()
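A minimal sketch of how Plan 2 could fit together, under stated assumptions: the embeddings below are random tensors standing in for the outputs of the two networks, and the 0.0 decision threshold used for accuracy is an assumed cut-off, not something given in the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up embeddings standing in for the outputs of the two networks.
emb_a = torch.randn(6, 4, requires_grad=True)
emb_b = torch.randn(6, 4, requires_grad=True)
target = torch.tensor([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # must be +1 or -1

# margin is a hyperparameter; 0.0 is the library default.
loss_function = nn.CosineEmbeddingLoss(margin=0.0)
loss = loss_function(emb_a, emb_b, target)
loss.backward()  # gradients flow back into both embeddings

# For accuracy, threshold the cosine similarity (0.0 is an assumed cut-off).
with torch.no_grad():
    score = F.cosine_similarity(emb_a, emb_b, dim=1)
    pred = torch.where(score > 0.0, torch.ones_like(score), -torch.ones_like(score))
    accuracy = (pred == target).float().mean().item()
```

Note that nn.CosineEmbeddingLoss() expects targets of 1 and -1. For a pair with target 1 it penalises `1 - cos`, and for a pair with target -1 it penalises `max(0, cos - margin)`.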


I really need help. How do I make a choice, and why? Or can I only decide through experimental results? Thank you very much!

Well, you should make the choice based on your experimental results, but in my opinion plan 1 seems more legit than plan 2.

Thank you very much! I am running it now. Can you say more about the difference or the theory? :blush:

You can read up on the theory of cosine similarity and cross entropy on pytorch.org.
The reason why I chose plan 1 over plan 2 is computation time and memory allocation. Plan 2 is theoretically supposed to give better accuracy, since it uses CosineEmbeddingLoss, which is designed for measuring whether two inputs are similar or dissimilar, but in practice they will both give you similar results.

The reason for that lies in the weight and bias initialization and some other factors.

I think I understand some of it now, thank you for your reply. I am running Plan 1, but the model does not seem to converge. It may be that I have chosen the wrong loss function. I will continue to try… :cry:

No problem. Like I said, plan 2 is theoretically supposed to give better results, but in practice that’s not really the case. You can try other loss functions.

Sorry to disturb you, but I still have a very weird question. When my negative sample target is 0, the loss drops normally, but when the target is -1, it stays at 0.75 and is almost unchanged… It bothers me…

No problem, I don’t feel disturbed in any way, as I’m always happy to be of help.
Are you by any chance using a ReLU activation?

A ReLU activation constrains its outputs to the range from zero to positive infinity: it sets negative values (values below zero) to zero. If both embeddings come out of a ReLU, all of their components are non-negative, so the cosine similarity between them lies in [0, 1] and can never reach a target of -1. It’s like comparing 0 to -1 forever and ever.

Hope that makes sense, although this only applies if you are using a ReLU activation. If that’s not the case, it could also come from other activations like softmax and sigmoid, which output values in the range (0, 1) and hence leave no room for negative values.
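A quick way to check this is to compare the range of cosine similarities with and without the activation. The sketch below uses random made-up tensors in place of real embeddings, so it only illustrates the range argument and says nothing about the actual model in this thread.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random made-up "pre-activation" embeddings containing both signs.
raw_a = torch.randn(1000, 16)
raw_b = torch.randn(1000, 16)

# Cosine similarity on the raw values, after ReLU, and after sigmoid.
cos_raw = F.cosine_similarity(raw_a, raw_b, dim=1)
cos_relu = F.cosine_similarity(torch.relu(raw_a), torch.relu(raw_b), dim=1)
cos_sigmoid = F.cosine_similarity(torch.sigmoid(raw_a), torch.sigmoid(raw_b), dim=1)

print("raw     min/max:", cos_raw.min().item(), cos_raw.max().item())
print("relu    min/max:", cos_relu.min().item(), cos_relu.max().item())
print("sigmoid min/max:", cos_sigmoid.min().item(), cos_sigmoid.max().item())
```

The raw similarities take both signs, while the ReLU and sigmoid versions never go below zero. That would be consistent with a target of 0 training normally and a target of -1 stalling, if the embeddings do end in such an activation.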