About cosine similarity: how to choose the loss function and the network (I have two plans)

Sorry, I have no clue and I don’t know where to find a solution. I’m using two networks to construct two embeddings, and I have a binary target indicating whether embeddingA and embeddingB “match” or not (1 or -1). The dataset looks like this:

embA0 embB0 1.0
embA1 embB1 -1.0
embA2 embB2 1.0
...
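For context, here is a minimal sketch of how a dataset in this form could be wrapped for training (names such as MatchDataset, inputs_a, inputs_b and targets are placeholders I am assuming, not from the actual code):

import torch
from torch.utils.data import Dataset

class MatchDataset(Dataset):
    def __init__(self, inputs_a, inputs_b, targets):
        # inputs_a / inputs_b: raw inputs for the two embedding networks
        # targets: tensor of 1.0 / -1.0 match labels
        self.inputs_a = inputs_a
        self.inputs_b = inputs_b
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        return self.inputs_a[idx], self.inputs_b[idx], self.targets[idx]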

I want to use cosine similarity to get a classification result, but I’m confused about which loss function to choose. The two networks that generate the embeddings are trained separately. Right now I can think of two options:

Plan 1:

Construct a third network that feeds embeddingA and embeddingB into nn.CosineSimilarity() to calculate the final result (a similarity score in [-1, 1]), and then select a binary classification loss function.

(Sorry, I don’t know which loss function to choose.)

import torch
import torch.nn as nn

class cos_Similarity(nn.Module):
    def __init__(self):
        super(cos_Similarity, self).__init__()
        self.cos = nn.CosineSimilarity(dim=2)
        self.embA = generator_A()
        self.embB = generator_B()

    def forward(self, a, b):
        output_a = self.embA(a)
        output_b = self.embB(b)
        return self.cos(output_a, output_b)

# note: nn.CrossEntropyLoss expects per-class logits and class-index targets,
# not a single cosine score and +1/-1 targets
loss_func = nn.CrossEntropyLoss()

model = cos_Similarity()
y = model(a, b)
loss = loss_func(y, target)
acc = (y > 0).long()  # predicted label (1 if similarity > 0), not an accuracy

Plan 2: Use the two embeddings as the outputs and nn.CosineEmbeddingLoss() as the loss function. When I calculate the accuracy, I use nn.CosineSimilarity() to output the result (a similarity score in [-1, 1]).

output_a = embA(a)
output_b = embB(b)

cos = nn.CosineSimilarity(dim=2)
loss_function = torch.nn.CosineEmbeddingLoss()

loss = loss_function(output_a, output_b, target)
acc = cos(output_a, output_b)
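Note that cos(output_a, output_b) is a similarity score in [-1, 1], not an accuracy. As a rough sketch (my own thresholding at 0, not part of the original plan), an actual accuracy against the ±1 targets could be computed like this:

scores = cos(output_a, output_b)
preds = torch.where(scores > 0, torch.ones_like(scores), -torch.ones_like(scores))  # ±1 predictions
accuracy = (preds == target).float().mean()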

I really need help. How do I choose between them, and why? Or can I only decide through experimental results? Thank you very much!

Well, you should make the choice based on your experimental results, but in my opinion your Plan 1 seems more legit than Plan 2.

Thank you very much! I am running it now. Can you tell me more about the differences or the theory? :blush:

You can read up on the theory of cosine similarity and cross entropy on pytorch.org.
The reason I chose Plan 1 over Plan 2 is computation time and memory allocation. Theoretically, Plan 2 is supposed to give better accuracy, since it uses the cosine embedding loss, which is designed for comparing whether two embeddings match, but in practice both will give you similar results.

The reason for that lies in the weight and bias initialization and some other factors.
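For what it’s worth, the PyTorch docs give the formula behind nn.CosineEmbeddingLoss (with margin defaulting to 0). Here is a small sketch reproducing it by hand so you can see what Plan 2 actually optimizes (shapes are illustrative assumptions, not from the original code):

import torch
import torch.nn.functional as F

def cosine_embedding_loss_manual(x1, x2, y, margin=0.0):
    # loss = 1 - cos(x1, x2)               if y == 1
    #      = max(0, cos(x1, x2) - margin)  if y == -1
    cos = F.cosine_similarity(x1, x2, dim=1)
    pos = 1.0 - cos
    neg = torch.clamp(cos - margin, min=0.0)
    return torch.where(y == 1, pos, neg).mean()

# quick check against the built-in loss
x1 = torch.randn(4, 8)
x2 = torch.randn(4, 8)
y = torch.tensor([1.0, -1.0, 1.0, -1.0])
print(cosine_embedding_loss_manual(x1, x2, y))
print(torch.nn.CosineEmbeddingLoss()(x1, x2, y))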

I think I understand some of it, thank you for your reply. I am running Plan 1, but the model does not seem to converge. It may be that I have chosen the wrong loss function. I will continue to try… :cry:

No problem. Like I said, theoretically Plan 2 is supposed to give better results, but in practice that’s not really the case. You can try other loss functions.
Cheers

Sorry to disturb you, but I still have a very weird question. When my negative-sample target is 0, the loss drops normally, but when the target is -1, it stays around 0.75 and is almost unchanged… It bothers me…

No problem, I don’t feel disturbed in any way, as I’m always happy to help.
Are you by any chance using a ReLU activation?

A ReLU activation constrains its outputs to the range from zero to positive infinity: any negative value (anything below zero) is clamped to zero. If those clamped values are used to compute the output, then an output constrained to [0, infinity) gets compared with the target, but your targets include negative values, so it’s like comparing 0 to -1 forever and ever.

Hope you understand this, although it only applies if you are using a ReLU activation. If that’s not the case, it could also come from other activations such as softmax or sigmoid, whose outputs lie in the range [0, 1] and hence leave no room for negative values.
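To make that concrete, here’s a tiny sketch (not from the original code) showing that two ReLU outputs can never have a negative cosine similarity, so a target of -1 is unreachable:

import torch
import torch.nn.functional as F

# embeddings that have passed through a ReLU have no negative components
a = F.relu(torch.randn(1000, 64))
b = F.relu(torch.randn(1000, 64))

cos = F.cosine_similarity(a, b, dim=1)
print(cos.min())  # always >= 0, so it can never match a target of -1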