When I use cosine distance for a triplet loss, I accidentally wrote the cosine similarity term AB/(|A||B|) as AB/(|A||A|), which is the same as (AB/(|A||B|))(|B|/|A|). In effect, a coefficient |B|/|A| has been added. I guess this may be beneficial for the SGD optimizer, because the model's accuracy after training is 1% higher than with the correct formula. I don't know if my conjecture is correct. If anyone is interested, could you try this with your own model and see if you get the same result?
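To make the algebra concrete, here is a minimal pure-Python sketch that checks the identity numerically. The function names and the two example vectors are made up for illustration and are not taken from my training code.

```python
import math


def dot(a, b):
    # Inner product AB of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    # Euclidean norm |A|.
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    # Intended form: AB / (|A||B|)
    return dot(a, b) / (norm(a) * norm(b))


def mistaken_similarity(a, b):
    # Accidental form: AB / (|A||A|)
    return dot(a, b) / (norm(a) * norm(a))


# Arbitrary example vectors (assumption, for illustration only).
a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 0.0]

# The accidental form equals the cosine similarity scaled by |B|/|A|.
scale = norm(b) / norm(a)
assert math.isclose(mistaken_similarity(a, b), cosine_similarity(a, b) * scale)
```

Because the extra factor |B|/|A| depends on the embedding norms, the accidental form is no longer scale-invariant in B, so it can behave differently from plain cosine similarity during training.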
Could you post more details about your experiments, how many runs you’ve executed as well as the mean +/- stddev of the final performance metric?
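For reporting, something like the following sketch would do. It assumes you have collected one final accuracy per run (for example, one per random seed) in a list; the helper names are hypothetical, and it uses the sample standard deviation from Python's `statistics` module.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample stddev) of per-run final accuracies."""
    if len(accuracies) < 2:
        # A standard deviation cannot be estimated from a single run.
        raise ValueError("need at least two runs")
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def format_summary(accuracies):
    # Format as "mean +/- stddev (n=number of runs)".
    mean, std = summarize_runs(accuracies)
    return f"{mean:.2f} +/- {std:.2f} (n={len(accuracies)})"
```

Calling `format_summary` once on the runs with the correct formula and once on the runs with the accidental one would show whether the 1% gap is larger than the run-to-run variation.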