Hi,
I am trying to train a shared embedding model using a max-margin loss, and I evaluate it on an external retrieval metric.
The loss drops to zero after just one iteration, yet the retrieval metric is still very low.
I tried different margins, but the same issue persists.
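For reference, a minimal NumPy sketch of a max-margin (hinge) loss over paired embeddings. It assumes cosine similarity and in-batch negatives, and the name `max_margin_loss` and the default margin of 0.2 are illustrative; none of these details come from the actual setup described here.

```python
import numpy as np

def max_margin_loss(a, b, margin=0.2):
    """Hinge loss with in-batch negatives (hypothetical helper).

    a, b: arrays of shape (batch, dim); a[i] and b[i] form a matching pair,
    and every b[j] with j != i is treated as a negative for a[i].
    Requires batch >= 2.
    """
    # L2-normalise so that dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                      # sim[i, j] = cos(a_i, b_j)
    pos = np.diag(sim)[:, None]        # similarity of each true pair
    cost = np.maximum(0.0, margin - pos + sim)
    np.fill_diagonal(cost, 0.0)        # a true pair is not its own negative
    n = len(a)
    return cost.sum() / (n * (n - 1))  # mean over all negative pairs
```

With this formulation, a loss of exactly zero means that every negative in the batch already scores at least `margin` below its positive, which says nothing about how the embeddings rank against the full retrieval set.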
Are any special tricks required to train with a max-margin loss?