I was reading the torch.nn documentation, looking for a loss function I can use for my dependency parsing task.
In some papers, the authors say the hinge loss is a reasonable choice for this task. However, cross-entropy also seems to be fine to use.
Also, cross-entropy fits my implementation better than the hinge loss. What should I do? What are the differences, and the pros and cons, of these two loss functions? And is HingeEmbeddingLoss the same as the regular hinge loss mentioned in those papers?
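For reference, here is a minimal plain-Python sketch of the two losses being compared, scored over candidate heads for a single word. It assumes the parser produces one raw score per candidate head. The hinge variant shown is the Crammer-Singer style multi-class hinge (gold score against the best-scoring wrong head). That is one common choice and not necessarily the exact form used in the papers. The function names and the example scores are hypothetical.

```python
import math

def cross_entropy(scores, gold):
    """Softmax cross-entropy for one example: -log softmax(scores)[gold]."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[gold]

def multiclass_hinge(scores, gold, margin=1.0):
    """Crammer-Singer style hinge: max(0, margin - s_gold + max_{j != gold} s_j)."""
    best_wrong = max(s for j, s in enumerate(scores) if j != gold)
    return max(0.0, margin - scores[gold] + best_wrong)

# Hypothetical scores of three candidate heads for one word; head 0 is gold.
scores = [3.0, 1.0, 0.5]
print(cross_entropy(scores, 0))     # small, but always strictly positive
print(multiclass_hinge(scores, 0))  # 0.0: gold beats every other head by >= margin
```

The practical difference shows up in the example. The hinge loss becomes exactly zero, with no gradient, once the gold head wins by the margin. Cross-entropy keeps pushing the gold score up, and it also yields a probability distribution over heads. In PyTorch, nn.CrossEntropyLoss computes the first quantity from raw logits. nn.MultiMarginLoss is a multi-class hinge, but it sums over the wrong classes and divides by the number of classes rather than taking the max.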
Thank you, and happy new year to all.