I implemented a linear-chain CRF in PyTorch. After testing it, I applied it to an NER dataset and found that the CRF loss (the negative log-likelihood, NLL) becomes negative as training progresses.
My linear-chain CRF implementation has no <END> tag, which I suspect is the main reason for the negative loss.
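
For context, here is a minimal sanity check of the kind I would expect any linear-chain CRF to pass. It is only a sketch in plain Python, not my actual PyTorch code: the function names, the `start` score vector, and the toy numbers are all made up for illustration, and it deliberately has no <END> transition. As far as I understand, the NLL is log Z minus the gold-path score, and since Z sums over every path (the gold one included) it should never drop below zero as long as both terms are scored the same way.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, start, tags):
    """Unnormalized log-score of one tag sequence (hypothetical layout, no <END> term)."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition_forward(emissions, transitions, start):
    """log Z computed with the forward algorithm in log space."""
    num_tags = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(num_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def log_partition_brute_force(emissions, transitions, start):
    """log Z computed by enumerating every tag sequence (tiny inputs only)."""
    num_tags = len(start)
    all_paths = itertools.product(range(num_tags), repeat=len(emissions))
    return logsumexp(
        [path_score(emissions, transitions, start, p) for p in all_paths]
    )


def crf_nll(emissions, transitions, start, gold_tags):
    """Negative log-likelihood of the gold path: log Z - score(gold)."""
    log_z = log_partition_forward(emissions, transitions, start)
    return log_z - path_score(emissions, transitions, start, gold_tags)


# Toy example: 3 time steps, 2 tags (all numbers are arbitrary).
emissions = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
transitions = [[0.1, -0.4], [0.8, 0.0]]
start = [0.2, -0.3]

print(log_partition_forward(emissions, transitions, start))
print(log_partition_brute_force(emissions, transitions, start))
print(crf_nll(emissions, transitions, start, gold_tags=(0, 0, 1)))
```

If the forward result and the brute-force result disagree, or the NLL comes out negative for some path, then the partition function and the gold-path score are probably not built from the same terms (for example, a start or end transition, or padded positions, included in one but not the other).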
Has anyone else run into the same problem?