CRF loss becomes negative during training

I implemented a linear-chain CRF in PyTorch. After testing it, I used it on an NER dataset. I found that the CRF loss, i.e. the negative log-likelihood (NLL) loss, becomes negative as training proceeds.

My linear-chain CRF implementation has no <START> and <END> tags, which I suspect is the main reason for the negative loss.
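One way to test this guess is a small sanity check. For a linear-chain CRF the NLL is log Z minus the score of the gold path. The gold path is one of the paths summed in Z, so the loss should not drop below zero as long as both terms use the same scoring, with or without <START> and <END> tags. The sketch below is plain Python rather than PyTorch so it runs without dependencies. It is not my actual implementation: the function names and the toy emission and transition scores are made up for illustration. It compares the forward algorithm against brute-force enumeration for a CRF with no <START>/<END> tags.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, tags):
    """Unnormalized score of one tag sequence (no <START>/<END> terms)."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """log Z via the forward algorithm, using the same scoring as path_score."""
    num_tags = len(transitions)
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emissions, transitions):
    """log Z by enumerating every tag sequence (only feasible for toy sizes)."""
    num_tags = len(transitions)
    all_paths = itertools.product(range(num_tags), repeat=len(emissions))
    return logsumexp([path_score(emissions, transitions, p) for p in all_paths])


def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of the gold tags: log Z - score(gold)."""
    return log_partition(emissions, transitions) - path_score(
        emissions, transitions, tags
    )


# Toy scores (hypothetical): 4 time steps, 3 tags.
emissions = [
    [0.5, -1.0, 2.0],
    [1.5, 0.3, -0.7],
    [-0.2, 0.8, 0.1],
    [2.0, -1.5, 0.4],
]
transitions = [
    [0.1, -0.4, 0.9],
    [-1.2, 0.6, 0.3],
    [0.7, -0.8, -0.1],
]
gold = [2, 0, 1, 0]

loss = crf_nll(emissions, transitions, gold)
print("forward log Z:    ", log_partition(emissions, transitions))
print("brute-force log Z:", brute_force_log_partition(emissions, transitions))
print("NLL of gold path: ", loss)
```

If the two log Z values agree and the loss is still non-negative here, the missing tags alone may not explain the negative values. In that case a mismatch between the gold-path score and the partition function would be the next thing to check, for example padded positions or a transition term that is counted in one but not the other.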

Has anyone run into the same problem?