CTC loss and metric divergence

I am training a CRNN model (EffNet + BiLSTM) for an OCR task with CTC loss. I have observed a strange phenomenon: the loss per epoch keeps increasing while the model gets better. I know the model is improving because I calculate the edit distance between the decoded predictions and the labels, and that keeps going down.
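A minimal sketch of a metric of this kind, for reference. It assumes greedy (best-path) decoding and a blank index of 0, neither of which is stated above; the helper names `ctc_greedy_decode` and `edit_distance` are hypothetical and not taken from the actual training code.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Best-path CTC decoding: collapse repeated frame labels, then drop blanks.

    `frame_ids` is the per-frame argmax over classes for one sample.
    `blank=0` is an assumption; use the same index passed to the loss.
    """
    decoded = []
    prev = None
    for idx in frame_ids:
        # Keep a label only when it starts a new run and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute = 1)."""
    prev_row = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        row = [i]
        for j, y in enumerate(b, start=1):
            row.append(min(
                prev_row[j] + 1,             # delete x
                row[j - 1] + 1,              # insert y
                prev_row[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev_row = row
    return prev_row[-1]
```

Dividing the summed distance by the total label length gives a character error rate, which is easier to compare across epochs than a raw distance.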

Here’s how my loss function is set up:

criterion = nn.CTCLoss(blank=blank_label, reduction='sum', zero_infinity=True)
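Two properties of this setup affect how the per-epoch number reads. With `reduction='sum'` the value scales with the number of samples and the label lengths, and with `zero_infinity=True` any sample whose alignment is infeasible contributes 0 instead of `inf`. The sketch below is one possible diagnostic, not a confirmed explanation of the behaviour described: it computes per-sample losses, counts the infinite ones, and returns the total label length for normalisation. The function name `ctc_epoch_stats` and the default `blank=0` are assumptions.

```python
import torch
import torch.nn.functional as F


def ctc_epoch_stats(log_probs, targets, input_lengths, target_lengths, blank=0):
    """Return (sum of finite losses, count of infinite losses, label chars counted).

    log_probs: (T, N, C) log-softmax outputs; targets: (N, S) padded labels.
    `blank=0` is an assumption; pass the same blank index used for training.
    """
    per_sample = F.ctc_loss(
        log_probs, targets, input_lengths, target_lengths,
        blank=blank, reduction="none", zero_infinity=False,
    )
    finite = torch.isfinite(per_sample)
    loss_sum = per_sample[finite].sum().item()
    n_infinite = int((~finite).sum())
    # Only count label characters of samples that actually contributed loss.
    n_chars = int(target_lengths[finite].sum())
    return loss_sum, n_infinite, n_chars
```

Accumulating these three values over an epoch and reporting `loss_sum / n_chars` alongside `n_infinite` shows whether the raw sum is rising because fewer samples are being zeroed out, rather than because the predictions are getting worse.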