Negative CTC loss

As the example from @paarandika was easier to reproduce with the code he offered, I answered there. The gist of it is that you have to be a bit careful about what “the prediction” means here. CTC loss measures the cumulative probability of all possible alignments. Even if the individually most probable alignment matches the targets, that does not mean that the sum over all possible alignments assigns a lot of probability mass to the targets. This is particularly true when the predictions are somewhat longer than the target sequences (i.e. you have a lot of ε).

Keep in mind that the loss is the negative log likelihood of the targets under the predictions: a loss of 1.39 means ~25% likelihood for the targets, a loss of 2.35 means ~10% likelihood. This is very far from what you would expect from, say, a vanilla n-class classification problem, but the universe of alignments is rather large: if you have predictions of sequence length N with C characters (excluding ε), there are sum_{k=1..N} C^k >= C^N possible label sequences, only some of which (hopefully) map to the target sequence (else you get infinite loss; people do ask about this on the forum).
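To make the loss-to-likelihood conversion concrete, here is a minimal sketch (the loss values are just the illustrative numbers from above) that inverts the negative log likelihood with exp(-loss):

```python
import math

# CTC loss is the negative log likelihood of the targets,
# so exp(-loss) recovers the likelihood mass on the targets.
for loss in (1.39, 2.35):
    likelihood = math.exp(-loss)
    print(f"loss {loss:.2f} -> target likelihood ~{likelihood:.0%}")
```

So a seemingly large loss can still correspond to the targets being, by far, the most likely sequence among the huge universe of alignments.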

Best regards

Thomas
