CTC inf loss on a standard input

Hi,

I have created the following wrapper around the CTC loss in order to turn it into a plain (x, y) criterion:

import torch

class CTC(object):
    def __init__(self, pred_len, target_len, reduction='sum', blank=0):
        self.pred_len = pred_len
        self.target_len = target_len
        self.criterion = torch.nn.CTCLoss(blank=blank, reduction=reduction)

    def __call__(self, x, y):
        # x: (T, N, C) raw scores; y: (N, S) padded targets
        N = x.size(1)
        x = x.log_softmax(2)
        input_lengths = torch.full(size=(N,), fill_value=self.pred_len, dtype=torch.long)
        target_lengths = torch.full(size=(N,), fill_value=self.target_len, dtype=torch.long)
        return self.criterion(x, y, input_lengths, target_lengths)

Unfortunately the CTC loss returns inf and I don’t understand why.

In my case input_len = 130 and target_len = 96.
My blank character is 0, and I use character 27 to pad the target sequences.

My tensors x and y always have the following dimensions:

  • x.size() = torch.Size([130, 2, 84])
    and
  • y.size() = torch.Size([2, 96])

If I print all the elements before computing the loss I get:

x tensor([[[ -8.8013,  -5.9752,  -3.3089,  ...,  -2.0339,  -3.3143,  -7.6051],
         [ -8.6325,  -5.7270,  -4.2188,  ...,  -3.0726,  -4.6305,  -6.6532]],

        [[-10.0357,  -6.4363,  -3.8623,  ...,  -3.6846,  -4.5313,  -6.4330],
         [ -9.1688,  -5.8652,  -4.2010,  ...,  -3.8935,  -4.0454,  -6.7982]],

        [[ -9.8854,  -5.9701,  -4.5383,  ...,  -1.3867,  -4.1009,  -7.5813],
         [ -8.7483,  -6.2188,  -4.6652,  ...,  -2.7608,  -4.1413,  -8.9462]],

        ...,

        [[ -8.4533,  -6.3724,  -4.4986,  ...,  -3.2722,  -3.7872,  -7.3510],
         [ -9.6133,  -6.3350,  -4.9409,  ...,  -3.1443,  -3.7825,  -5.3427]],

        [[ -9.0553,  -6.3206,  -4.6541,  ...,  -3.9838,  -4.5517,  -6.6008],
         [ -9.5718,  -7.4225,  -3.6564,  ...,  -4.2466,  -4.5305,  -5.6684]],

        [[ -9.8397,  -5.7450,  -3.3306,  ...,  -3.5302,  -4.4930,  -5.6763],
         [ -9.2666,  -5.5616,  -3.9800,  ...,  -4.1454,  -3.7733,  -5.0473]]],
       grad_fn=<LogSoftmaxBackward>)
y tensor([[30, 82, 42, 44, 51, 34, 82, 75, 70, 82, 74, 75, 70, 71, 82, 42, 73, 12,
         82, 36, 56, 64, 75, 74, 66, 60, 67, 67, 82, 61, 73, 70, 68, 26, 27, 27,
         27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27],
        [69, 70, 68, 64, 69, 56, 75, 64, 69, 62, 82, 56, 69, 80, 82, 68, 70, 73,
         60, 82, 41, 56, 57, 70, 76, 73, 82, 67, 64, 61, 60, 82, 45, 60, 60, 73,
         74, 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
         27, 27, 27, 27, 27, 27]])
input_lengths tensor([130, 130])
target_lengths tensor([96, 96])

Do you know what is causing this problem and how I can fix this?
Thanks

That is not how you’re supposed to use the CTC loss: do feed in the proper (unpadded) target lengths.
What happens is that the loss, as you call it, would require the model to output PAD BLANK PAD BLANK PAD … for as many pads as you have at the end (for repetitions, CTC needs two x elements to represent one y element), and so this cannot be represented within the x length. That in turn means the probability of x representing y is 0, and the negative log probability (which is the loss) is inf.
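This is easy to reproduce in isolation. The sketch below (with made-up small sizes) shows that when the declared target length needs more alignment steps than the input provides, the loss is inf, while the same targets with their true length give a finite loss:

```python
import torch

torch.manual_seed(0)

T, N, C = 10, 1, 5                    # input steps, batch, classes
log_probs = torch.randn(T, N, C).log_softmax(2)
ctc = torch.nn.CTCLoss(blank=0, reduction='sum')

# 8 identical labels need 8 + 7 blanks-between-repeats = 15 steps > T=10
target = torch.tensor([[1, 1, 1, 1, 1, 1, 1, 1]])
bad = ctc(log_probs, target, torch.tensor([T]), torch.tensor([8]))
print(bad)        # inf

# Declaring the true length 3 only needs 3 + 2 = 5 steps <= 10
good = ctc(log_probs, target, torch.tensor([T]), torch.tensor([3]))
print(good)       # finite
```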

Best regards

Thomas

Thanks a lot.

For a start, this seems to work:

import torch

class CTC(object):
    def __init__(self, pred_len, target_len, reduction='sum', blank=0, eos=26):
        self.pred_len = pred_len
        self.target_len = target_len
        self.criterion = torch.nn.CTCLoss(blank=blank, reduction=reduction)
        self.eos = eos

    def __call__(self, x, y):
        N = x.size(1)
        x = x.log_softmax(2)
        input_lengths = torch.full(size=(N,), fill_value=self.pred_len, dtype=torch.long)
        target_lengths = torch.full(size=(N,), fill_value=self.target_len, dtype=torch.long)
        for i in range(N):
            # true target length = index of the first EOS token
            target_lengths[i] = (y[i] == self.eos).nonzero()[0]
        return self.criterion(x, y, input_lengths, target_lengths)
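As an aside, the per-sample loop can also be written without Python-level iteration. A hypothetical vectorized sketch (torch.argmax returns the index of the first maximal value, i.e. the first EOS here):

```python
import torch

y = torch.tensor([[3, 4, 5, 26, 27, 27],
                  [7, 26, 27, 27, 27, 27]])
eos = 26

# Index of the first EOS in each row == number of real labels before it
target_lengths = (y == eos).int().argmax(dim=1)
print(target_lengths)   # tensor([3, 1])
```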

As a best practice, should the length correspond to the index of the EOS or of the first PAD after the EOS?
Secondly, what is a good ratio of input_sequence_length to target_sequence_length?

Thank you!

The length should be the number of targets you want your model to predict.
For the relation: I’d look at your application for this. Clearly you want your model to be able to predict your targets, so "target length plus the number of elements equal to their predecessor" gives you a lower bound on the input length. Very long inputs for a given target length lead to a very imbalanced stepwise prediction problem, because effectively the model needs to assign a lot of probability to blanks to get to sane losses. Probably look at established models in your domain and go with the flow until you have reason to believe you can improve on it.
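That lower bound can be sketched as a small helper: one input step per label, plus one blank step between each pair of equal consecutive labels (the function name is just for illustration):

```python
def ctc_min_input_length(target):
    """Minimum number of input steps CTC needs to emit `target`:
    one step per label, plus one mandatory blank between each pair
    of equal consecutive labels."""
    repeats = sum(a == b for a, b in zip(target, target[1:]))
    return len(target) + repeats

print(ctc_min_input_length([1, 2, 3]))        # 3 (no repeats)
print(ctc_min_input_length([1, 1, 2, 2, 2]))  # 5 labels + 3 repeats = 8
```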

Best regards

Thomas
