CTCLoss returns negative loss after some batches

Hey!
I’m using CTCLoss to implement a speech recognition recurrent neural network.

    def forward(self, x):
        hidden = None
        rnn_output, hidden = self.recurrent(x, hidden)

        # Flatten time and batch dimensions so the linear layer sees (T * N, hidden_size)
        rnn_output_flat = rnn_output.view(-1, self.hidden_size)
        lin_output = self.output(rnn_output_flat)
        output_flat = self.softmax(lin_output)

        # Reshape back to (T, N, num_classes), the layout CTCLoss expects for log_probs
        output = output_flat.view(rnn_output.size(0), rnn_output.size(1), output_flat.size(1))
        return output

This is my forward function, where self.softmax = nn.LogSoftmax, self.output = nn.Linear(…), and self.recurrent = nn.RNN(…)

Now, as input I have audio files which I have pre-processed using MFCC feature extraction; the targets are encoded using a simple alphabet encoding. I pad my inputs per batch in my DataLoader.

Training seems to work: the loss starts at about 30 for my first input and then gradually goes down after every batch. But after 7 or 8 batches, I start getting losses in the [-1, 0] range. At that point, obviously, training doesn’t actually seem to improve the model at all anymore.

I was wondering if I’m missing something obvious here. I’ve been scratching my head for a while now…

Thanks a lot for your help!

Hi,

By definition, a negative log likelihood cannot be negative, and I have not seen CTC loss return negative values for valid inputs. Based on that, can you double-check that your inputs are valid (e.g. no blank labels in the target)?
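
For reference, here is a minimal sanity check you could adapt (the shapes and sizes are made up for illustration, with blank = 0 and targets drawn from 1..C-1):

import torch

T, N, C = 50, 4, 20                                        # input length, batch size, classes (0 = blank)
log_probs = torch.randn(T, N, C).log_softmax(2)            # stand-in for the network output
targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # valid labels only, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss)  # a negative log likelihood, so it should never be negative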

Best regards

Thomas

Hi Thomas,
thanks for your fast answer.
I do actually think that I’m using blank labels in the target when I’m padding. What value should I pad my batches (input & output) with to get the best results?

Thanks a lot again.

The CTC loss will not look at target entries beyond the target_length you pass, but the first target_length entries need to be non-blank. So you can pad the targets with whatever you like, as long as target_lengths only covers the real labels.
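
A small sketch of what that means in practice (sizes made up; the padded entries beyond target_length are ignored, which is the point above):

import torch

T, N, C = 50, 2, 20
log_probs = torch.randn(T, N, C).log_softmax(2)
input_lengths = torch.full((N,), T, dtype=torch.long)

# Real target lengths are 8 and 5; the tail of the second row is padding.
targets = torch.randint(1, C, (N, 8), dtype=torch.long)
targets[1, 5:] = 0                      # the padding value is never read ...
target_lengths = torch.tensor([8, 5])   # ... because target_lengths stops before it

loss = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss)  # stays non-negative; the blanks used as padding do not matter here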

Awesome Tom, this was exactly my issue. I just fixed it. Thanks a lot, really appreciate your quick help!

I get negative losses for a few out of every 4-5K samples; they are much shorter than the others, but the input/target lengths are OK. However, the cuDNN CTC loss gives positive values for them, so I switched over to it with the deterministic flag set to true.
I have a JSON file for those inputs if you want to investigate the issue, but apparently it doesn’t let me attach them here. (This is with 1.3.0.)

import torch
import codecs
import json
import numpy as np

def dump_ctc_inputs(file, log_probs, target_lengths, targets, input_lengths):
    inputs = (log_probs.detach().cpu().numpy().tolist(),
              target_lengths.cpu().numpy().tolist(),
              targets.cpu().numpy().tolist(),
              input_lengths.cpu().numpy().tolist())

    with codecs.open(file, "w", "utf-8") as streamjson:
        json.dump(inputs, streamjson, indent=4, ensure_ascii=False)

def read_ctc_inputs(file):
    with codecs.open(file, "r", "utf-8") as stream:
        lines = stream.read()
        inputsloaded = json.loads(lines)
        log_probs, target_lengths, targets, input_lengths = inputsloaded
        log_probs = torch.from_numpy(np.array(log_probs)).float()
        target_lengths = torch.from_numpy(np.array(target_lengths)).int()
        targets = torch.from_numpy(np.array(targets)).int()
        input_lengths = torch.from_numpy(np.array(input_lengths)).int()
        return log_probs, targets, input_lengths, target_lengths

ctc_loss = torch.nn.CTCLoss(reduction='none')
file = "C:\\1391713.json"
log_probs, targets, input_lengths, target_lengths = read_ctc_inputs(file)
log_probs = log_probs.cuda().requires_grad_()  # .cuda() is not in-place, so keep the result
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss2 = torch.nn.functional.ctc_loss(log_probs.detach().requires_grad_(), targets[0][0:6],
                                     input_lengths, target_lengths,
                                     reduction='none', zero_infinity=True)

Hi, just to verify I understand what you mean. Let’s say my target is:
“hello world”
Is the target’s length 10 and not 11?

“Hello world” has 11 characters including the space, so the target length would be 11. The “blank” is a special “output-only” symbol that means “nothing”; it is not the space. The article Sequence Modeling with CTC has a good overview of how CTC works under the hood.
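
For example, a possible encoding (the alphabet and index assignments here are made up, with index 0 reserved for the blank):

blank = 0  # reserved for CTC, never appears in targets
alphabet = [' '] + [chr(c) for c in range(ord('a'), ord('z') + 1)]  # space gets its own index
char_to_idx = {ch: i + 1 for i, ch in enumerate(alphabet)}          # indices 1..27

target = [char_to_idx[ch] for ch in "hello world"]
print(len(target))  # 11 -- the space is a real character, the blank never shows up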

Best regards

Thomas

Thanks. If I may, I have one more question, about this code:

They declare the blank as the space (i.e. the space between words). As I understand from your comment, that’s a mistake and the blank should be a separate token?

The “blank” label means exactly that: blank. It cannot represent any specific character such as space or newline. It is merely a placeholder that CTCLoss needs in order to be computed correctly; in particular, it is what separates repeated predicted characters, so the blank label is essential and the space cannot be used as the blank.
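
As a small illustration (a toy greedy decode, not part of CTCLoss itself, assuming blank index 0 and a=1, b=2, …):

def greedy_decode(indices, blank=0):
    # Collapse consecutive repeats, then drop blanks.
    out, prev = [], None
    for i in indices:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# h h e - l l - l o  ->  h e l l o : without the blank between the two
# groups of 'l', the repeats would collapse into a single 'l'.
print(greedy_decode([8, 8, 5, 0, 12, 12, 0, 12, 15]))  # [8, 5, 12, 12, 15]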

Hope it helps