CTCLoss returns negative loss after some batches

Hey!
I’m using CTCLoss to implement a speech recognition recurrent neural network.

    def forward(self, x):
        hidden = None
        rnn_output, hidden = self.recurrent(x, hidden)

        # Flatten time and batch dimensions so the linear layer sees (T * N, hidden_size)
        rnn_output_flat = rnn_output.view(-1, self.hidden_size)
        lin_output = self.output(rnn_output_flat)
        output_flat = self.softmax(lin_output)

        # Reshape back to (T, N, num_classes), the layout CTCLoss expects for log_probs
        output = output_flat.view(rnn_output.size(0), rnn_output.size(1), output_flat.size(1))
        return output

This is my forward function, where self.softmax = nn.LogSoftmax, self.output = nn.Linear(…), and self.recurrent = nn.RNN(…)

Now, as input I have audio files which I have pre-processed using MFCC feature extraction; the targets are encoded using a simple alphabet encoding. I pad my inputs per batch in my DataLoader.

Training seems to work: the loss starts at about 30 for my first input and then gradually goes down after every batch. But after 7 or 8 batches, I start getting losses in the [-1, 0] range. At that point, obviously, training doesn’t actually seem to improve the model at all anymore.

I was wondering if I’m missing something obvious here. I’ve been scratching my head for a while now…

Thanks a lot for your help!

Hi,

By definition, a negative log likelihood cannot be negative, and I have not seen CTC loss return negative values for valid inputs. Based on that, can you double-check that your inputs are valid (e.g. no blank labels in the target)?
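
For reference, here is a minimal sanity check you could adapt (the shapes and sizes are made up for illustration, with blank = 0 and targets drawn from 1..C-1):

import torch

T, N, C = 50, 4, 20                                        # input length, batch size, classes (0 = blank)
log_probs = torch.randn(T, N, C).log_softmax(2)            # stand-in for the network output
targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # valid labels only, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss)  # a negative log likelihood, so it should never be negative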

Best regards

Thomas

Hi Thomas,
thanks for your fast answer.
I do actually think that I’m using blank labels in the target when I’m padding. What value should I pad my batches (input & output) with to get the best results?

Thanks a lot again.

The CTC loss will not look at target entries beyond the target_length you pass, but the first target_length entries need to be non-blank. So you can pad the targets with whatever you like, as long as target_lengths only covers the real labels.
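
A small sketch of what that means in practice (sizes made up; the padded entries beyond target_length are ignored, which is the point above):

import torch

T, N, C = 50, 2, 20
log_probs = torch.randn(T, N, C).log_softmax(2)
input_lengths = torch.full((N,), T, dtype=torch.long)

# Real target lengths are 8 and 5; the tail of the second row is padding.
targets = torch.randint(1, C, (N, 8), dtype=torch.long)
targets[1, 5:] = 0                      # the padding value is never read ...
target_lengths = torch.tensor([8, 5])   # ... because target_lengths stops before it

loss = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss)  # stays non-negative; the blanks used as padding do not matter here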

Awesome Tom, this was exactly my issue. I just fixed it. Thanks a lot, really appreciate your quick help!

I get negative losses for a few out of every 4-5K samples; they are much shorter than the others, but the input/target lengths are OK. However, the cuDNN CTC loss gives positive values for them, so I switched over to it with the deterministic flag set to true.
I have a JSON file for those inputs if you want to investigate the issue, but apparently it doesn’t let me attach them here. (This is with 1.3.0.)

import torch
import codecs
import json
import numpy as np

def dump_ctc_inputs(file, log_probs, target_lengths, targets, input_lengths):
    inputs = (log_probs.detach().cpu().numpy().tolist(),
              target_lengths.cpu().numpy().tolist(),
              targets.cpu().numpy().tolist(),
              input_lengths.cpu().numpy().tolist())

    with codecs.open(file, "w", "utf-8") as streamjson:
        json.dump(inputs, streamjson, indent=4, ensure_ascii=False)

def read_ctc_inputs(file):
    with codecs.open(file, "r", "utf-8") as stream:
        lines = stream.read()
        inputsloaded = json.loads(lines)
        log_probs, target_lengths, targets, input_lengths = inputsloaded
        log_probs = torch.from_numpy(np.array(log_probs)).float()
        target_lengths = torch.from_numpy(np.array(target_lengths)).int()
        targets = torch.from_numpy(np.array(targets)).int()
        input_lengths = torch.from_numpy(np.array(input_lengths)).int()
        return log_probs, targets, input_lengths, target_lengths

ctc_loss = torch.nn.CTCLoss(reduction='none')
file = "C:\\1391713.json"
log_probs, targets, input_lengths, target_lengths = read_ctc_inputs(file)
log_probs = log_probs.cuda().requires_grad_()  # .cuda() is not in-place, so keep the result
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss2 = torch.nn.functional.ctc_loss(log_probs.detach().requires_grad_(), targets[0][0:6],
                                     input_lengths, target_lengths,
                                     reduction='none', zero_infinity=True)

Hi, just to verify I understand what you mean. Let’s say my target is:
“hello world”
Is the target’s length 10 and not 11?

“Hello world” has 11 characters including the space, so the target length would be 11. The “blank” is a special “output-only” symbol that means “nothing”; it is not the space. The article Sequence Modeling with CTC has a good overview of how CTC works under the hood.
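
For example, a possible encoding (the alphabet and index assignments here are made up, with index 0 reserved for the blank):

blank = 0  # reserved for CTC, never appears in targets
alphabet = [' '] + [chr(c) for c in range(ord('a'), ord('z') + 1)]  # space gets its own index
char_to_idx = {ch: i + 1 for i, ch in enumerate(alphabet)}          # indices 1..27

target = [char_to_idx[ch] for ch in "hello world"]
print(len(target))  # 11 -- the space is a real character, the blank never shows up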

Best regards

Thomas

Thanks. If I may, I have one more question, about this code:

They declare the blank as the space (i.e. the space between words). As I understand from your comment, that’s a mistake and the blank should be a separate token?

The “blank” label means exactly that: blank. It cannot represent any specific character such as space or newline. It is merely a placeholder that CTCLoss needs in order to be computed correctly; in particular, it is what separates repeated predicted characters, so the blank label is essential and the space cannot be used as the blank.
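
As a small illustration (a toy greedy decode, not part of CTCLoss itself, assuming blank index 0 and a=1, b=2, …):

def greedy_decode(indices, blank=0):
    # Collapse consecutive repeats, then drop blanks.
    out, prev = [], None
    for i in indices:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# h h e - l l - l o  ->  h e l l o : without the blank between the two
# groups of 'l', the repeats would collapse into a single 'l'.
print(greedy_decode([8, 8, 5, 0, 12, 12, 0, 12, 15]))  # [8, 5, 12, 12, 15]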

Hope it helps