CTCLoss returns negative loss after some batches

Hey!
I’m using CTCLoss to implement a speech recognition recurrent neural network.

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        hidden = None
        rnn_output, hidden = self.recurrent(x, hidden)

        # flatten to (seq_len * batch, hidden_size) for the linear layer
        rnn_output_flat = rnn_output.view(-1, self.hidden_size)

        lin_output = self.output(rnn_output_flat)

        # log-probabilities over the classes
        output_flat = self.softmax(lin_output)

        # reshape back to (seq_len, batch, num_classes) as expected by CTCLoss
        output = output_flat.view(rnn_output.size(0), rnn_output.size(1), output_flat.size(1))

        return output

This is my forward function, where self.softmax = nn.LogSoftmax, self.output = nn.Linear(…), and self.recurrent = nn.RNN(…)
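For context, the rest of the module is set up roughly like this (the class name, layer sizes and number of classes here are placeholders, not my exact configuration):

    import torch.nn as nn

    class SpeechRNN(nn.Module):
        def __init__(self, input_size=13, hidden_size=256, num_classes=29):
            super().__init__()
            self.hidden_size = hidden_size
            # expects input of shape (seq_len, batch, input_size)
            self.recurrent = nn.RNN(input_size, hidden_size)
            # project hidden states to alphabet size + 1 for the CTC blank
            self.output = nn.Linear(hidden_size, num_classes)
            # CTCLoss expects log-probabilities, hence LogSoftmax
            self.softmax = nn.LogSoftmax(dim=1)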

Now, as input I have audio files which I have pre-processed using MFCC feature extraction; the transcripts are encoded using a simple alphabet encoding. I pad my inputs per batch in my DataLoader.
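The alphabet encoding is nothing fancy, just a character-to-index mapping along these lines (the exact alphabet and the use of 0 as the blank index are only illustrative):

    import torch

    # illustrative alphabet; index 0 is reserved for the CTC blank
    alphabet = "abcdefghijklmnopqrstuvwxyz' "
    char_to_idx = {c: i + 1 for i, c in enumerate(alphabet)}

    def encode(transcript):
        return torch.tensor([char_to_idx[c] for c in transcript.lower()],
                            dtype=torch.long)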

Training seems to work at first: the loss starts at about 30 for my first input, and then gradually goes down after every batch. But after 7 or 8 batches, I start getting losses in the [-1, 0] range. At that point, obviously, training doesn’t actually seem to improve the model at all anymore.
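The loss itself is computed in a pretty standard loop, roughly like this (a simplified sketch; `model`, `optimizer` and `loader` are the pieces described above, and blank index 0 is just an example):

    criterion = torch.nn.CTCLoss(blank=0)

    for features, targets, input_lengths, target_lengths in loader:
        optimizer.zero_grad()
        # log-probabilities of shape (seq_len, batch, num_classes)
        log_probs = model(features)
        loss = criterion(log_probs, targets, input_lengths, target_lengths)
        loss.backward()
        optimizer.step()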

I was wondering if I’m missing something obvious here. I’ve been scratching my head for a while now…

Thanks a lot for your help!

Hi,

by definition, a negative log likelihood cannot be negative, and I’ve not seen CTC loss return negative values for valid inputs. Based on that, can you double-check that your inputs are valid (e.g. no blank labels in the target)?
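For example, assuming a blank index of 0 and a padded 2-D targets tensor, a quick sanity check could look like this:

    # assuming blank = 0 and targets of shape (batch, max_target_len)
    for i, length in enumerate(target_lengths.tolist()):
        assert (targets[i, :length] != 0).all(), f"blank label inside target {i}"
    # each input also has to be at least as long as its target
    assert (input_lengths >= target_lengths).all()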

Best regards

Thomas


Hi Thomas,
thanks for your fast answer.
I do actually think that I’m using the blank label in the targets when I’m padding. What value should I pad my batches (input & output) with to get the best results?

Thanks a lot again.

The CTC loss will not look at targets beyond the target_length you pass, but the first target_length entries of each target need to be non-blank.
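So the padding value itself should not matter; as a sketch (with 0 standing in for the blank index), something like this is safe:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    BLANK = 0  # stand-in for your blank index

    def collate_targets(targets):
        # targets: list of 1-D tensors with the encoded labels, none of them BLANK
        target_lengths = torch.tensor([t.size(0) for t in targets], dtype=torch.long)
        # the padding value is never read, because CTCLoss only uses
        # the first target_lengths[i] entries of row i
        padded = pad_sequence(targets, batch_first=True, padding_value=BLANK)
        return padded, target_lengths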

Awesome Tom, this was exactly my issue. I just fixed it. Thanks a lot, really appreciate your quick help!

I get negative losses for a few samples out of every 4-5K; those samples are much shorter than the others, but their input/target lengths are OK. However, the cuDNN CTC loss gives positive values for the same inputs, so I switched to it, with the deterministic flag set to true.
I have a JSON file with those inputs if you want to investigate the issue, but apparently it doesn’t let me attach it here. (This is with 1.3.0.)

import torch
import codecs
import json
import numpy as np

def dump_ctc_inputs(file, log_probs, target_lengths, targets, input_lengths):
    # serialize the CTC inputs as plain lists so they can be shared as JSON
    inputs = (log_probs.detach().cpu().numpy().tolist(),
              target_lengths.cpu().numpy().tolist(),
              targets.cpu().numpy().tolist(),
              input_lengths.cpu().numpy().tolist())

    with codecs.open(file, "w", "utf-8") as streamjson:
        json.dump(inputs, streamjson, indent=4, ensure_ascii=False)

def read_ctc_inputs(file):
    with codecs.open(file, "r", "utf-8") as stream:
        lines = stream.read()
        inputsloaded = json.loads(lines)
        log_probs, target_lengths, targets, input_lengths = inputsloaded
        log_probs = torch.from_numpy(np.array(log_probs)).float()
        target_lengths = torch.from_numpy(np.array(target_lengths)).int()
        targets = torch.from_numpy(np.array(targets)).int()
        input_lengths = torch.from_numpy(np.array(input_lengths)).int()
        return log_probs, targets, input_lengths, target_lengths

ctc_loss = torch.nn.CTCLoss(reduction='none')
file = "C:\\1391713.json"
log_probs, targets, input_lengths, target_lengths = read_ctc_inputs(file)
log_probs.requires_grad_()  # keep this copy on the CPU
# CPU implementation of the loss
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
# the same inputs on the GPU through the functional API, for comparison
loss2 = torch.nn.functional.ctc_loss(log_probs.detach().requires_grad_().cuda(),
                                     targets[0][0:6], input_lengths, target_lengths,
                                     reduction='none', zero_infinity=True)