This is my forward function, where self.softmax = nn.LogSoftmax, self.output = nn.Linear(…), and self.recurrent = nn.RNN(…)
Now, as input I have audio files, which I have pre-processed using MFCC feature extraction, with the transcripts encoded using a simple alphabet encoding. I pad my inputs per batch in my DataLoader.
Training seems to work: the loss starts at about 30 for my first input and then gradually goes down after every batch. But after 7 or 8 batches, I start getting losses in the [-1, 0] range. At that point, training no longer seems to improve the model at all.
I was wondering if I’m missing something obvious here. I’ve been scratching my head for a while now…
By definition, a negative log likelihood cannot be negative, and I’ve not seen CTC loss return negative values for valid inputs. Based on that, can you double-check that your inputs are valid (e.g. no blank labels in the target)?
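A rough sketch of that kind of sanity check, in plain Python so it runs without a GPU. The function name find_invalid_ctc_samples is made up for illustration, and it assumes targets is a list of unpadded label lists, input_lengths holds the number of frames the network actually outputs per sample (after any downsampling), and the blank index is 0. With tensors you would call .tolist() first.

```python
def find_invalid_ctc_samples(targets, input_lengths, blank=0):
    """Return (sample_index, reason) pairs for samples CTC cannot align.

    Hypothetical helper: targets are unpadded lists of label indices,
    input_lengths are the per-sample output frame counts.
    """
    problems = []
    for i, (target, n_frames) in enumerate(zip(targets, input_lengths)):
        # The blank index is output-only and must never appear in a target.
        if blank in target:
            problems.append((i, "target contains the blank index"))
        # Repeated adjacent labels need a blank frame between them, so the
        # minimum number of frames is len(target) + number of such repeats.
        repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
        if n_frames < len(target) + repeats:
            problems.append((i, "too few input frames for this target"))
    return problems


targets = [[1, 2, 3], [0, 2], [4, 4, 5]]
input_lengths = [5, 4, 3]
problems = find_invalid_ctc_samples(targets, input_lengths)
for index, reason in problems:
    print(index, reason)
```

Here sample 1 is flagged because it contains index 0, and sample 2 because the repeated 4 needs at least four frames. Very short samples are the ones most likely to fail the second check.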
I get negative losses for some samples out of every 4-5K; they are much shorter than the others, but the input/target lengths are OK. However, the cuDNN CTC loss gives positive values, so I switched to it, with the deterministic flag set to true.
I have a JSON file with those inputs if you want to investigate the issue, but apparently it doesn’t let me attach it here. (This is with 1.3.0.)
“Hello world” has 11 chars including the space, so it would be 11. “blank” is a special “output-only” character that means “nothing”, it is not space. The article Sequence Modeling with CTC has a good overview of how CTC works under the hood.
The “blank label” is literally a blank: it does not represent any actual character such as a space or a newline. It is merely a placeholder that CTCLoss needs in order to be computed correctly, so the space cannot be used as the blank label. CTCLoss relies on this placeholder to separate consecutive predicted characters (repeated ones in particular), which is why the blank label is essential.
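To make the distinction concrete, here is a minimal sketch of an alphabet encoding that keeps the blank separate from the space. The alphabet, the names BLANK, ALPHABET, CHAR_TO_INDEX and encode, and the choice of index 0 for the blank are all assumptions for illustration, not taken from the original poster's code.

```python
import string

BLANK = 0  # reserved for the CTC blank; never assigned to a real character

# Assumed alphabet: space plus lowercase letters, with indices starting at 1
# so that index 0 stays free for the blank.
ALPHABET = " " + string.ascii_lowercase
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

# The network's output layer needs one class per character plus the blank.
NUM_CLASSES = len(ALPHABET) + 1


def encode(text):
    """Map a transcript to label indices; the space is an ordinary label."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower()]


target = encode("Hello world")
target_length = len(target)
print(target_length, BLANK in target, NUM_CLASSES)
```

The target length is 11 because the space counts as a character with its own index (1 in this sketch), while the blank index never shows up in the target at all.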