I’m struggling to implement this paper. After some epochs the loss stops going down, and my network only produces blanks. I’ve seen a lot of forum posts about this issue, and most of the time the problem came from a misunderstanding of how CTCLoss works. So I tried to make a minimal example to see where my code goes wrong.
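import torch
import torch.nn as nn
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True, reduction='none')
# One-hot "predictions" of shape (N=1, T=2, C=5): class 1, then class 3
predicted = torch.tensor([[
    [0., 1., 0., 0., 0.],
    [0., 0., 0., 1., 0.]]])
expected = torch.tensor([[1, 3]])
predicted_lengths = torch.tensor(predicted.shape[0] * [predicted.shape[1]])  # one length T per sample
# CTCLoss expects (T, N, C) log-probabilities
ctc_loss(predicted.permute(1, 0, 2).log_softmax(2), expected, predicted_lengths, torch.tensor([2]))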
and it returns a non-zero loss.
I was expecting it to be zero, since the prediction matches the target perfectly. If someone could explain why the loss can never be zero, or where my mistake is, I would really appreciate it.
log_softmax treats its inputs as being in log space (logits), so you’d need to take the log of your one-hot probabilities first (-inf and 0 instead of 0 and 1, respectively).
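A minimal sketch of this fix, assuming the toy setup from the question is one sample, two time steps, five classes and target [1, 3] (the target values are inferred from the one-hot rows). Used directly as logits, the one-hot rows leave only about 40% probability on the correct class at each step; taking their log first turns 0 into -inf and 1 into 0, so log_softmax leaves them unchanged and the loss drops to zero.

```python
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True, reduction='none')

# Assumed toy setup: one-hot "predictions" of shape (N=1, T=2, C=5),
# class 1 at t=0 and class 3 at t=1, with target [1, 3]
one_hot = torch.tensor([[[0., 1., 0., 0., 0.],
                         [0., 0., 0., 1., 0.]]])
targets = torch.tensor([[1, 3]])
input_lengths = torch.tensor([2])
target_lengths = torch.tensor([2])

def loss_for(logits):
    # CTCLoss wants (T, N, C) log-probabilities
    log_probs = logits.permute(1, 0, 2).log_softmax(2)
    return ctc_loss(log_probs, targets, input_lengths, target_lengths)

# Probabilities used as logits: softmax spreads the mass, so the loss is not zero
loss_wrong = loss_for(one_hot)
# Log of the probabilities (0 -> -inf, 1 -> 0): log_softmax is then a no-op
loss_right = loss_for(one_hot.log())

print(loss_wrong)  # about 1.81, i.e. -2 * log(e / (e + 4))
print(loss_right)  # essentially zero
```

With only two time steps and two distinct labels, "1 3" is the single valid alignment, so the loss is just the negative sum of the two per-step log-probabilities.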
It does work on my small example. I’ll try it on my target network and let you know.
Thanks for the help!
I actually have a hard time understanding why log should be called before log_softmax, because the description of CTCLoss states:
Log_probs: Tensor of size (T, N, C)… The logarithmized probabilities of the outputs (e.g. obtained with
So I was expecting the call to log_softmax to do the log on its own (as described in the doc).
Either way, I tried calling .log() on the outputs of my actual network (not this small example), and training produced the same problem (predicting only blanks), so I’m back to square one.
It’s not about log_softmax vs softmax. log_softmax means softmax first, then log. But softmax expects logits (unnormalized scores in log space), not probabilities.
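A quick check of this point, using an arbitrary three-class distribution (the values are made up for illustration): log_softmax matches log(softmax(x)), softmax applied to probabilities does not return them, and softmax applied to their log does.

```python
import torch

probs = torch.tensor([0.1, 0.7, 0.2])  # made-up probability distribution

# log_softmax is log(softmax(x)), computed in a numerically stable way
assert torch.allclose(torch.log_softmax(probs, dim=0),
                      torch.log(torch.softmax(probs, dim=0)))

# Probabilities fed to softmax are treated as logits: the result is much flatter
as_logits = torch.softmax(probs, dim=0)
print(as_logits)  # roughly [0.25, 0.46, 0.28]

# Log-probabilities fed to softmax recover the original distribution
from_log = torch.softmax(probs.log(), dim=0)
print(from_log)   # ~[0.1, 0.7, 0.2]
```

This is why the raw outputs of a final nn.Linear layer can go straight into log_softmax: they are already logits.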
Thank you for your answer @SimonW.
That means my training problem doesn’t come from the way I use CTCLoss, since according to your description I was using it correctly in the first place: the inputs of log_softmax come from my last nn.Linear layer, so they are already logits.
In the model I have (only kept the relevant parts):
self.dense = to_best_device(nn.Linear(in_features=50, out_features=len(characters)))
def forward(self, x):
And in the training loop I have:
for i, batch_data in enumerate(dataloader):
    data, labels = batch_data
    outputs = model(data)
    # (batch_size, sequence_length, nb_features) -> (sequence_length, batch_size, nb_features), as CTCLoss expects
    outputs = outputs.permute(1, 0, 2)
    bs = len(data)
    # input_lengths: one entry per sample, equal to the time dimension outputs.shape[0]
    curr_loss = loss(outputs.log_softmax(2), labels, bs * [outputs.shape[0]], [get_sentence_length(label) for label in labels])
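For reference, here is a self-contained sketch of one CTC training step with the shapes and length arguments this loop needs. Everything in it is a hypothetical stand-in for the post’s `model`, `dataloader`, `characters`, `to_best_device` and `get_sentence_length`: the `TinyCTCModel` class (an LSTM followed by an nn.Linear with 50 input features), the sizes, the random data, and the padded `labels`/`target_lengths`. Blank is assumed to be index 0, as in `nn.CTCLoss(blank=0, ...)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 5 classes (blank = 0), hidden size 50, T time steps, N samples
num_classes, hidden, T, N = 5, 50, 12, 3

class TinyCTCModel(nn.Module):
    # Hypothetical stand-in: a recurrent layer followed by the nn.Linear producing logits
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=8, hidden_size=hidden, batch_first=True)
        self.dense = nn.Linear(in_features=hidden, out_features=num_classes)

    def forward(self, x):        # x: (N, T, 8)
        out, _ = self.rnn(x)     # (N, T, hidden)
        return self.dense(out)   # raw logits: (N, T, C)

model = TinyCTCModel()
loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)

data = torch.randn(N, T, 8)
# Padded targets (N, S): real labels are in 1..C-1; entries past each target length are ignored
labels = torch.tensor([[1, 2, 3, 0],
                       [4, 4, 0, 0],
                       [2, 3, 1, 4]])
target_lengths = torch.tensor([3, 2, 4])

outputs = model(data).permute(1, 0, 2)   # (T, N, C)
log_probs = outputs.log_softmax(2)       # logits -> log-probabilities
# One input length per sample: the full time dimension, outputs.shape[0]
input_lengths = torch.full((N,), outputs.shape[0], dtype=torch.long)

loss = loss_fn(log_probs, labels, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```

Using `outputs.shape[0]` for every sample assumes all sequences use the full time dimension; if inputs are padded in time, the per-sample unpadded lengths would be passed instead.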