Can CTCLoss go down to zero?

jleobernard · February 22, 2021, 8:31pm

Hello,
I’m struggling while trying to implement this paper. After some epochs the loss stops going down but my network only produces blanks. I’ve seen a lot of posts on the forum concerning this issue and most of the time the problem resulted from a wrong understanding of the way CTCLoss works. So I tried to make a minimal example to see where my code went wrong.

import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True, reduction="none")

predicted = torch.tensor(
[
  [
    [0., 1., 0., 0., 0.],
    [0., 0., 0., 1., 0.]
  ]
]
).detach().requires_grad_()

expected = torch.tensor([
[1, 3]
], dtype=torch.long)

predicted_lengths = torch.tensor(predicted.shape[0] * [predicted.shape[1]])
ctc_loss(predicted.permute(1, 0, 2).log_softmax(2), expected, predicted_lengths, torch.tensor([2]))

and it returns

tensor([1.8097], grad_fn=<SWhereBackward>)

I was expecting it to be zero, as the prediction matches the target perfectly. If someone could explain why the loss can never be zero, or where my mistake is, I would really appreciate it.

Thanks

tom · February 22, 2021, 8:39pm

log_softmax expects its inputs in log-space (logits), so you'd need to take the log of your probabilities first (-inf and 0 instead of 0 and 1, respectively).
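
A minimal sketch of that suggestion, assuming the same one-hot example as in the first post (the variable names `probs`, `log_space`, `input_lengths`, and `target_lengths` are illustrative, not from the original code). It only checks the forward value; how gradients behave with -inf entries is not examined here.

```python
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True, reduction="none")

# One-hot "probabilities", shape (N=1, T=2, C=5): class 1, then class 3.
probs = torch.tensor([[[0., 1., 0., 0., 0.],
                       [0., 0., 0., 1., 0.]]])
targets = torch.tensor([[1, 3]], dtype=torch.long)
input_lengths = torch.tensor([2])
target_lengths = torch.tensor([2])

# Move to log-space first: log(0) = -inf, log(1) = 0.
log_space = probs.log()

# CTCLoss expects (T, N, C); log_softmax leaves these values unchanged
# because they are already normalized log-probabilities.
log_probs = log_space.permute(1, 0, 2).log_softmax(2)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)
```

With the log taken first, the only valid alignment (1, 3) has probability 1, so the loss should come out as zero.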

jleobernard · February 22, 2021, 8:45pm

It does work on my small example. I’ll try it on my target network and let you know.
Thanks for the help!

jleobernard · February 22, 2021, 9:24pm

I actually have a hard time understanding why log should be called before log_softmax because, as the description of CTCLoss states:

Log_probs: Tensor of size (T, N, C)… The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).

So I was expecting the call to log_softmax to do the log on its own (as described in the doc).

Either way, I tried calling .log() on my actual outputs (rather than on this small example), and training showed the same problem (predicting only blanks), so I’m back to square one.

SimonW · February 23, 2021, 3:12am

It’s not about log_softmax vs. softmax. Log-softmax means softmax first, then log. But softmax expects logits, not probabilities.
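
To make this concrete, here is a small sketch (variable names are illustrative) that checks that log_softmax matches softmax followed by log, and shows what happens when the one-hot row from the first post is fed in as if it were logits. It reproduces the 1.8097 reported above, assuming the only valid alignment for two time steps and two labels is (1, 3).

```python
import torch

# One-hot row from the example, treated as logits rather than probabilities.
x = torch.tensor([0., 1., 0., 0., 0.])

# log_softmax(x) is the same as log(softmax(x)).
same = torch.allclose(x.log_softmax(0), x.softmax(0).log())

# Softmax of the "hot" entry is e / (e + 4), which is well below 1.
p_hot = x.softmax(0)[1]

# Two time steps, one valid alignment: loss = -(log p + log p).
expected_loss = -2 * p_hot.log()

print(same, p_hot.item(), expected_loss.item())
```

So the softmax never assigns probability 1 to the correct class for these inputs, which is why the loss stays above zero.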

jleobernard · February 23, 2021, 6:01am

Thank you for your answer @SimonW.
So my training problem does not come from the way I use CTCLoss, since according to your description I was using it correctly in the first place: the inputs of log_softmax come from my last nn.Linear layer.

In the model I have (only kept the relevant parts):

...
self.dense = to_best_device(nn.Linear(in_features=50, out_features=len(characters)))
...
def forward(self, x):
   ...
   return self.dense(x)

And in the training loop I have:

    for i, batch_data in enumerate(dataloader):
        data, labels = batch_data
        optimizer.zero_grad()
        outputs = model(data)
        outputs = outputs.permute(1, 0, 2) # Because I have (batch_size, sequence_length, nb_features)
        bs = len(data)
        curr_loss = loss(outputs.log_softmax(2), labels, bs * [outputs.shape[0]], [get_sentence_length(label) for label in labels])
        curr_loss.backward()
        optimizer.step()