Understanding CTCLoss

aatg · November 20, 2023, 3:01pm

I have read several blog articles to get an idea of how CTCLoss works algorithmically, and the PyTorch documentation seems straightforward. All the examples I have seen online match my understanding, but I can't get it to work in practice. Here's a minimal working example in which both losses should be close to zero, because the inputs match the targets.

When I run this, the unreduced loss is tensor([5.9605e-07, inf]) instead of two values close to zero. I would appreciate an explanation of why the second loss is infinite, and of how to use CTCLoss correctly so that both terms are close to zero.

import torch

e = 1e-7
targets = torch.LongTensor([
    [1, 2, 1],
    [1, 1, 1]
])
logprobs = torch.Tensor([
    [[e, 1-2*e, e], [e, e, 1-2*e], [e, 1-2*e, e], [1-2*e, e, e]],
    [[e, 1-2*e, e], [e, 1-2*e, e], [e, 1-2*e, e], [1-2*e, e, e]]
]) # easier to enter in shape (N=2, T=4, C=2+1)
logprobs = torch.log(logprobs)
logprobs = torch.transpose(logprobs, 0, 1) # get to correct shape (T, N, C)

input_lengths = torch.LongTensor([4,4])
target_lengths = torch.LongTensor([3,3])

loss = torch.nn.CTCLoss(blank = 0, reduction='none')
print(loss(logprobs, targets, input_lengths, target_lengths))  # tensor([5.9605e-07, inf])
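To see what the second sample actually encodes, it helps to apply CTC's many-to-one collapse rule (merge consecutive repeated labels, then remove blanks) to the most likely label at each frame. This is a minimal sketch in plain Python; `ctc_collapse` is a hypothetical helper, not part of PyTorch, and the two frame-level paths are read off by hand from the argmax of each row of `logprobs` above.

```python
from itertools import groupby

BLANK = 0  # same as CTCLoss(blank=0) in the example


def ctc_collapse(path, blank=BLANK):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]  # e.g. 1 1 1 0 -> 1 0
    return [label for label in merged if label != blank]


# Most likely label per frame (argmax of each row of `logprobs`), read off by hand
path_sample_1 = [1, 2, 1, 0]
path_sample_2 = [1, 1, 1, 0]

print(ctc_collapse(path_sample_1))    # [1, 2, 1] -> equals target [1, 2, 1]
print(ctc_collapse(path_sample_2))    # [1]       -> not target [1, 1, 1]
print(ctc_collapse([1, 0, 1, 0, 1]))  # [1, 1, 1] -> blanks keep the repeats apart
```

So the path 1 1 1 0 decodes to a single 1: even the most likely path disagrees with the target 1 1 1. Repeated labels only survive the collapse when a blank separates them.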

tom · November 22, 2023, 9:12pm

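A loss of inf (i.e. a likelihood of 0) means that no alignment of the input sequence can produce the target. CTC merges consecutive repeated labels, so repeats in the target must be separated by a blank. For the target 1 1 1 you would need to predict at least 1-blank-1-blank-1, which needs T >= 5, but your input length is 4.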
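The rule above can be checked directly. A minimal sketch, assuming the same three classes as the example (blank = 0 plus labels 1 and 2): the hypothetical helper `min_input_length` counts one frame per target label plus one blank for every pair of identical neighbouring labels, and the hypothetical `valid_alignments` brute-forces every length-T frame path and keeps those that collapse to the target. Enumeration is only feasible for tiny T and C; CTCLoss itself sums over these alignments with a dynamic-programming recursion rather than by enumeration.

```python
from itertools import groupby, product

BLANK = 0
NUM_CLASSES = 3  # blank (0) + labels 1 and 2, i.e. C = 2 + 1 as in the example


def ctc_collapse(path, blank=BLANK):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


def min_input_length(target):
    """Shortest T that can emit `target`: one frame per label plus one
    separating blank for every pair of identical neighbouring labels."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def valid_alignments(target, T, num_classes=NUM_CLASSES):
    """Brute force (tiny T and C only): every length-T frame path that
    collapses to `target`."""
    return [path for path in product(range(num_classes), repeat=T)
            if ctc_collapse(path) == list(target)]


for target in ([1, 2, 1], [1, 1, 1]):
    print(target, "min T =", min_input_length(target),
          "| alignments with T=4:", len(valid_alignments(target, 4)))
# [1, 2, 1] min T = 3 | alignments with T=4: 7
# [1, 1, 1] min T = 5 | alignments with T=4: 0

print(valid_alignments([1, 1, 1], 5))  # [(1, 0, 1, 0, 1)]
```

With T = 4 there are several alignments for 1 2 1 but none for 1 1 1, so the summed likelihood is 0 and the loss is inf. The same count covers longer targets: 1 2 1 1 needs T >= 5 and 2 2 2 1 needs T >= 6. If some samples in a batch are unavoidably too short, `torch.nn.CTCLoss(zero_infinity=True)` zeroes those infinite losses and their gradients instead of propagating inf.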

Best regards

Thomas

aatg · November 24, 2023, 8:37pm

Thanks for the clarification. Indeed, the following code, in which I encode the predicted sequence by inserting a blank between every two consecutive identical labels, gives the expected near-zero losses: tensor([9.5367e-07, 9.5367e-07, 1.0729e-06])

import torch

e = 1e-7
targets = torch.LongTensor([
    [1, 2, 1, 1],
    [1, 2, 1, 1],
    [2, 2, 2, 1],
])
logprobs = torch.Tensor([
    # frame-level path 1 2 1 0 1 0 (blank = 0), collapses to 1 2 1 1
    [[e, 1-2*e, e], [e, e, 1-2*e], [e, 1-2*e, e], [1-2*e, e, e], [e, 1-2*e, e], [1-2*e, e, e]],
    # frame-level path 1 2 1 0 1 0 (blank = 0), collapses to 1 2 1 1
    [[e, 1-2*e, e], [e, e, 1-2*e], [e, 1-2*e, e], [1-2*e, e, e], [e, 1-2*e, e], [1-2*e, e, e]],
    # frame-level path 2 0 2 0 2 1 (blank = 0), collapses to 2 2 2 1
    [[e, e, 1-2*e], [1-2*e, e, e], [e, e, 1-2*e], [1-2*e, e, e], [e, e, 1-2*e], [e, 1-2*e, e]]
])
logprobs = torch.log(logprobs)
logprobs = torch.transpose(logprobs, 0, 1) # get to correct shape (T, N, C)

input_lengths = torch.LongTensor([6,6,6])
target_lengths = torch.LongTensor([4,4,4])

loss = torch.nn.CTCLoss(blank = 0, reduction='none')