CTCLoss with warp_ctc help


I have been trying to use the CTCLoss function provided by the warp_ctc module. I know this is not the right place to ask, but help is really appreciated. (Also, is there an official CTC binding for PyTorch now?)

            # Segment the image to one row!
            x = images[:, :, row:row+1, :]
            y = labels[:, row]

            # Each target has length 1, so target_sizes is a batch_size vector of ones
            target_sizes = torch.IntTensor(x.size(0)).zero_()
            target_sizes = Variable(target_sizes, requires_grad=False) + 1
            # Sequence lengths: also a batch_size vector of ones
            sizes = Variable(torch.IntTensor(x.size(0)).zero_(), requires_grad=False) + 1

            # Forward + Backward + Optimize
            outputs = model(x)
            print(outputs.size(), y.size(), sizes.size(), target_sizes.size())
            loss = criterion(outputs, y, sizes, target_sizes)

Now the output is:
torch.Size([20, 3]) torch.Size([20]) torch.Size([20]) torch.Size([20])
Variable containing:
[torch.FloatTensor of size 1]

torch.Size([20, 3]) torch.Size([20]) torch.Size([20]) torch.Size([20])
fish: ‘python3 train.py --train_file ~…’ terminated by signal SIGSEGV (Address boundary error)

So the first iteration went through properly, but the second one got a SIGSEGV. Sometimes the same error would occur during the first iteration. Am I doing anything wrong here?

I figured it out.

The expected input size is seq_length x batch x alphabet_size.

My seq_length dimension was missing.
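A minimal sketch of that fix, using the shapes from the printout above (the sizes are illustrative): when the model emits one prediction per row, its output is (batch, n_classes), and the missing seq_length dimension of size 1 can be added in front with `unsqueeze`.

```python
import torch

# Shapes from the printout above: batch of 20, alphabet of 3 (incl. blank)
batch_size, n_classes = 20, 3

outputs = torch.randn(batch_size, n_classes)  # (20, 3), as printed
outputs = outputs.unsqueeze(0)                # (1, 20, 3): seq_len x batch x alphabet
print(outputs.shape)  # torch.Size([1, 20, 3])
```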

I found the input arguments to CTCLoss confusing as well. I think I’ve worked it out and have written it up in some comments. Perhaps others will find the description useful. @SeanNaren, it would be great if you could confirm.

# The CTC loss function computes the total CTC loss on a batch of sequences. The total loss is
# not equal to the sum of the losses for individual samples. It is not clear why:
#       https://discuss.pytorch.org/t/how-to-fill-the-label-tensor-for-ctc-loss/5801
# ctc_loss(probs, labels, prob_sizes, label_sizes)
# probs
# -----
# Estimated probabilities.
# Tensor of size (seq_len, batch_size, n_alphabet+1). Note that each sample in the batch may have
# a different sequence length, so the seq_len size of the tensor is maximum of all sequence
# lengths in the batch. The tail of short sequences should be padded with zeros. The [0] index
# of the probabilities is reserved for "blanks" which is why the 3rd dimension is of size
# n_alphabet+1.
# labels
# ------
# Ground truth labels.
# A 1-D tensor composed of concatenated sequences of int labels (not one-hot vectors).
# Values should range from 1 to n_alphabet; 0 is not used, as it is reserved for blanks.
# For example, if the label sequences for two samples are [1, 2] and [4, 5, 7] then the tensor
# is [1, 2, 4, 5, 7].
# prob_sizes
# ----------
# Sequence lengths of the probabilities.
# A 1-D tensor of ints of length batch_size. The ith value specifies the sequence length of the
# probabilities of the ith sample that are used in computing that sample's CTC loss. Values in the
# probs tensor that extend beyond this length are ignored.
# label_sizes
# ------------
# Sequence lengths of the labels.
# A 1-D tensor of ints of length batch_size. The ith value specifies the sequence length of the
# labels of the ith sample that are used in computing that sample's CTC loss. The length of the
# labels vector should equal the sum of the elements in the label_sizes vector.

I think this is super helpful! I hope pytorch provides full support of CTC in the nn module soon.

It’s great for personal practice, but when developing tools for a general audience who are not very familiar with dependencies and requirements, it becomes a tough software engineering decision.