Problem with the expected tensor size of the input to the ctc_loss function

Hi PyTorch community. I am having problems using torch.nn.functional.ctc_loss().

I have the following tensors:

prob_txt_pred:  torch.Size([225, 51, 54])
target_words:  torch.Size([51, 30])
length_input:  torch.Size([51])
target_len_words:  torch.Size([51])

I am giving them to the ctc_loss function in the following way:

nf.ctc_loss(prob_txt_pred, target_words, length_input, target_len_words)

and I am getting an error saying that dimension 1 of the 'targets' tensor (in this case target_words) is expected to be at least 225.

The full error says:
RuntimeError: Expected tensor to have size at least 225 at dimension 1, but got size 51 for argument #2 'targets' (while checking arguments for ctc_loss_gpu)

Any idea why?

thanks for your time!


This may involve a torch bug.

This message is raised in aten/src/ATen/native/LossCTC.cpp at line 86.

I think the real message should be:

RuntimeError: Expected tensor to have size at least {max(prob.shape[0])} at dimension 1, but got size {length_input[index]} for argument #2 'log_probs' (while checking arguments for ctc_loss_gpu)

The error is caused because you set a number bigger than prob.shape[0] in length_input. Each entry of length_input needs to be the actual input length, which cannot exceed the number of time steps in the log probabilities.
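
To make that concrete, here is a minimal sketch using the shapes from the original post (random data; the variable names match the post, and only the length_input lines matter):

import torch
import torch.nn.functional as nf

T, N, C, S = 225, 51, 54, 30                          # time steps, batch, classes, max target length
prob_txt_pred = torch.randn(T, N, C).log_softmax(2)   # (T, N, C) log-probabilities
target_words = torch.randint(1, C, (N, S))            # (N, S) padded targets, 0 reserved for blank
target_len_words = torch.randint(1, S + 1, (N,))      # true length of each target

# Every entry of length_input must satisfy length_input[i] <= prob_txt_pred.size(0)
length_input = torch.full((N,), T, dtype=torch.long)  # OK: no entry exceeds T
loss = nf.ctc_loss(prob_txt_pred, target_words, length_input, target_len_words)
# length_input = torch.full((N,), T + 1)              # would raise the error above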


CTC loss takes its inputs with time first: log_probs should be (T, N, C), so the batch is the second dimension, not the first.
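
For example, if your network emits batch-first activations, permute before the loss; a small sketch (the (N, T, C) output here is made up):

import torch

out = torch.randn(51, 225, 54)                   # hypothetical batch-first output (N, T, C)
log_probs = out.permute(1, 0, 2).log_softmax(2)  # (T, N, C), the layout ctc_loss expects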

Best regards

Thomas

Like @drkw said, this error message is misleading: it really means the given input lengths are too long. I stumbled on this because I forgot to remove the receptive field from my input lengths.
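
For reference, here is a sketch of that correction, assuming a single hypothetical Conv1d front end (the kernel_size, stride, and padding values are made up); the formula is the standard convolution output-length arithmetic:

import torch

def conv_out_lengths(lengths, kernel_size=5, stride=2, padding=0):
    # standard conv arithmetic: floor((L + 2*padding - kernel_size) / stride) + 1
    return (lengths + 2 * padding - kernel_size) // stride + 1

raw_lengths = torch.tensor([100, 80])          # frames going into the CNN
input_lengths = conv_out_lengths(raw_lengths)  # frames the CTC output actually has: tensor([48, 38])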


@nicosoto0 I'm having the same problem. Were you able to solve it?

The size is determined by your sequence lengths. For example, target_len_words has size 51, but each element of target_len_words may be greater than 1, so the total target size may not be 51.

If target_len_words.sum() == 225, then a one-dimensional targets tensor should have size sum(target_lengths), which equals 225. In your case the two-dimensional targets seem right, though. Maybe it is a bug in torch?
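
For what it's worth, ctc_loss accepts both layouts; a minimal sketch of the two equivalent forms (shapes taken from the original post):

import torch

N, S = 51, 30
target_lengths = torch.randint(1, S + 1, (N,))

# 2-D form: padded (N, S); only the first target_lengths[i] entries of row i are read
targets_2d = torch.randint(1, 54, (N, S))

# 1-D form: all targets concatenated; total size must equal target_lengths.sum()
targets_1d = torch.cat([t[:l] for t, l in zip(targets_2d, target_lengths.tolist())])
assert targets_1d.numel() == target_lengths.sum().item()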

@ptrblck Hello, I would be grateful if someone could help me here. I have a model whose high-level architecture is
CNN -> Transformer Encoder -> CTC.

When the batch size is 2, the CTC loss works (I am using batch size 2 because of GPU limitations). Here is an example pipeline.

CNN output:
size [2, 168, 1024] # 2 videos, 168 frames/video, 1024 spatial embedding output

ctc_loss (the torch.nn.functional source, shown for reference):

import torch
from torch.nn import _reduction as _Reduction

def ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0,
             reduction='mean', zero_infinity=False):
    # type: (Tensor, Tensor, Tensor, Tensor, int, str, bool) -> Tensor
    # thin wrapper around the ATen kernel that raises the error below
    return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank,
                          _Reduction.get_enum(reduction), zero_infinity)

log_probs: size [226, 2, 70]
targets: size [2, 18]
input_lengths: size [2], value: tensor([78,78])
target_lengths: size [2]

During training, in one of the epochs where the batch size is 1, the following runtime error pops up.

CNN output:
size [1, 42, 1024] # 1 video, 42 frames, 1024 spatial embedding output

ctc_loss:

log_probs: size [42, 1, 70]
targets: size [1]
input_lengths: size [1], value: tensor([77])
target_lengths: size [1]

RuntimeError: Expected input_lengths to have value at most 42, but got value 77 (while checking arguments for ctc_loss_gpu)

When the batch size is 2, the input_lengths tensor has the value [78, 78] and the CTC loss works, but it doesn't work when the batch size is 1. Any idea what's going on?
Thanks.

In the batch-size-2 example you have 226 time steps in log_probs, and 78 <= 226.
In the batch-size-1 example you have 42 time steps in log_probs, and 77 > 42.
You cannot feed fewer time steps than your input_lengths claim the input has.
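
A defensive sketch (shapes from the failing batch above; the clamp only illustrates the constraint, the real fix is to compute each sample's true frame count after the CNN):

import torch

log_probs = torch.randn(42, 1, 70).log_softmax(2)  # (T, N, C) with T = 42
T, N = log_probs.size(0), log_probs.size(1)

input_lengths = torch.tensor([77]).clamp(max=T)    # 77 -> 42, never more than T
# or, when every sample uses the full sequence:
# input_lengths = torch.full((N,), T, dtype=torch.long)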

Best regards

Thomas


Thanks @tom, that was the issue.