I have model_output of size [T, N, C] and target of size [N, T]. I won't have targets of variable length. Is this how I declare the lengths?
ctc_loss = torch.nn.CTCLoss()  # blank index defaults to 0
op_len = torch.full((N,), T, dtype=torch.long)  # every input sequence has length T
target_len = torch.randint(1, T, (N,), dtype=torch.long)  # per-sequence target lengths
train_loss = ctc_loss(model_output, target, op_len, target_len)
I am asking because the train loss is infinity after the first iteration. There is no problem with the model; this happens only with CTCLoss(). Any idea why?
Having the input size as long as the target size is not a good idea.
If your target has repetitions, it won’t be representable and so the loss is infinite. If your target doesn’t have repetitions, there only is one valid alignment, and you might as well use per-sequence item cross entropy instead of CTC.
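To make this concrete, here is a small sketch (shapes, labels, and the batch of 1 are made up for illustration): a target with an adjacent repetition needs at least one extra input step for the mandatory blank between the repeated labels, so with input length equal to target length the loss is infinite, and it becomes finite once the input is longer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C = 5, 1, 4  # input time steps, batch size, number of classes (0 is blank)
ctc = nn.CTCLoss(blank=0)

# Target [1, 1, 2, 3, 1] contains an adjacent repetition: CTC must emit a
# blank between the two 1s, so any valid alignment needs >= T + 1 steps.
target = torch.tensor([[1, 1, 2, 3, 1]])
target_len = torch.full((N,), T, dtype=torch.long)

# Input exactly as long as the target: no valid alignment exists -> inf.
log_probs = torch.randn(T, N, C).log_softmax(2)
input_len = torch.full((N,), T, dtype=torch.long)
loss_tight = ctc(log_probs, target, input_len, target_len)

# Twice as many input steps: valid alignments exist -> finite loss.
T2 = 2 * T
log_probs2 = torch.randn(T2, N, C).log_softmax(2)
input_len2 = torch.full((N,), T2, dtype=torch.long)
loss_ok = ctc(log_probs2, target, input_len2, target_len)
```

The same check works on a real batch: pick one sequence whose loss is inf and compare its input length against the target length plus the number of adjacent repetitions.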
I would recommend staring a bit at the CTC article on Distill.pub to get ideas about your modeling, with special attention on the blank token aka ϵ. After reading it, take a sequence (using batch size 1 if you want) that produces inf loss and try to manually find a good alignment for CTC loss. Then you'll have fully understood what is wrong and have achieved CTC-Zen.