CTCLoss explanation


I have a text recognition task at hand (with a CNN backbone and an LSTM for the sequence prediction) and I want to use CTCLoss, but there are some things I don’t understand:

  1. Do I need to insert a blank “character” between repeated characters in a word?
  2. I am trying to use the (N,S) format for the targets, as described in the CTCLoss documentation, with sequences padded up to 30 characters. The padding value is 0, which overlaps with the default blank index 0, so how do I tackle this?
  3. What are the input_lengths, and how do they differ from the target_lengths?
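
To make questions 1 and 2 concrete, here is a plain-Python sketch of how I currently picture things. It does not call PyTorch at all: the helper names `ctc_collapse` and `pad_targets`, the toy alphabet, and the choice of 0 as both blank and padding value are my own assumptions for illustration. The first helper applies the usual CTC collapse rule (merge repeated labels, then drop blanks) to a per-timestep path; the second builds (N,S)-style padded targets together with their true lengths.

```python
BLANK = 0  # assumed blank index, same value as the CTCLoss default
PAD = 0    # assumed padding value (this is the overlap asked about in question 2)

def ctc_collapse(path, blank=BLANK):
    """Collapse a per-timestep label path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        # Keep a label only when it starts a new run and is not the blank.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

def pad_targets(targets, max_len, pad=PAD):
    """Pad each label sequence to max_len and record its true length."""
    padded = [list(t) + [pad] * (max_len - len(t)) for t in targets]
    lengths = [len(t) for t in targets]
    return padded, lengths

# Hypothetical alphabet: 1='h', 2='e', 3='l', 4='o'
print(ctc_collapse([1, 2, 3, 0, 3, 4]))  # path "h e l _ l o" -> [1, 2, 3, 3, 4]
print(ctc_collapse([1, 2, 3, 3, 4]))     # path "h e l l o"   -> [1, 2, 3, 4]

padded, lengths = pad_targets([[1, 2, 3, 3, 4], [1, 2]], max_len=6)
print(padded)   # [[1, 2, 3, 3, 4, 0], [1, 2, 0, 0, 0, 0]]
print(lengths)  # [5, 2]
```

In this sketch the blank only matters in the per-timestep path, not in the target itself, and the padded rows are only distinguishable from real labels through the separate lengths list. What I am unsure about is whether CTCLoss treats my targets the same way.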

Thanks in advance