CTCLoss explanation

dimou_gk · November 24, 2022, 11:27am

Greetings,

I have a text recognition task at hand (with a CNN backbone and an LSTM for the sequence prediction) and I want to use CTCLoss, but there are some things I don’t understand:

Do I need to insert a blank “character” between repeated characters in a word?
I am trying to use the (N,S) format for the targets, as described in the CTCLoss docs, with sequences padded up to 30 characters. The padding value is 0, which collides with the default blank index 0. How do I handle this?
What are the input_lengths, and how do they differ from the target_lengths?
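
To make the setup concrete, here is a minimal sketch of how I am building the targets at the moment. The alphabet, the helper name encode_targets, the choice to start character indices at 1, and the 50 time steps are all assumptions for illustration, not something taken from the docs:

```python
# Hypothetical alphabet; character indices start at 1 so that 0 stays free,
# since CTCLoss uses blank=0 by default.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

MAX_TARGET_LEN = 30  # padded length S of the (N, S) targets
PAD_VALUE = 0        # padding value, same as the default blank index


def encode_targets(words):
    """Return (targets padded to N x S, unpadded length of each word)."""
    targets, target_lengths = [], []
    for word in words:
        indices = [CHAR_TO_INDEX[ch] for ch in word]
        target_lengths.append(len(indices))
        targets.append(indices + [PAD_VALUE] * (MAX_TARGET_LEN - len(indices)))
    return targets, target_lengths


# "hello" has a repeated "l"; no blank is inserted between them here
# (this is what my first question is about).
targets, target_lengths = encode_targets(["hello", "cat"])

# Assumed number of time steps T that the LSTM emits per image.
TIME_STEPS = 50
input_lengths = [TIME_STEPS] * len(targets)
```

These lists are what I would convert to tensors and pass to CTCLoss as targets, input_lengths and target_lengths, together with the log-probabilities from the LSTM.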

Thanks in advance