How to fill the label tensor for CTC loss

weedwind · August 4, 2017, 10:34pm

I am trying to use the CTC loss created by @SeanNaren, but I am not clear on how to fill the tensor for the labels. If I have two label sequences, seq1 and seq2, with lengths L1 and L2, should the label tensor be [ seq1[0:L1], seq2[0:L2] ], or should it be [seq1[0], seq2[0], seq1[1], seq2[1], …]?

Also, I noticed that the labels are never zero. Is 0 reserved by default for the blank symbol?

Thanks a lot

SeanNaren · August 5, 2017, 11:17am

Hey! You’re right about the labels, 0 is reserved for the CTC blank symbol.

There is a small example on the use of the loss function here, but in your example the label tensor should be the former, i.e. [seq1[0:L1], seq2[0:L2]]: the sequences are concatenated one after another, not interleaved. Then give the label lengths via the label_lens parameter!
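A minimal sketch of that layout, using plain Python lists so it runs without any dependencies. The helper name pack_ctc_labels and the sample label values are made up for illustration and are not part of the loss function's API. Converting the two lists into whatever tensor type the loss expects is left out.

```python
BLANK = 0  # index reserved for the CTC blank symbol, as stated above


def pack_ctc_labels(sequences):
    """Concatenate label sequences back to back and record their lengths.

    Hypothetical helper: returns (flat_labels, label_lens), where
    flat_labels is [seq1[0:L1], seq2[0:L2], ...] and label_lens is
    [L1, L2, ...].
    """
    flat_labels = []
    label_lens = []
    for seq in sequences:
        if BLANK in seq:
            raise ValueError("label 0 is reserved for the blank symbol")
        flat_labels.extend(seq)      # whole sequence, not interleaved
        label_lens.append(len(seq))
    return flat_labels, label_lens


seq1 = [3, 1, 4]   # L1 = 3 (illustrative values)
seq2 = [2, 7]      # L2 = 2
flat_labels, label_lens = pack_ctc_labels([seq1, seq2])
print(flat_labels)  # [3, 1, 4, 2, 7]
print(label_lens)   # [3, 2]
```

The lengths are what let the loss split the flat label list back into per-sample targets, so they must be given in the same order as the sequences.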

weedwind · August 9, 2017, 5:24am

Hi, Sean,

Thank you so much. I also looked into Baidu's Deep Speech. I found that in Deep Speech the acoustic units for CTC are letters, rather than phones as in conventional ASR. Is using letters the typical setting for CTC-based ASR, or is it only done to simplify the Deep Speech example?

Thank you so much.

BigBorg · March 15, 2018, 2:12am

So the label tensor is supposed to be the true label, without blank tokens or repeated characters, right? If the predicted label generated from prob, after merging repeated characters and removing blanks, is longer than the true label, will that be fine? I am trying to use CTC loss for variable-length captcha recognition. If the maximum captcha length is 8, should I give prob a seq_length of something like 16 to leave room for the blank token?
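For reference, here is a rough sketch of the standard CTC collapse rule (merge consecutive repeats, then drop the blank) and of the fewest time steps a label needs to be representable. The function names are made up for illustration, and blank index 0 is assumed as earlier in the thread. A label of length L needs at least L time steps, plus one more for each pair of identical neighbouring characters, because a blank must sit between them.

```python
BLANK = 0  # assumed blank index, as discussed earlier in the thread


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def min_input_length(labels):
    """Fewest time steps for which CTC can emit `labels`.

    Each adjacent pair of identical labels needs one blank between them.
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


print(ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
print(min_input_length([3, 3, 5]))             # 4
print(min_input_length([1] * 8))               # 15
```

Under these assumptions, the worst case for an 8-character captcha is eight identical characters, which needs 15 time steps, so a seq_length of 16 would be enough for any label of that length.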