I am trying to use the CTC loss created by @SeanNaren, but I am not clear on how to fill the tensor for the labels. If I have two label sequences seq1 and seq2 with lengths L1 and L2, should the label tensor be the concatenation [seq1[0:L1], seq2[0:L2]], or should it be interleaved like [seq1, seq2, seq1, seq2, …]?
Also, I noticed that the labels are never zero. Is 0 reserved by default for the blank symbol?
Thanks a lot
Hey! You’re right about the labels: 0 is reserved for the CTC blank symbol.
There is a small example of the use of the loss function here, but in your case the label tensor should be the former, i.e.
[seq1[0:L1], seq2[0:L2]]. Then pass the label lengths (L1 and L2) via the label-sizes argument.
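A minimal sketch of that label layout, using plain Python lists to stand in for the integer tensors (the variable names `labels` and `label_sizes` are illustrative, not necessarily the exact warp-ctc argument names):

```python
# Sketch of the CTC label layout: all target sequences are
# concatenated into one flat 1-D tensor, and a separate lengths
# list records how long each sequence is. Label 0 is the blank
# and must not appear in the targets.

seq1 = [3, 1, 4, 1]        # L1 = 4, all labels >= 1 (0 = blank)
seq2 = [2, 7, 2]           # L2 = 3

labels = seq1 + seq2       # flat concatenation: [seq1[0:L1], seq2[0:L2]]
label_sizes = [len(seq1), len(seq2)]

assert labels == [3, 1, 4, 1, 2, 7, 2]
assert label_sizes == [4, 3]
assert 0 not in labels     # 0 is reserved for the CTC blank
```

The loss function uses `label_sizes` to split the flat `labels` list back into the per-sample targets, which is why no padding or interleaving is needed.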
Thank you so much. I also looked into Baidu's Deep Speech and found that its acoustic unit for CTC is letters, rather than phones as in conventional ASR. I am wondering whether using letters is the typical setting for CTC-based ASR, or whether it is only there to simplify the Deep Speech example?
Thank you so much.
So the label tensor is supposed to be the true labels without blank tokens or repeated characters, right? I wonder: if, after merging repeated characters and removing blanks, the predicted label length produced from prob is longer than the true label, will that be fine? I am trying to use CTC loss for variable-length captcha recognition. If the maximum captcha length is 8, should I set prob to have a seq_length of something like 16 to leave room for blank tokens?
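To see why some headroom beyond the raw label length is needed, here is a greedy CTC collapse (merge consecutive repeats, then drop blanks) sketched in plain Python. A target with a repeated character needs a blank frame between the two copies, so the worst case for N labels is 2N-1 frames; the function name is illustrative:

```python
def ctc_collapse(path, blank=0):
    """Greedy CTC decode: merge consecutive repeated frames, then remove blanks."""
    out = []
    prev = None
    for p in path:
        if p != prev:          # merge repeated frames
            if p != blank:     # drop blank frames
                out.append(p)
        prev = p
    return out

# A repeated label like "aa" (labels [1, 1]) needs a blank between the repeats:
assert ctc_collapse([1, 1]) == [1]          # collapses to a single label
assert ctc_collapse([1, 0, 1]) == [1, 1]    # blank preserves the repeat

# Worst case for an 8-character captcha of identical characters:
# 8 labels + 7 separating blanks = 15 frames, so seq_length = 16 suffices.
max_len = 8
assert 2 * max_len - 1 == 15
```

So for a maximum label length of 8, any seq_length of at least 15 can represent every target, and 16 gives a frame to spare.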