Understanding LSTM in OCR

I am trying to recreate the results achieved by this report. I'm using pretrained ResNet CNN layers to extract features, which I then feed into a bidirectional LSTM to recognize captchas generated by the Python captcha library. So far this works well for fixed-length captchas but not for variable-length ones. I found that CTC is used in another report from Stanford, but my results got worse when I added CTC loss. Is it theoretically possible to train a CNN + LSTM to recognize variable-length captchas without CTC? Is the LSTM learning some kind of segmentation algorithm? If so, would it be suitable for captchas with different padding and spacing between characters?
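
For context, here is a minimal sketch of how CTC maps a fixed number of per-frame LSTM outputs to a variable-length string: take the most likely class per frame, collapse repeats, then drop blanks. This is only an illustration and is not taken from either report. The blank index `0`, the digit-only `ALPHABET`, and the function name `ctc_greedy_decode` are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank
ALPHABET = "0123456789"  # hypothetical character set; class i + 1 maps to ALPHABET[i]


def ctc_greedy_decode(frame_labels):
    """Collapse consecutive duplicate frame labels, then remove blanks."""
    # groupby yields one key per run of equal consecutive labels
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(ALPHABET[label - 1] for label in collapsed if label != BLANK)


# Both inputs have 8 frames (e.g. argmax over LSTM outputs per time step),
# yet they decode to strings of different lengths.
print(ctc_greedy_decode([0, 3, 3, 0, 0, 8, 8, 0]))    # 27
print(ctc_greedy_decode([1, 1, 0, 1, 5, 0, 10, 10]))  # 0049
```

Note that the blank between the two runs of `1` in the second input is what lets a doubled character survive the collapse step, so the number of frames has to be large enough to fit the label plus the blanks between repeated characters.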
