I am working on a use case where I want to extract all visible text from any kind of image — including medicine packages, product cartons, documents, and natural scenes.
What I want to build:
A deep learning-based OCR model that takes an image as input and returns all the text written on it.
I have tried:
- CRNN Model (CNN + BiLSTM + CTC loss):
I implemented a basic CRNN using Keras. The model includes:- CNN layers for feature extraction
- BiLSTM layers for sequence modeling
- Dense + CTC loss for character decoding
Any architecture suggestions, research paper links, or open-source code would be helpful