1. OCR PROCESSING WITH DEEP LEARNING: APPLIED TO VIETNAMESE DOCUMENTS
VIET-TRUNG TRAN, ANH PHI NGUYEN, KHUYEN NGUYEN
2. OUTLINE
• OCR overview
• History
• Pipelining
• Deep learning for OCR
• Motivation
• Connectionist temporal classification (CTC) network
• LSTM + CTC for sequence recognition
3. WHAT IS OCR?
• Optical character recognition (OCR), also called an optical character reader, is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text
4. OCR TYPES
• Optical Character Recognition (OCR)
• Targets typewritten text, one character at a time
• Optical Word Recognition (OWR)
• Typewritten text, one word at a time
• Intelligent Character Recognition (ICR)
• Handwritten print script, one character at a time
• Intelligent Word Recognition (IWR)
• Handwritten, one word at a time
9. PAGE LAYOUT ANALYSIS
Smith, Ray. "Hybrid page layout analysis via tab-stop detection." 10th International Conference on Document Analysis and Recognition (ICDAR 2009). IEEE, 2009.
10. IMAGE LEVEL PAGE LAYOUT ANALYSIS
• Using morphological processing routines from Leptonica
• http://www.slideshare.net/versae/javier-de-larosacs9883-5912825
18. OCR CHALLENGES
1. Font specifics: classical engines never overcome being limited to a small number of fonts and page formats
2. Character bounding boxes
3. Unreliable feature extraction
4. Slow performance
22. MOTIVATION
• Segmentation is difficult for cursive or unconstrained text
• R. Smith, “History of the Tesseract OCR engine: what worked and what didn’t,” in DRR XX, San Francisco, USA, Feb. 2013.
• No single proposed OCR method could achieve very low error rates without the aforementioned sophisticated post-processing techniques.
23. RESEARCH BREAKTHROUGH
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber, “A Novel Connectionist System for Unconstrained Handwriting Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, May 2009.
26. MOTIVATION
• Real-world sequence learning task
• OCR (Optical character recognition)
• ASR (Automatic speech recognition)
• Requires
• prediction of sequences of labels from noisy, unsegmented input data
• Recurrent neural networks (RNNs) can be used for sequence learning, but require
• pre-segmented training data
• post-processing to transform outputs into label sequences
27. CONNECTIONIST TEMPORAL CLASSIFICATION (CTC)
• Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
• What is CTC all about?
• A novel method for training RNNs to label unsegmented sequences directly
34. DYNAMIC TIME WARPING
• Because the length of y may differ from (and is often longer than) that of l, the inference of l from y is a dynamic time warping problem.
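For illustration, aligning a long output sequence y against a shorter label sequence l can be sketched with a minimal DTW cost recursion; the function name and the 0/1 substitution cost are illustrative assumptions, not part of the slides:

```python
# Minimal dynamic time warping (DTW) sketch: computes the cheapest
# alignment cost between a long sequence y and a shorter label
# sequence l, allowing y to "dwell" on a label across several steps.
def dtw_cost(y, l, dist=lambda a, b: 0 if a == b else 1):
    n, m = len(y), len(l)
    INF = float("inf")
    # D[i][j] = minimal cost of aligning y[:i] with l[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(y[i - 1], l[j - 1]) + min(
                D[i - 1][j],      # y advances (label repeats over time)
                D[i - 1][j - 1],  # both advance
                D[i][j - 1],      # l advances
            )
    return D[n][m]
```

For example, `dtw_cost("aabbbcc", "abc")` is 0, because the repeated frames in y warp onto the three labels of l at no cost.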
37. CONNECTIONIST TEMPORAL CLASSIFICATION
• To transform the network outputs into a conditional probability distribution over label sequences
• A CTC network has a softmax output layer with one more unit than there
are labels in L
• activations of the first |L| units are interpreted as the probabilities of observing the
corresponding labels at particular times
• activation of the extra unit is the probability of observing a ‘blank’, or no label
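As a concrete illustration, CTC's collapsing map B (which turns a per-frame path of labels and blanks into a final label sequence by removing repeated labels, then blanks) can be sketched as follows; the choice of "-" as the blank symbol is an assumption for readability:

```python
# Sketch of the CTC collapsing map B: merge runs of repeated labels,
# then drop blanks. The blank symbol "-" stands in for the extra
# softmax unit described above.
BLANK = "-"

def ctc_collapse(path):
    out = []
    prev = None
    for symbol in path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank.
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)
```

For example, the path `"--hh-e-ll-lo--"` collapses to `"hello"`: the blank between the two l-runs is what lets CTC output a genuinely doubled letter.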
40. LONG SHORT-TERM MEMORY (LSTM)
• A type of recurrent neural network (RNN)
• RNN vanishing gradient problem
• influence of a given input on the hidden layer, and therefore on the network output,
either decays or blows up exponentially as it cycles around the network’s recurrent
connections
• LSTM is designed to address the vanishing gradient problem
• An LSTM hidden layer consists of recurrently connected subnets, called
memory blocks
• Each block contains a set of internal units, or cells, whose activation is
controlled by three multiplicative gates: the input gate, forget gate and
output gate
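The gating described above can be sketched as a single-cell, scalar LSTM step; all weight and bias values here are illustrative placeholders, not trained parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One step of a single scalar LSTM memory cell, showing the three
# multiplicative gates acting on the cell state.
def lstm_step(x, h_prev, c_prev, w):
    # Each gate sees the current input x and the previous hidden state.
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # additive cell update: gradients flow through f
    h = o * math.tanh(c)     # hidden state exposed to the rest of the network
    return h, c
```

The additive form of the cell update (rather than repeated multiplication by recurrent weights) is what lets the error signal survive many time steps without decaying or blowing up.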
47. REFERENCES - CREDITS
• https://github.com/yiwangbaidu/notes/blob/master/CTC/CTC.pdf
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Ray Smith. "Everything You Always Wanted to Know about Tesseract." Tesseract tutorial at DAS 2014.