Every day, social media users post tens of millions of images. According to statistics, one in three contains text: a fresh, funny meme, an ordinary selfie in a T-shirt with a printed slogan, or a photo of a new book's cover with a short review. Any of these images can mention brands, which makes recognizing text in images and analyzing it further a very interesting task for social media listening. We will talk about current SOTA approaches and how to train your own recognition model with little effort.
Website: https://fwdays.com/en/event/data-science-fwdays-2019/review/ocr-in-the-wild-world-of-social-media
Evgen Terpil "OCR in the Wild World of Social Media"
1. OCR in the Wild World of Social Media
Yevhen Terpil
YouScan
2. Yevhen Terpil
Head of data science squad
@JenyaTerpil
terpiljenya
YouScan provides real-time monitoring and analytics of brand mentions on social networks, blogs, forums, review sites and online news.
3. Sentiment analysis
Autocategories
Trends detection
Smart alerts
Logo Recognition
Scene / Object detection
Machine Learning classifier
Deep Learning RNN
Anomaly detection
Dynamic Topic Modeling
Deep Learning CNNs
Deep Learning CNN
OCR
4. Agenda
1. Text Detection
2. Text Recognition
3. End to end models
4. How to train your own OCR
5. OCR vs OCR in the wild
OCR in the wild world of social media
PREREQUISITES
List of tutorials on
dicte
8. OCR datasets
• COCO-Text: word-level, axis-aligned bounding boxes
• ICDAR2015, ICDAR2017: word-level, quadrilateral boxes
• ICDAR2013: word-level, rectangular boxes
• MSRA-TD500: text regions annotated with rotated rectangles
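The annotation formats above differ mainly in box geometry. A minimal sketch, assuming the ICDAR2015 ground-truth line format `x1,y1,x2,y2,x3,y3,x4,y4,transcription` with `###` marking unreadable text, that reduces a quadrilateral to the axis-aligned box COCO-Text-style tools expect. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def parse_icdar2015_line(line: str) -> Tuple[List[Point], Optional[str]]:
    """Hypothetical helper: parse 8 coordinates followed by the transcription."""
    # strip the newline and a possible UTF-8 BOM; maxsplit=8 keeps commas inside the text
    parts = line.strip().lstrip("\ufeff").split(",", 8)
    coords = [float(v) for v in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    text = parts[8] if len(parts) > 8 else ""
    # "###" marks "don't care" regions, usually ignored in training and evaluation
    return points, (None if text == "###" else text)

def quad_to_aabb(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing the quadrilateral."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```

For example, the line `10,20,110,22,108,60,8,58,SALE` yields the text `SALE` and the box `(8.0, 20.0, 110.0, 60.0)`.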
9. Types of text detection
1. Character-based: individual characters are detected first and then grouped into words
2. Word-based: words are detected directly, in the same manner as objects in general object detection
3. Text-line-based: text lines are detected first and then split into words
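Character-based detectors need an extra grouping step. A simplified sketch of one possible heuristic, assuming the character boxes already belong to a single horizontal text line and that a gap wider than half the mean character height (the hypothetical `gap_ratio` parameter) separates two words; real systems use learned or more elaborate linking rules.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def group_chars_into_words(chars: List[Box], gap_ratio: float = 0.5) -> List[Box]:
    """Hypothetical helper: merge character boxes of one text line into word boxes.

    A new word starts when the horizontal gap to the previous character
    exceeds gap_ratio * (mean character height).
    """
    if not chars:
        return []
    chars = sorted(chars, key=lambda b: b[0])  # left to right
    mean_h = sum(b[3] - b[1] for b in chars) / len(chars)
    words: List[Box] = []
    cur = list(chars[0])
    for x0, y0, x1, y1 in chars[1:]:
        if x0 - cur[2] > gap_ratio * mean_h:
            words.append(tuple(cur))  # gap too wide: close the current word
            cur = [x0, y0, x1, y1]
        else:
            # extend the current word box to cover this character
            cur = [min(cur[0], x0), min(cur[1], y0), max(cur[2], x1), max(cur[3], y1)]
    words.append(tuple(cur))
    return words
```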
10. Text Detection - TextBoxes
11.2016
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
(Figure: default boxes and predicted boxes)
• word-based
• inspired by SSD
• use irregular 1×5 convolutional filters
Non-maximum suppression
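TextBoxes scores every default box and then removes overlapping duplicates with non-maximum suppression. Below is a generic greedy NMS over axis-aligned boxes in NumPy, a sketch of the standard technique rather than the authors' code; the IoU threshold of 0.5 is an assumption.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list:
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of (x_min, y_min, x_max, y_max); scores: (N,).
    Returns the indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop every box that overlaps the kept one too much
        order = rest[iou <= iou_threshold]
    return keep
```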
21. TPS (Thin-Plate-Spline) transformation
Robust Scene Text Recognition with Automatic Rectification
• inspired by Spatial Transformer Network
03.2016
CNN with a regression task for K fiducial points
Transformation matrix: computed from the predicted fiducial points and the base fiducial points
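In RARE the TPS parameters are obtained by solving a linear system built from the K base fiducial points C' (fixed positions in the rectified image) and the K points C predicted by the localization CNN; the resulting 2 × (K+3) matrix then maps each rectified-image pixel to a sampling position in the input image. A NumPy sketch of that computation, following the RARE formulation as we understand it (radial basis φ(r) = r² log r²); the function names are hypothetical.

```python
import numpy as np

def _phi(r2):
    """TPS radial basis phi(r) = r^2 * log(r^2), taking r^2 as input; phi(0) = 0."""
    r2 = np.asarray(r2, dtype=float)
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def tps_params(base, predicted):
    """Hypothetical helper: solve for the 2 x (K+3) TPS matrix T.

    base: (K, 2) base fiducial points C' in the rectified image.
    predicted: (K, 2) points C predicted by the localization CNN.
    """
    base = np.asarray(base, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    k = base.shape[0]
    r2 = np.sum((base[:, None, :] - base[None, :, :]) ** 2, axis=-1)  # (K, K)
    delta = np.zeros((k + 3, k + 3))
    delta[:k, 0] = 1.0          # affine offset
    delta[:k, 1:3] = base       # affine x, y terms
    delta[:k, 3:] = _phi(r2)    # radial terms R
    delta[k, 3:] = 1.0          # radial weights sum to zero
    delta[k + 1:, 3:] = base.T  # radial weights orthogonal to x and y
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = predicted
    return np.linalg.solve(delta, rhs).T

def tps_transform(T, base, points):
    """Hypothetical helper: map (N, 2) rectified-image points into the input image."""
    base = np.asarray(base, dtype=float)
    points = np.asarray(points, dtype=float)
    r2 = np.sum((points[:, None, :] - base[None, :, :]) ** 2, axis=-1)  # (N, K)
    p_hat = np.hstack([np.ones((len(points), 1)), points, _phi(r2)])    # (N, K+3)
    return p_hat @ T.T  # p = T p_hat for every point
```

By construction the transform interpolates exactly: every base point lands on its predicted point, and if C = C' the mapping is the identity.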
22. Text Recognition - RARE
Robust Scene Text Recognition with Automatic Rectification
• use encoder-decoder scheme with attention mechanism
03.2016
Loss function: negative log-likelihood
• use TPS transformation
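The attention decoder computes, at each step, a weighted sum (glimpse) of the encoder outputs, and training minimizes the negative log-likelihood of the ground-truth characters. A NumPy sketch of one additive-attention step and of the sequence NLL, assuming the score form e_j = wᵀ tanh(W s + V h_j + b) and teacher forcing; the weight names and shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attention_glimpse(s_prev, H, W, V, w, b):
    """One step of additive attention (illustrative shapes).

    s_prev: (d_s,) previous decoder state; H: (L, d_h) encoder outputs;
    W: (d_a, d_s); V: (d_a, d_h); w, b: (d_a,).
    """
    scores = np.tanh(s_prev @ W.T + H @ V.T + b) @ w  # (L,) one score per encoder step
    alpha = softmax(scores)                            # attention weights, sum to 1
    glimpse = alpha @ H                                # (d_h,) weighted sum of encoder outputs
    return alpha, glimpse

def sequence_nll(step_probs: np.ndarray, target: list) -> float:
    """Negative log-likelihood of a label sequence.

    step_probs: (T, C) per-step class distributions produced with teacher forcing.
    """
    return float(-np.sum(np.log(step_probs[np.arange(len(target)), target])))
```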
23. Text recognition pipeline
What Is Wrong With Scene Text Recognition Model Comparisons?
04.2019
Transformation + Feature extraction + Sequence modeling + Prediction
• Transformation: TPS, None
• Feature extraction: VGG, ResNet, RCNN, …
• Sequence modeling (RNN): BiLSTM, None
• Prediction: CTC, Attention
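When the prediction stage is CTC (as in CRNN), the simplest decoder is best-path (greedy) decoding: take the most likely class per frame, merge consecutive repeats, then drop blanks. A pure-Python sketch, assuming class index 0 is the CTC blank and class i > 0 maps to `alphabet[i - 1]`:

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank

def ctc_greedy_decode(step_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding over per-frame class scores."""
    best = [max(range(len(p)), key=p.__getitem__) for p in step_probs]  # argmax per frame
    out: List[str] = []
    prev = None
    for idx in best:
        # emit only when the class changes and is not the blank
        if idx != prev and idx != BLANK:
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)
```

The blank is what lets CTC output double letters: frames `a a - a b b -` decode to `aab`, not `ab`.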
31. OCR for Social Media
EAST + CRNN (pretrained)
Text Detection
Pre-trained EAST works pretty well
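EAST predicts, for every pixel of a feature map at 1/4 of the input resolution, a text score and a geometry; in the RBOX variant the geometry is four distances to the top, right, bottom and left box edges plus a rotation angle. A sketch of turning one such prediction into box corners; the channel order, the angle sign convention and the stride-4 scaling of (x, y) are assumptions that depend on the particular pre-trained model. Boxes from all pixels above a score threshold are then merged with NMS.

```python
import numpy as np

def decode_rbox(x: float, y: float, dists, angle: float) -> np.ndarray:
    """Hypothetical helper: one RBOX prediction -> (4, 2) rotated box corners.

    (x, y): pixel position in input-image coordinates (feature-map cell * stride).
    dists: distances (top, right, bottom, left) from the pixel to the box edges.
    angle: rotation in radians (sign convention assumed).
    Corner order: top-left, top-right, bottom-right, bottom-left.
    """
    top, right, bottom, left = dists
    # corners in the box's own unrotated frame, relative to the pixel
    local = np.array([[-left, -top], [right, -top], [right, bottom], [-left, bottom]], dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # rotate around the pixel, then shift to its image position
    return local @ rot.T + np.array([x, y], dtype=float)
```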
Text Recognition
We need to train a custom recognition model with a wider alphabet
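A wider alphabet mostly means a larger output layer and a new label encoding. A sketch, assuming Latin, Ukrainian/Russian Cyrillic, digits and some punctuation (the exact character set used in the talk is not given), with class 0 reserved for the CTC blank and unknown characters mapped to `?`; the names are hypothetical.

```python
import string
from typing import Dict, List

# hypothetical alphabet; the real one depends on the target languages
CYRILLIC = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяъыэё"
ALPHABET = string.digits + string.ascii_letters + CYRILLIC + CYRILLIC.upper() + " .,:;!?-'\"@#&%()/"

def build_charset(alphabet: str) -> Dict[str, int]:
    """Map each character to a class id; id 0 stays reserved for the CTC blank."""
    return {ch: i + 1 for i, ch in enumerate(dict.fromkeys(alphabet))}  # dedupe, keep order

def encode(text: str, charset: Dict[str, int], unknown: str = "?") -> List[int]:
    """Encode a transcription; characters outside the alphabet become `unknown`."""
    return [charset.get(ch, charset[unknown]) for ch in text]
```

The model's output layer then needs `len(charset) + 1` classes.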
32. Training pipeline
CRNN
Train: 8K labeled images from social networks + >100K synthetic text images (Synth text, grayscale) generated with TR Data Generator from texts from social networks
Validation: 1.4K labeled images from social networks
Weights init: pretrained CRNN
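One way to reuse pretrained CRNN weights when the alphabet grows is to copy the output-layer rows of characters the pretrained model already knows and initialize only the new ones. This hypothetical helper illustrates that idea and is not necessarily how the talk's model was initialized; it assumes a linear classifier whose row 0 is the CTC blank and row i is `alphabet[i - 1]`.

```python
import numpy as np

def expand_output_layer(old_W, old_b, old_alphabet, new_alphabet, rng=None):
    """Hypothetical helper: build classifier weights for a wider alphabet.

    old_W: (len(old_alphabet) + 1, d); old_b: (len(old_alphabet) + 1,).
    Known characters (and the blank) keep their pretrained rows; new
    characters get small random weights and zero bias.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = old_W.shape[1]
    n_new = len(new_alphabet) + 1
    W = rng.normal(0.0, 0.01, size=(n_new, d))
    b = np.zeros(n_new)
    W[0], b[0] = old_W[0], old_b[0]  # keep the blank class
    old_index = {ch: i + 1 for i, ch in enumerate(old_alphabet)}
    for j, ch in enumerate(new_alphabet, start=1):
        if ch in old_index:  # character seen during pretraining: copy its row
            W[j], b[j] = old_W[old_index[ch]], old_b[old_index[ch]]
    return W, b
```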
34. Fail cases
EAST + CRNN (finetuning)
handwriting, fonts, bad crops
35. Results
EAST + CRNN (finetuning)
Metric           Baseline   CRNN     CRNN (finetuning)
word precision   0.1966     0.4415   0.4893
char precision   0.2220     0.6984   0.7457
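The slide does not define its metrics. A sketch under the assumption that word precision is the share of exactly matched crops and char precision is 1 minus the total edit distance divided by the total number of ground-truth characters; the function names are hypothetical and the talk's exact definitions may differ.

```python
from typing import List

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def word_accuracy(preds: List[str], gts: List[str]) -> float:
    """Share of crops whose prediction matches the ground truth exactly."""
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def char_accuracy(preds: List[str], gts: List[str]) -> float:
    """1 - total edit distance / total ground-truth characters, floored at 0."""
    errors = sum(edit_distance(p, g) for p, g in zip(preds, gts))
    total = sum(len(g) for g in gts)
    return max(0.0, 1.0 - errors / total)
```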
Further improvements
• lexicon based predictions
• pretrained language model for decoder
• crops preprocessing
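Lexicon-based prediction can be as simple as snapping a raw prediction to the closest known brand name. A sketch using the standard library's `difflib.get_close_matches` (a SequenceMatcher similarity ratio, not edit distance); the helper name, the lexicon and the 0.8 cutoff are illustrative assumptions.

```python
import difflib
from typing import Iterable

def snap_to_lexicon(prediction: str, lexicon: Iterable[str], cutoff: float = 0.8) -> str:
    """Hypothetical helper: replace a prediction with the closest lexicon entry.

    Matching is case-insensitive; the raw prediction is kept when no entry
    reaches the similarity cutoff.
    """
    by_lower = {w.lower(): w for w in lexicon}
    matches = difflib.get_close_matches(prediction.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else prediction
```

For example, an OCR output `Y0uScan` is mapped to `YouScan` given the lexicon `["YouScan", "Adidas", "Nike"]`, while an unrelated word is left unchanged.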