SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
1
Text Extraction from Product Images
Rajesh Shreedhar Bhat
Data Scientist @Walmart Labs, Bengaluru
MS CS @ASU, Kaggle Competitions Expert
Agenda
▪ Intro to Text Extraction
▪ Text Detection(TD)
▪ TD Model Architecture
▪ Training data generation
▪ Text Recognition(TR) training data
preparation
▪ CRNN-CTC model for TR
▪ Receptive Fields
▪ CTC decoder and loss
▪ TR Training phase
▪ Other Advanced Techniques
▪ Questions ?
3
Introduction : Text Extraction
4
Text Detection
5
Image Segmentation - Input & Ground Truth
6
Ground Truth Label Generation
7
Text Detection – Model architecture
▪ VGG16 – BN as the backbone
▪ Model has skip connection in decoder
part which is similar to U-Nets.
▪ Output :
▪ Region score
▪ Affinity score - grouping
characters
Ref:Baek, Youngmin, et al. "Character Region
Awareness for Text detection." Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition. 2019.
8
Sample Output
Region Score Affinity Score Region Score Affinity Score
9
Sample Output..
10
Text Recognition
11
Text Recognition – Training Data Preparation
SynthText: image generation engine
for building a large annotated dataset.
15 million images generated with
different font styles, size, color &
varying backgrounds using product
descriptions + open source datasets
Vocabulary: 92 characters
Includes capital + small letters,
numbers and special symbols
12
Text Recognition CRNN CTC model
13
CNN - Receptive Fields
Usage of intermediate layer features in
SSD’s in Object detection tasks.
▪ Receptive field is defined as the region in the input image/space that a
particular CNN’s feature is looking at.
14
CNN features to LSTM
Input Image
(None, 128 * 32 * 1)
Feature map
(None, 1, 31, 512)
Feature map
(None, 1, 31, 512)
SoftMax probabilities for every
time step (i.e. 31), over the
vocabulary.
15
Ground Truth and Output of TR task
16
Hello
Input Image Ground Truth
Output from LSTM model
for 31 timesteps ..
Time step t1 t2 t3 t4 t5 ... …. t27 t28 t29 t30 t31
Prediction H H H e e ... ... l o o o o
Length of Ground truth is 5 which is not equal to length of prediction i.e 31
How to calculate the loss?
NER model – loss: categorical cross entropy
• Do we have labels for
every time steps of
LSTM model in CRNN
setting ?
• Can we use cross
entropy loss?
Answer is: NO!!
17
Mapping of Input to Output
Corresponding Text
Hello
Corresponding Text
Hello
Can we manually align each character to its location in the audio/image?
Yes!! But lot of manual effort is needed in creating training data.
18
CTC to rescue
With just mapping from image to text and not worrying about alignment of each
character to the location in input image, one should be able to train the network.
Merge repeats
Merge repeats
Drop blank character
19
Connectionist Temporal Classification (CTC) Loss
▪ Ground truth for an image --> AB
▪ Vocabulary is { A, B, - }
▪ Let's say we have predictions for 3-time steps from LSTM network (SoftMax
probabilities over vocabulary at t1, t2, t3)
▪ Given that we use CTC decode operation discussed earlier, in which scenarios we
can say output from the model is correct??
20
CTC loss continued ..
t1 t2 t3
A B B
A A B
- A B
A - B
A B -
• Merge
repeats
• Drop blank
character
AB
t1 t2 t3
A 0.8 0.7 0.1
B 0.1 0.1 0.8
- 0.1 0.2 0.1
Ground Truth : AB
Score for one path: AAB = (0.8 * 0.7 * 0.8) and similarly for other paths.
Probability of getting GT AB: = P(ABB) + P(AAB) + P(-AB) + P(A-B) + P(AB-)
Loss : - log( Probability of getting GT )
SoftMax probabilities
21
CTC loss perfect match
t1 t2 t3
A - -
- A -
- - A
- A A
A A -
A - A
A A A
• Merge
repeats
• Drop blank
character
t1 t2 t3
A 1 0 0
B 0 0 0
- 0 1 1
• SoftMax probabilities
Score for one path: A- - = (1 * 1 * 1) and similarly for other paths.
Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA)
Loss : - log( Probability of getting GT ) = 0
Ground Truth : A
A
22
CTC loss perfect mismatch
• Merge
repeats
• Drop blank
character
A
t1 t2 t3
A 0 0 0
B 1 1 1
- 0 0 0
SoftMax probabilities
Score for one path: A - - = (0 * 0 * 0) and similarly for other paths.
Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA)
Loss : - log( Probability of getting GT ) = tends to infinity !!
Ground Truth : A
t1 t2 t3
A - -
- A -
- - A
- A A
A A -
A - A
A A A
23
Model Architecture & CTC loss in TF
24
Training Phase
▪ 15 million images ~ 690 GB when loaded into memory!! Given that on an average
images are of the shape (128 * 32 * 3) and dtype is float32.
▪ Usage of Python Generators to load only single batch in memory.
▪ Reducing the training time by using workers, max_queue_size & multi-processing
in .fit_generator in Keras.
▪ Training time ~ 2 hours for single epoch on single P100 GPU machine and
prediction time ~1 sec for batch of 2048 images.
25
Other Advanced Techniques
Attention - OCR Spatial Transformer Network – before text recognition
Ref:Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer
networks." Advances in neural information processing systems. 2015.
26
Code + PPT
https://github.com/rajesh-bhat/spark-ai-summit-2020-text-extraction
27
Questions ??
rsbhat@asu.edu
https://www.linkedin.com/in/rajeshshreedhar
28

Contenu connexe

Tendances

Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsVaibhav Singh
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine LearningDatabricks
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation MethodSHUBHAM GUPTA
 
Animal identification using machine learning techniques
Animal identification using machine learning techniquesAnimal identification using machine learning techniques
Animal identification using machine learning techniquesAboul Ella Hassanien
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Hyperparameter Optimization for Machine Learning
Hyperparameter Optimization for Machine LearningHyperparameter Optimization for Machine Learning
Hyperparameter Optimization for Machine LearningFrancesco Casalegno
 
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxKnowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxNeo4j
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Lung Cancer Detection using transfer learning.pptx.pdf
Lung Cancer Detection using transfer learning.pptx.pdfLung Cancer Detection using transfer learning.pptx.pdf
Lung Cancer Detection using transfer learning.pptx.pdfjagan477830
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Analysis by semantic segmentation of Multispectral satellite imagery using de...
Analysis by semantic segmentation of Multispectral satellite imagery using de...Analysis by semantic segmentation of Multispectral satellite imagery using de...
Analysis by semantic segmentation of Multispectral satellite imagery using de...Yogesh S Awate
 
Federated learning
Federated learningFederated learning
Federated learningMindos Cheng
 

Tendances (20)

Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation Systems
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
Animal identification using machine learning techniques
Animal identification using machine learning techniquesAnimal identification using machine learning techniques
Animal identification using machine learning techniques
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Hyperparameter Optimization for Machine Learning
Hyperparameter Optimization for Machine LearningHyperparameter Optimization for Machine Learning
Hyperparameter Optimization for Machine Learning
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxKnowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Lung Cancer Detection using transfer learning.pptx.pdf
Lung Cancer Detection using transfer learning.pptx.pdfLung Cancer Detection using transfer learning.pptx.pdf
Lung Cancer Detection using transfer learning.pptx.pdf
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Analysis by semantic segmentation of Multispectral satellite imagery using de...
Analysis by semantic segmentation of Multispectral satellite imagery using de...Analysis by semantic segmentation of Multispectral satellite imagery using de...
Analysis by semantic segmentation of Multispectral satellite imagery using de...
 
Federated learning
Federated learningFederated learning
Federated learning
 

Similaire à Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques

Slides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdfSlides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdfDhivya594398
 
CS 354 Understanding Color
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding ColorMark Kilgard
 
image compresson
image compressonimage compresson
image compressonAjay Kumar
 
Image compression
Image compressionImage compression
Image compressionAle Johnsan
 
Presentation: Plotting Systems in R
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in RIlya Zhbannikov
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
cp467_12_lecture14_image compression1.pdf
cp467_12_lecture14_image compression1.pdfcp467_12_lecture14_image compression1.pdf
cp467_12_lecture14_image compression1.pdfshaikmoosa2003
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfMohammadAzreeYahaya
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetupJoshua Bae
 
Surrey dl-4
Surrey dl-4Surrey dl-4
Surrey dl-4ozzie73
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Yuichiro Yasui
 
Geometry Batching Using Texture-Arrays
Geometry Batching Using Texture-ArraysGeometry Batching Using Texture-Arrays
Geometry Batching Using Texture-ArraysMatthias Trapp
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.pptAlpha474815
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.pptSagarDR5
 

Similaire à Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques (20)

Slides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdfSlides_for_Spark_AI_TE.pdf
Slides_for_Spark_AI_TE.pdf
 
CS 354 Understanding Color
CS 354 Understanding ColorCS 354 Understanding Color
CS 354 Understanding Color
 
image compresson
image compressonimage compresson
image compresson
 
Image compression
Image compressionImage compression
Image compression
 
Presentation: Plotting Systems in R
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in R
 
Cs gate-2011
Cs gate-2011Cs gate-2011
Cs gate-2011
 
Cs gate-2011
Cs gate-2011Cs gate-2011
Cs gate-2011
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdf
 
cp467_12_lecture14_image compression1.pdf
cp467_12_lecture14_image compression1.pdfcp467_12_lecture14_image compression1.pdf
cp467_12_lecture14_image compression1.pdf
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdf
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Kailash(13EC35032)_mtp.pptx
Kailash(13EC35032)_mtp.pptxKailash(13EC35032)_mtp.pptx
Kailash(13EC35032)_mtp.pptx
 
Surrey dl-4
Surrey dl-4Surrey dl-4
Surrey dl-4
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
 
slides_v1
slides_v1slides_v1
slides_v1
 
Geometry Batching Using Texture-Arrays
Geometry Batching Using Texture-ArraysGeometry Batching Using Texture-Arrays
Geometry Batching Using Texture-Arrays
 
testpang
testpangtestpang
testpang
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.ppt
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.ppt
 
defense
defensedefense
defense
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Dernier (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques

  • 1. 1
  • 2. Text Extraction from Product Images Rajesh Shreedhar Bhat Data Scientist @Walmart Labs, Bengaluru MS CS @ASU, Kaggle Competitions Expert
  • 3. Agenda ▪ Intro to Text Extraction ▪ Text Detection(TD) ▪ TD Model Architecture ▪ Training data generation ▪ Text Recognition(TR) training data preparation ▪ CRNN-CTC model for TR ▪ Receptive Fields ▪ CTC decoder and loss ▪ TR Training phase ▪ Other Advanced Techniques ▪ Questions ? 3
  • 4. Introduction : Text Extraction 4
  • 6. Image Segmentation - Input & Ground Truth 6
  • 7. Ground Truth Label Generation 7
  • 8. Text Detection – Model architecture ▪ VGG16 – BN as the backbone ▪ Model has skip connection in decoder part which is similar to U-Nets. ▪ Output : ▪ Region score ▪ Affinity score - grouping characters Ref:Baek, Youngmin, et al. "Character Region Awareness for Text detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. 8
  • 9. Sample Output Region Score Affinity Score Region Score Affinity Score 9
  • 12. Text Recognition – Training Data Preparation SynthText: image generation engine for building a large annotated dataset. 15 million images generated with different font styles, size, color & varying backgrounds using product descriptions + open source datasets Vocabulary: 92 characters Includes capital + small letters, numbers and special symbols 12
  • 13. Text Recognition CRNN CTC model 13
  • 14. CNN - Receptive Fields Usage of intermediate layer features in SSD’s in Object detection tasks. ▪ Receptive field is defined as the region in the input image/space that a particular CNN’s feature is looking at. 14
  • 15. CNN features to LSTM Input Image (None, 128 * 32 * 1) Feature map (None, 1, 31, 512) Feature map (None, 1, 31, 512) SoftMax probabilities for every time step (i.e. 31), over the vocabulary. 15
  • 16. Ground Truth and Output of TR task 16 Hello Input Image Ground Truth Output from LSTM model for 31 timesteps .. Time step t1 t2 t3 t4 t5 ... …. t27 t28 t29 t30 t31 Prediction H H H e e ... ... l o o o o Length of Ground truth is 5 which is not equal to length of prediction i.e 31
  • 17. How to calculate the loss? NER model – loss: categorical cross entropy • Do we have labels for every time steps of LSTM model in CRNN setting ? • Can we use cross entropy loss? Answer is: NO!! 17
  • 18. Mapping of Input to Output Corresponding Text Hello Corresponding Text Hello Can we manually align each character to its location in the audio/image? Yes!! But lot of manual effort is needed in creating training data. 18
  • 19. CTC to rescue With just mapping from image to text and not worrying about alignment of each character to the location in input image, one should be able to train the network. Merge repeats Merge repeats Drop blank character 19
  • 20. Connectionist Temporal Classification (CTC) Loss ▪ Ground truth for an image --> AB ▪ Vocabulary is { A, B, - } ▪ Let's say we have predictions for 3-time steps from LSTM network (SoftMax probabilities over vocabulary at t1, t2, t3) ▪ Given that we use CTC decode operation discussed earlier, in which scenarios we can say output from the model is correct?? 20
  • 21. CTC loss continued .. t1 t2 t3 A B B A A B - A B A - B A B - • Merge repeats • Drop blank character AB t1 t2 t3 A 0.8 0.7 0.1 B 0.1 0.1 0.8 - 0.1 0.2 0.1 Ground Truth : AB Score for one path: AAB = (0.8 * 0.7 * 0.8) and similarly for other paths. Probability of getting GT AB: = P(ABB) + P(AAB) + P(-AB) + P(A-B) + P(AB-) Loss : - log( Probability of getting GT ) SoftMax probabilities 21
  • 22. CTC loss perfect match t1 t2 t3 A - - - A - - - A - A A A A - A - A A A A • Merge repeats • Drop blank character t1 t2 t3 A 1 0 0 B 0 0 0 - 0 1 1 • SoftMax probabilities Score for one path: A- - = (1 * 1 * 1) and similarly for other paths. Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA) Loss : - log( Probability of getting GT ) = 0 Ground Truth : A A 22
  • 23. CTC loss perfect mismatch • Merge repeats • Drop blank character A t1 t2 t3 A 0 0 0 B 1 1 1 - 0 0 0 SoftMax probabilities Score for one path: A - - = (0 * 0 * 0) and similarly for other paths. Probability of getting ground truth A: = P(A--) + P(-A-) + P(--A) + P(-AA) + P(AA-) + P(A-A) + P(AAA) Loss : - log( Probability of getting GT ) = tends to infinity !! Ground Truth : A t1 t2 t3 A - - - A - - - A - A A A A - A - A A A A 23
  • 24. Model Architecture & CTC loss in TF 24
  • 25. Training Phase ▪ 15 million images ~ 690 GB when loaded into memory!! Given that on an average images are of the shape (128 * 32 * 3) and dtype is float32. ▪ Usage of Python Generators to load only single batch in memory. ▪ Reducing the training time by using workers, max_queue_size & multi-processing in .fit_generator in Keras. ▪ Training time ~ 2 hours for single epoch on single P100 GPU machine and prediction time ~1 sec for batch of 2048 images. 25
  • 26. Other Advanced Techniques Attention - OCR Spatial Transformer Network – before text recognition Ref:Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems. 2015. 26