Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Shipment Address Classification in Logistics in
the absence of Geolocation Information
Dr. T. Ravindra Babu,
Data Scientist,
Flipkart
August 1, 2015
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of G
Presentation Plan
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Motivation and Problem Definition
Motivation
Problem Definition
Typical Operations Scenario at Delivery Hub without a model
Inscan of shipments received from Mother Hub
Manual reading of address; assignment to the Route/FE
Sorting and Delivery
Overview of Proposed Solution
Capturing FEs' domain knowledge and modelling around it
Classifying an address as belonging to a pre-defined subarea
Allocating shipments to a Route/FE using a Machine
Learning based Classifier
Sorting and Delivery
Delivery Hub and Subareas
Insights into Address Data
The number of words in an address ranges from 4 to 75, apart from a few
outliers of more than 100.
A word like Apartments is spelt in 263 different ways; whitefield
24 ways, industrial 25 ways, Bangalore 161 ways, karnataka
70 ways, etc.
Structure in addresses is lacking even in a city like Bangalore.
A few examples.
Some words are specific to certain places/states. Examples:
halli, hobli; bawdi, kuan; society; layout; etc.
Addressing systems across the world: US, Europe, Korea,
Japan; countries like Brazil and India
Proposed Model
Preprocessing
An elaborate preprocessing model was necessary, accounting
for the following.
Retaining only those terms that are likely to help classification
(discriminability)
Merging of terms using empirical statistical models as well as
domain-knowledge-based rules, n-grams, abbreviations, etc.
Developing data-dependent dictionaries based on pattern
clustering (Machine Learning) and forming equivalent sets
Preprocessing reduces the vocabulary size by 65% as
measured on a large dataset
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Models in Preprocessing :: Fraud Address Classification -
Address Strings
Sl.No.  Address
1       adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2       gasdfashagadfasmejastic
3       fdgdf
4       hjsdhaddsdsasdsa
5       dsfadafadsasdfsdafsda
6       hjsdhaddsdsasdsa
7       asd
8       lmflvml
9       assasfsafasfsasfsfsafashaphilomena
10      faskjbdasdlkjbsaasd
Models in Preprocessing :: Fraud Address Classification -
Address Strings-Heatmap
Figure: MonkeyType Addresses
Models in Preprocessing :: Fraud Address Classification -
Items Bought
Figure: Items bought by such people
Models in Preprocessing :: Probabilistic Separation of
Compound Words
To a large extent, addresses are not amenable to English
dictionaries
While writing addresses, customers often inadvertently omit
the space between words, or the space is removed during
storage/retrieval
Separating such compound words
Compute empirical probabilities of words
Assuming conditional independence, if the probability of the
compound word is less than the product of the probabilities of
its individual words, separate the words
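The separation rule above can be sketched as follows. This is a minimal illustration rather than the deployed model: the function names, the restriction to two-way splits, the `min_len` guard, and the choice to treat unseen words as having probability 0 are all assumptions made for the example.

```python
from collections import Counter


def word_probabilities(addresses):
    """Empirical unigram probabilities from whitespace-tokenised addresses."""
    counts = Counter(tok for addr in addresses for tok in addr.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def split_compound(token, probs, min_len=3):
    """Return the best two-way split of token, or [token] if no split wins.

    A split (left, right) is accepted only when
    P(token) < P(left) * P(right); unseen words get probability 0
    (an assumption of this sketch, not stated in the slides).
    """
    best, best_p = [token], probs.get(token, 0.0)
    # Try every split point that leaves at least min_len characters per side.
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        p_split = probs.get(left, 0.0) * probs.get(right, 0.0)
        if p_split > best_p:
            best, best_p = [left, right], p_split
    return best
```

With probabilities estimated from addresses that contain "bannerghatta" and "road" as separate words, an unseen token such as "bannerghattaroad" is split into those two words, while a token with no known parts is returned unchanged. Note that on a small corpus a compound that has itself been observed may not be split, because a product of two probabilities is small; the rule works best with counts from a large dataset.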
Models in Preprocessing :: Frequent Pattern Tree for
n-gram Generation
Frequent pattern tree is a celebrated approach in mining large
datasets
We implement a modified version of the tree to generate
n-grams
Conventional method
New approach
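The slides do not describe how the tree was modified, so the following is only a rough illustration of the general idea of generating frequent n-grams from a count-annotated prefix tree. It is a plain word-level counting trie, not the FP-tree algorithm proper and not the author's version; `Node`, `build_tree`, `frequent_ngrams`, `max_n`, and `min_support` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


class Node:
    """Prefix-tree node: count of the word sequence ending at this node."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_tree(addresses, max_n=3):
    """Insert every word sequence of length 1..max_n into the tree."""
    root = Node()
    for addr in addresses:
        tokens = addr.lower().split()
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_n]:
                node = node.children[tok]
                node.count += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Collect n-grams (n >= min_n) occurring at least min_support times."""
    results = {}
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        for tok, child in node.children.items():
            # An n-gram is never more frequent than its prefix, so an
            # infrequent node lets us skip its whole subtree.
            if child.count >= min_support:
                new_path = path + (tok,)
                if len(new_path) >= min_n:
                    results[" ".join(new_path)] = child.count
                stack.append((child, new_path))
    return results
```

Frequent word sequences found this way (for example a road name that is usually written as two or three words) are candidates for merging into a single term during preprocessing.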
Models in Preprocessing :: Clustering for equivalent sets of
words with spelling variations - Ex. koramangala, electronics
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
Models in Preprocessing :: Clustering for ... spell variations
- Ex. Bannerghattaroad (61 variations)
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
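One way such equivalent sets of spelling variants could be formed is sketched below. The slides do not name the clustering algorithm or the distance measure, so this example assumes a single-pass, leader-style clustering on Levenshtein edit distance, with the most frequent spelling acting as the representative. The function names, the `max_dist` threshold, and the word counts in the usage note are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_variants(word_counts, max_dist=2):
    """Group words around leaders, visiting frequent words first.

    Each word joins the first leader within max_dist edits; otherwise it
    becomes a new leader. Returns {leader: [leader, variant, ...]}.
    """
    leaders = {}
    ordered = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, _count in ordered:
        for leader in leaders:
            if edit_distance(word, leader) <= max_dist:
                leaders[leader].append(word)
                break
        else:
            leaders[word] = [word]
    return leaders
```

Given made-up counts in which "koramangala" and "electronics" are far more frequent than variants such as "koromangala" or "electroncis", each variant lands in the set led by its correct spelling, and every member can then be mapped to that leader in a data-dependent dictionary. A fixed edit-distance threshold is a simplification: short words would need a tighter bound to avoid merging distinct place names.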
Models in Post-processing :: Semi-Supervised Methods
Discussion
Revisiting The Model
Supervised Classification
Summary
Novelty
Solution is novel and developed in-house
No similar solution was found in the literature
Thank You
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G

More Related Content

Viewers also liked

Student information system project
Student information system projectStudent information system project
Student information system project
Rizwan Ashraf
Ā 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml Overview
Dang Tuan
Ā 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)
Amani Mrisho
Ā 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outline
Amit Panwar
Ā 

Viewers also liked (12)

Online Student Registration System
Online Student Registration SystemOnline Student Registration System
Online Student Registration System
Ā 
Student information system project
Student information system projectStudent information system project
Student information system project
Ā 
Procedure qualification
Procedure qualificationProcedure qualification
Procedure qualification
Ā 
M02 Uml Overview
M02 Uml OverviewM02 Uml Overview
M02 Uml Overview
Ā 
Types of Grading and Reporting System
Types of Grading and Reporting System Types of Grading and Reporting System
Types of Grading and Reporting System
Ā 
Grading system
Grading systemGrading system
Grading system
Ā 
5 Type Of Architecture Design Process
5 Type Of Architecture Design Process 5 Type Of Architecture Design Process
5 Type Of Architecture Design Process
Ā 
9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)9321885 online-university-admission-system (1)
9321885 online-university-admission-system (1)
Ā 
Student information-system-project-outline
Student information-system-project-outlineStudent information-system-project-outline
Student information-system-project-outline
Ā 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfd
Ā 
Modeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and FunctionalModeling- Object, Dynamic and Functional
Modeling- Object, Dynamic and Functional
Ā 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case Diagram
Ā 

Similar to Shipment address classification in logistics, Ravindra Babu, Flipkart

Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
joellemurphey
Ā 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1
Sharon Wilder
Ā 

Similar to Shipment address classification in logistics, Ravindra Babu, Flipkart (20)

Address classification
Address classificationAddress classification
Address classification
Ā 
Vedant Borse
Vedant BorseVedant Borse
Vedant Borse
Ā 
How to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI StrategyHow to Answer Candidate Questions About Your DEI Strategy
How to Answer Candidate Questions About Your DEI Strategy
Ā 
Big Data in Human Resources
Big Data in Human ResourcesBig Data in Human Resources
Big Data in Human Resources
Ā 
Leaderhip dancefloor weminar
Leaderhip dancefloor weminarLeaderhip dancefloor weminar
Leaderhip dancefloor weminar
Ā 
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
LinkedIn Recruiter Certification: get in, get smart, get certified | Talent C...
Ā 
4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion4 Steps to Become an HR Analytics Champion
4 Steps to Become an HR Analytics Champion
Ā 
Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides Human Resource Planning PowerPoint Presentation Slides
Human Resource Planning PowerPoint Presentation Slides
Ā 
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docxRubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Rubric Name Undergraduate Generic Case and SLP Grading Rubric - Nov.docx
Ā 
Break Out of the Training Box with the Six BoxesĀ® Approach
Break Out of the Training Box with the Six BoxesĀ® ApproachBreak Out of the Training Box with the Six BoxesĀ® Approach
Break Out of the Training Box with the Six BoxesĀ® Approach
Ā 
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......ManagementMM Bagali ......HR...... Succession planning......HRM......HRD.......Management
MM Bagali ......HR...... Succession planning......HRM......HRD.......Management
Ā 
Sharon G Wilder Resume v1
Sharon G Wilder Resume v1Sharon G Wilder Resume v1
Sharon G Wilder Resume v1
Ā 
RDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the piecesRDF Data Quality Assessment - connecting the pieces
RDF Data Quality Assessment - connecting the pieces
Ā 
Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides Succession Management PowerPoint Presentation Slides
Succession Management PowerPoint Presentation Slides
Ā 
Make L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&DMake L&D Count - Shape a strong business case for L&D
Make L&D Count - Shape a strong business case for L&D
Ā 
Finding Your Path to Value
Finding Your Path to ValueFinding Your Path to Value
Finding Your Path to Value
Ā 
Successful ERP Selection
Successful ERP SelectionSuccessful ERP Selection
Successful ERP Selection
Ā 
Dfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recapDfwtrn SourceCon2012 recap
Dfwtrn SourceCon2012 recap
Ā 
Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides Replacement Planning PowerPoint Presentation Slides
Replacement Planning PowerPoint Presentation Slides
Ā 
Planning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent TimesPlanning Your Workforce During Turbulent Times
Planning Your Workforce During Turbulent Times
Ā 

Recently uploaded

Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
amitlee9823
Ā 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
Ā 
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
amitlee9823
Ā 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
Ā 
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
amitlee9823
Ā 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
Ā 
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
amitlee9823
Ā 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
Ā 
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
amitlee9823
Ā 

Recently uploaded (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
Ā 
Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service Ban...
Ā 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Ā 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
Ā 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
Ā 
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
Mg Road Call Girls Service: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Banga...
Ā 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
Ā 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Ā 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
Ā 
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Call Girls Bommasandra Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Service B...
Ā 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Ā 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Ā 
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: šŸ“ 7737669865 šŸ“ High Profile Model Escorts | Bangalore...
Ā 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
Ā 
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call šŸ‘— 7737669865 šŸ‘— Top Class Call Girl Ser...
Ā 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
Ā 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
Ā 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
Ā 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
Ā 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Ā 

Shipment address classification in logistics, Ravindra Babu, Flipkart

  • 1. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Shipment Address Classiļ¬cation in Logistics in the absence of Geolocation Information Dr. T. Ravindra Babu, Data Scientist, Flipkart August 1, 2015 Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 2. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Presentation Plan Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 3. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Motivation and Problem Deļ¬nition Motivation Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 4. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Motivation and Problem Deļ¬nition Motivation Problem Deļ¬nition Typical Operations Scenario at Delivery Hub without a model Inscan of shipments received from Mother Hub Manual reading of address; Assign to the Route/FE Sorting and Delivery Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 5. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Motivation and Problem Deļ¬nition Motivation Problem Deļ¬nition Typical Operations Scenario at Delivery Hub without a model Inscan of shipments received from Mother Hub Manual reading of address; Assign to the Route/FE Sorting and Delivery Overview of Proposed Solution Capturing FEsā€™ domain knowledge and modelling around it Classifying an address to be belonging to a pre-deļ¬ned subarea Allocation of the shipments to Route/FE based on Machine Learning based Classiļ¬er Sorting and Delivery Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 6. Motivation, Problem Deļ¬nition and Solution Overview Data Challenges, Modeling, Solutions and Deployment Summary Delivery Hub and Subareas Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classiļ¬cation in Logistics in the absence of G
  • 11. Insights into Address Data. The number of words in an address ranges from 4 to 75, apart from a few outliers with more than 100. A word like "Apartments" is spelt in 263 different ways; Whitefield in 24 ways, industrial in 25, Bangalore in 161, Karnataka in 70, etc. Addresses lack structure even in a city like Bangalore (a few examples). Some words are specific to certain places/states, for example: halli, hobli; bawdi, kuan; society; layout; etc. Addressing systems across the world: US, Europe, Korea, Japan; countries like Brazil and India.
  • 12. Proposed Model
  • 13. Preprocessing. An elaborate preprocessing model was necessary that accounts for the following: retaining only those terms that possibly help classification (discriminability); merging terms using empirical statistical models as well as domain-knowledge-based rules, n-grams, abbreviations, etc.; developing data-dependent dictionaries based on pattern clustering (machine learning) and forming equivalent sets. Preprocessing reduces the vocabulary size by 65%, as measured on a large dataset.
  • 14. Preprocessing for Data Compaction. Figure: Impact of Preprocessing
  • 15. Models in Preprocessing :: Fraud Address Classification - Address Strings
      Sl.No.  Address
      1       adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
      2       gasdfashagadfasmejastic
      3       fdgdf
      4       hjsdhaddsdsasdsa
      5       dsfadafadsasdfsdafsda
      6       hjsdhaddsdsasdsa
      7       asd
      8       lmflvml
      9       assasfsafasfsasfsfsafashaphilomena
      10      faskjbdasdlkjbsaasd
  • 16. Models in Preprocessing :: Fraud Address Classification - Address Strings Heatmap. Figure: MonkeyType Addresses
  • 17. Models in Preprocessing :: Fraud Address Classification - Items Bought. Figure: Items bought by such people
  • 20. Models in Preprocessing :: Probabilistic Separation of Compound Words. To a large extent, addresses are not amenable to English dictionaries. In written addresses the space between words is often missing, either because the customer inadvertently omitted it or because it was removed during storage/retrieval. Separating such compound words: compute empirical probabilities of words; then, assuming conditional independence, if the joint probability of a compound word is less than the product of the probabilities of the individual words, separate the words.
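The separation rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's implementation: the function names `word_probabilities` and `split_compound`, the whitespace tokenisation, the restriction to two-way splits, and the `min_len` guard are all hypothetical choices made for the example.

```python
from collections import Counter


def word_probabilities(addresses):
    """Empirical unigram probabilities from whitespace-tokenised addresses."""
    counts = Counter(tok for addr in addresses for tok in addr.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def split_compound(token, probs, min_len=3):
    """Split token in two if some split is more probable than the compound.

    The compound's own empirical probability plays the role of the joint
    probability; the product of the parts' probabilities is the value
    implied by the independence assumption. min_len (an assumption of this
    sketch) avoids splitting off very short fragments.
    """
    best_split, best_p = None, probs.get(token, 0.0)
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        p_split = probs.get(left, 0.0) * probs.get(right, 0.0)
        if p_split > best_p:
            best_split, best_p = [left, right], p_split
    return best_split if best_split else [token]


corpus = [
    "koramangala main road",
    "main road whitefield",
    "mainroad koramangala",
    "main road",
    "main road",
]
probs = word_probabilities(corpus)
print(split_compound("mainroad", probs))
print(split_compound("whitefield", probs))
```

In this toy corpus "mainroad" occurs once while "main" and "road" occur four times each out of twelve tokens, so the product of the parts exceeds the probability of the compound and the token is separated; "whitefield" has no split into known words and is left intact.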
  • 24. Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation. The frequent pattern tree is a celebrated approach in mining large datasets. We implement a modified version of the tree to generate n-grams. Conventional method. New approach.
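The slides do not describe how the frequent pattern tree was modified, so the following is not the presenter's method. It is a minimal sketch of one related idea: a counted prefix tree over contiguous word sequences, from which frequent n-grams are read off. The names `build_ngram_tree` and `frequent_ngrams` and the parameters `max_n`, `min_support` and `min_n` are hypothetical.

```python
def build_ngram_tree(addresses, max_n=3):
    """Prefix tree over contiguous word sequences; each node stores a count."""
    root = {"count": 0, "children": {}}
    for addr in addresses:
        words = addr.lower().split()
        for start in range(len(words)):
            node = root
            # Insert the sequence of up to max_n words starting at this position.
            for word in words[start:start + max_n]:
                node = node["children"].setdefault(
                    word, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Collect word sequences of length >= min_n seen at least min_support times."""
    found = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for word, child in node["children"].items():
            if child["count"] < min_support:
                continue  # extensions of an infrequent prefix cannot be frequent
            new_path = path + (word,)
            if len(new_path) >= min_n:
                found[" ".join(new_path)] = child["count"]
            stack.append((new_path, child))
    return found


tree = build_ngram_tree([
    "12 bannerghatta main road bangalore",
    "bannerghatta main road",
    "near main road koramangala",
])
print(frequent_ngrams(tree))
```

Frequent sequences found this way (for example "main road") could then be merged into single terms during preprocessing, which is one way n-grams can reduce vocabulary size.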
  • 25. Models in Preprocessing :: Clustering for equivalent set of words with spell variations - Ex. koramangala, electronics. koramangala: koramanagala, koromangala, kormanagala, koramnagala, koramangalato, kanamangala, koramanagla, koremangala, koaramangala, koramamgala, karamangala, tkoramangala, kormangalla, koramongala, koarmangala, korammangala, koramangalla, koramangale, koramanagal. electronics: electronice, eclectronic, elelctronic, eelectronic, electronica, electroincs, electronics, electroninc, electrinics, electroncis, electronincs.
  • 26. Models in Preprocessing :: Clustering for ... spell variations - Ex. Bannerghattaroad (61 variations): bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad, bannerughattaroad, bannarghattaroad, banergattaroad, banneraghattaroad, bannerghettaroad, bannerugattaroad, bhannerghattaroad, bennerghattaroad, bannerghttaroad, bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad, bennarghattaroad, baneerghattaroad, bannergettaroad, banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad, benerghattaroad, bannerghattaroadto, bannergataroad, bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad, bannnerghattaroad, bannarghettaroad, banerughattaroad, bannergahttaroad, bhannerughattaroad, bennergattaroad, bannerghattroad, bannaraghattaroad, bannerhattaroad, bannerghatharoad, banneerghattaroad, bannaerghattaroad, baneergattaroad, bhannergattaroad, bhanerghattaroad,
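The slides do not name the clustering algorithm used to build these equivalent sets. As an illustration only, the sketch below groups spelling variants with a simple leader clustering over Levenshtein edit distance, treating more frequent words as candidate canonical forms. The function names, the `max_dist` threshold, and the word counts in the usage lines are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cluster_variants(word_counts, max_dist=2):
    """Map each word to a canonical form using leader clustering.

    Words are visited from most to least frequent; a word joins the first
    leader within max_dist edits, otherwise it becomes a new leader.
    """
    leaders = []
    canonical = {}
    for word, _ in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for leader in leaders:
            if edit_distance(word, leader) <= max_dist:
                canonical[word] = leader
                break
        else:
            leaders.append(word)
            canonical[word] = word
    return canonical


# Hypothetical counts, for illustration only.
counts = {
    "koramangala": 50, "koramanagala": 3, "koromangala": 2,
    "electronics": 40, "electroincs": 2, "electronice": 1,
}
print(cluster_variants(counts))
```

A fixed edit-distance threshold is a crude choice: longer names such as bannerghattaroad show variants several edits away from the common spelling, so a length-relative threshold or a phonetic key may suit real address data better.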
  • 27. Models in Post-processing :: Semi-Supervised Methods. Discussion
  • 28. Revisiting the Model. Supervised Classification
  • 29. Summary. Novelty: the solution is novel and was developed in-house; no similar solution was found in the literature.
  • 30. Thank You