2. Business Problem
DUNS Enrichment - Segmentation Pillar
1. Classification of Accounts: PUBLIC or CORP
• Huge manual effort
• Time consuming
2. Sizing of Account
• Employee headcount (HC)
• Total Fund Amount
• Industry Type
3. Cleansing the Local Affinity Data
• Integration manager does not accept input due to data quality issues.
3. GHoST Classify Account - Public or Corporate
Current Method of Classification
Inputs
AAID local name
city
state
country code
Zip code
Input Business Name to GHoST Tool
Manual Googling Effort
Affinity Accounts
Public
Corporate
With reference link
4. GHoST (Google Hypothesis (Ho) Using Statistical Technique)
Step 1: Input To GHoST Crawler
• Business Name
• Pin Code/City
Step 2: Is Business Name In English?
• If True, Proceed
• Else Mark For Translation
Step 3: Is Search Result Valid?
• If Yes, Proceed
• If No, Then Flag
• Capture Number Of Results
Step 4: Qualify Valid Links
• Cleanse Href URLs
• Filter Irrelevant Links (Instagram/Pinterest/Map)
Step 5: Rank Top 10 URLs Captured
• Rank Based On Similarity % To Business Name
• Save Top 10 URLs For Reference
Step 6: Proving Null Hypothesis (Ho) - Text Mining For Key Words
• Hit Key Words (Government/Attorney/Sheriff/City Of)
• Prove Alternative Hypothesis Strength
• Calculate The Levenshtein Similarity/Distance
Step 7: Binomial Classification
• Either Public
• Or Corporate
GHoST Crawler
Output
Ghost Link
Ghost Classification
Similarity%
SimilarityLink2
Total Results
Input
AAID local name
city
state
country code
Zip code
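The link-qualification and ranking steps can be sketched roughly as below. This is only an illustration, assuming the similarity % is a normalized Levenshtein (edit-distance) similarity between the business name and each result title. The block list `IRRELEVANT_DOMAINS` and the function names are hypothetical and not taken from the GHoST tool itself.

```python
from urllib.parse import urlparse

# Hypothetical block list, based on the kinds of sites the slide names.
IRRELEVANT_DOMAINS = ("instagram.com", "pinterest.com", "maps.google.com")


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity_pct(a: str, b: str) -> float:
    """Similarity in percent: 100 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def is_relevant(url: str) -> bool:
    """Drop links whose host is on (or is a subdomain of) the block list."""
    host = urlparse(url).netloc.lower()
    return not any(host == d or host.endswith("." + d)
                   for d in IRRELEVANT_DOMAINS)


def rank_results(business_name, results, top_n=10):
    """results: list of (title, url) pairs; returns top (similarity, url) pairs."""
    kept = [(similarity_pct(business_name, title), url)
            for title, url in results if is_relevant(url)]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]
```

With this sketch, calling `rank_results("City of Austin", results)` would return at most ten `(similarity, url)` pairs, best match first, with social-media and map links removed.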
6. Cleansing The Local Affinity Data
Can yu read this massage despite
thehorible sppeling msitakes?
I bet you kan.
Existing APID DB
Has DUNS
Check for Accuracy
Y
Pass to IMN
Matches Affinity
Y
Can be rectified?
N
Y
Associates for manual work
N
Solution
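One way the "Can be rectified?" check might be automated is fuzzy matching of a misspelled local name against names already held in a reference list. The sketch below uses Python's standard `difflib` module. The function name `rectify_name`, the 0.8 cutoff, and the sample names are assumptions for illustration only, not part of the existing process.

```python
import difflib


def rectify_name(local_name, reference_names, cutoff=0.8):
    """Return the closest reference name, or None if nothing is close enough.

    Matching is case-insensitive; the original spelling of the reference
    name is returned.
    """
    lookup = {name.lower(): name for name in reference_names}
    matches = difflib.get_close_matches(
        local_name.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical reference list standing in for names already in the database.
reference = ["Springfield City Council", "Acme Corporation"]
print(rectify_name("Springfeild City Council", reference))
print(rectify_name("Zzzz", reference))
```

Names that return `None` would still go to associates for manual work. The cutoff would need tuning on real data.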
7. Smart Combination Pickup
Amlan told me great things you are doing with
WebCrawlers. We will set up a call with you on
best practices to see if this is something we can
leverage for our projects on Unicorn/AMO/etc.
Maggie – Length 6
Maggie – Length 4
Smart Pick
9. GHoST (Google Hypothesis (Ho) Using Statistical Technique)
1. Input To GHoST Crawler
2. Is Business Name In English
3. Is Search Result Valid
4. Qualify Valid Links
5. Rank Top 10 URLs Captured
6. Proving Null Hypothesis (Ho) - Text Mining For Key Words
7. Binomial Classification
GHoST Crawler
Output
Ghost Link
Ghost Classification
Similarity%
SimilarityLink2
Total Results
Input
AAID local name
city
state
country code
Zip code
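The English check and the keyword-based Public/Corporate decision can be illustrated with a small sketch. The slides do not state the actual statistical test, so a simple share-of-hits threshold stands in for it here. The ASCII-only language heuristic, the 0.5 threshold, and all function names are assumptions. The keyword list comes from the key words named in the deck.

```python
# Key words named in the deck; matching is done on lowercased text.
PUBLIC_KEYWORDS = ("government", "attorney", "sheriff", "city of")


def is_english(name: str) -> bool:
    """Rough heuristic (an assumption): pure-ASCII names are treated as English."""
    return name.isascii()


def keyword_hits(texts) -> int:
    """Count how many texts contain at least one public-sector key word."""
    return sum(any(k in t.lower() for k in PUBLIC_KEYWORDS) for t in texts)


def classify(texts, threshold=0.5) -> str:
    """Label an account from the text of its top-ranked search results.

    The default assumption (Ho) here is 'Corporate'; it is rejected in
    favor of 'Public' when the share of results with key-word hits
    reaches the threshold. This threshold rule is a stand-in, not the
    tool's real test.
    """
    if not texts:
        return "Corporate"
    share = keyword_hits(texts) / len(texts)
    return "Public" if share >= threshold else "Corporate"


snippets = ["City of Austin official site", "Austin Sheriff Office", "Tacos"]
print(is_english("City of Austin"), classify(snippets))
```

Names that fail `is_english` would be marked for translation before any search is run.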
10. Classification and Sizing of Accounts
Input
AAID local name
city
state
country code
Zip code
Results
1. Expected results with 70% accuracy
2. Reduce time
3. Reduce manual effort
Classification of Accounts
• Public
• Corporate
Company Information
• HC
• Total Fund Amount
• Industry Type
GHoST Crawler
Solution
GHoST (Google Hypothesis (Ho) Using Statistical Technique)