GHoST Crawler
Google Hypothesis₀ Using Statistical Technique
Confirmatory Data Analysis
Business Problem
DUNS Enrichment - Segmentation Pillar
1. Classification of Accounts: PUBLIC or CORP
• Huge manual effort
• Time consuming
2. Sizing of Account
• Employee HC
• Total Fund Amount
• Industry Type
3. Cleansing the Local Affinity Data
• Integration manager does not accept input due to data quality issues.
GHoST Classify Account - Public or Corporate
Current Method of Classification
Inputs
AAID local name
city
state
country code
Zip code
Input Business
Name to GHoST
Tool
Manual Googling
Effort
Affinity Accounts
Public
Corporate
With reference link
GHoST (Google Hypothesis₀ Using Statistical Technique)
Step 1: Input To GHoST Crawler
• Business Name
• Pin Code/City
Step 2: Is Business Name In English
• True, Proceed
• Else Mark For Translation
Step 3: Is Search Result Valid
• If Yes, Proceed
• No, Then Flag
• Capture Number Of Results
Step 4: Qualify Valid Links
• Cleanse Href URLs
• Filter Irrelevant (Instagram/Pinterest/Map)
Step 5: Rank Top 10 URLs Captured
• Rank Based On Similarity % To Business Name
• Save Top 10 URLs For Reference
Step 6: Proving Null Hypothesis (Ho) - Text Mining For Key Words
• Hit Key Words (Government/Attorney/Sheriff/City Of)
• Prove Alternative Hypothesis Strength
• Calculate The Levenshtein Similarity/Distance
Step 7: Binomial Classification
• Either Public
• Or Corporate
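The language check in Step 2 and the link qualification in Step 4 could be sketched roughly as below. This is only an illustration under stated assumptions: the slides do not say how "English" is detected, so a pure-ASCII test stands in for it, and the function names and the blocklist are hypothetical (the slide names only Instagram, Pinterest and map links).

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the slide names Instagram, Pinterest and map links.
IRRELEVANT_DOMAINS = ("instagram.com", "pinterest.com")


def is_english(name: str) -> bool:
    """Step 2 stand-in (assumption): treat a pure-ASCII name as English."""
    return name.isascii()


def is_relevant(url: str) -> bool:
    """Step 4: reject social-media and map links."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in IRRELEVANT_DOMAINS):
        return False
    # Assumption: a "maps." host or a "/maps" path marks a map result.
    if host.startswith("maps.") or parsed.path.startswith("/maps"):
        return False
    return True


def qualify_links(urls):
    """Cleanse hrefs (trim, de-duplicate) and keep only relevant ones."""
    seen, kept = set(), []
    for url in urls:
        cleaned = url.strip()
        if cleaned and cleaned not in seen and is_relevant(cleaned):
            seen.add(cleaned)
            kept.append(cleaned)
    return kept
```

Names that fail `is_english` would be marked for translation rather than searched, as the slide describes.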
GHoST Crawler
Output
Ghost Link
Ghost Classification
Similarity%
SimilarityLink2
Total Results
Input
AAID local name
city
state
country code
Zip code
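The "Similarity %" output and the top-10 ranking from Step 5 can be illustrated with a short sketch. It assumes the slide's "Similarity/Distance" bullet refers to Levenshtein edit distance, and that the percentage is the distance normalised by the longer string's length; the slides do not give the exact formula, and all function names here are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_pct(a: str, b: str) -> float:
    """Assumed formula: 100 * (1 - distance / length of the longer string)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def rank_top(business_name, candidates, limit=10):
    """candidates: (url, title) pairs. Returns (similarity %, url), best first."""
    scored = [(similarity_pct(business_name, title), url)
              for url, title in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

Comparing against the result title is one choice among several; the same scoring could be applied to the URL host or the page text.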
Crunch Base Site Crawler
Cleansing The Local Affinity Data
Can yu read this massage despite
thehorible sppeling msitakes?
I bet you kan.
Existing APID DB
Has DUNS
Check for
Accurate
Y
Pass to IMN
Matches Affinity
Y
Can be rectified?
N
Y
Associates for
manual work
N
Solution
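The misspelled sentence on this slide makes the point that near-matches are still recognisable, which is what the "Can be rectified?" branch relies on. A minimal sketch of such a check, assuming a list of known-good names is available, could use fuzzy matching from Python's standard library. The function name, the reference list and the 0.8 cutoff are all hypothetical and not taken from the slides.

```python
from difflib import get_close_matches


def cleanse_name(raw, reference_names, cutoff=0.8):
    """Return the closest known-good name, or None if nothing is close enough."""
    # Compare case-insensitively but return the reference spelling.
    lookup = {name.lower(): name for name in reference_names}
    matches = get_close_matches(raw.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

A record that returns `None` here would correspond to the "N" branch and go to associates for manual work.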
Smart Combination Pickup
Amlan told me great things you are doing with
WebCrawlers. We will set up a call with you on
best practices to see if this is something we can
leverage for our projects on Unicorn/AMO/etc.
Maggie – Length 6
Maggie – Length 4
Smart Pick
GHoST (Google Hypothesis₀ Using Statistical Technique)
1
Input To Ghost
Crawler
2
Is Business
Name In
English
3
Is Search
Result Valid
4
Qualify Valid
Links
5
Rank Top 10
URLs Captured
6
Proving Null
Hypothesis (Ho) -
Text Mining For Key
Words
7
Binomial
Classification
GHoST Crawler
Output
Ghost Link
Ghost Classification
Similarity%
SimilarityLink2
Total Results
Input
AAID local name
city
state
country code
Zip code
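Steps 6 and 7 of the flow (keyword text mining followed by a two-way classification) might look like the sketch below. The keyword list comes from the slides; everything else is an assumption: the slides do not say which class is the null hypothesis, so Corporate is taken as Ho here, and the hit threshold `min_hits` is a hypothetical parameter rather than a statistical test from the deck.

```python
import re

# Keywords named on the slide as public-sector indicators.
PUBLIC_KEYWORDS = ("government", "attorney", "sheriff", "city of")


def count_keyword_hits(texts):
    """Count whole-word keyword occurrences across the captured page texts."""
    hits = 0
    for text in texts:
        lowered = text.lower()
        for keyword in PUBLIC_KEYWORDS:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits += len(re.findall(pattern, lowered))
    return hits


def classify(texts, min_hits=1):
    """Assumed Ho: the account is Corporate. Enough hits reject Ho in favour of Public."""
    return "Public" if count_keyword_hits(texts) >= min_hits else "Corporate"
```

Raising `min_hits` trades recall on public accounts for fewer false positives, which matters given the 70% accuracy target stated in the results.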
Classification and Sizing of Accounts
Input
AAID local name
city
state
country code
Zip code
Results
1. Expected Results with 70% Accuracy
2. Reduce Time
3. Reduce manual effort
• Public
• Corporate
Classification of
Accounts
• HC
• Total Fund Amount
• Industry Type
Company Information
GHoST Crawler
Solution
GHoST (Google Hypothesis₀ Using Statistical Technique)
