Ana Martinez
Kin Lane

February 2012   M.C. Escher
The problem
Big Bottleneck!
Single Point of Failure!
Places Processing
              Source 2
              • Name
              • Address
              • Phone
              • reviews
  Source 1                 Source 3
  • Name                   • Name
  • Address                • Address
  • Phone                  • Phone
  • Images                 • menu



                CityGrid
                 Place
Why is it hard?
Book is to ISBN what Product is to UPC and what Place is to ______


No centrally regulated unique ID exists for a place (a tax ID is one, but it is not public). Now what?

Spago
176 Canon Dr
Beverly Hills, CA 90210
310-944-3924



R. French Ac & Heating Inc               Ray French Air Conditioning & Heating Service
2211 martin luther king blvd             2211 MLK boulevard #104
los angeles, CA, 90069                   west Hollywood, CA, 90069
310-358-5903                             866-465-5303
Problem Definition
• Medium-size data set
  – 21 million rows, 120 columns

• Time to process: Daily

• Hybrid environment

• Not all data comes from the same source
Solution




       Normalizer   Matcher   Merger
Normalizer


  Soundex     Metaphone      NYSIIS


  Match Rating Approach      Caverphone
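The slide lists several phonetic encodings without saying how they were applied. As one illustration, here is a minimal sketch of standard American Soundex, which maps names that sound alike to the same four-character key. It is not the presenters' implementation, and how the keys fed into matching is left open as an assumption.

```python
def soundex(name: str) -> str:
    """Return the American Soundex code (letter + 3 digits) for a name."""
    codes: dict[str, str] = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.lower() if c.isalpha()]
    if not letters_only:
        return ""

    first = letters_only[0]
    result = [first.upper()]          # the first letter is kept as-is
    prev = codes.get(first, "")       # ...but its code still suppresses a repeat
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")     # vowels, h, w, y have no code
        if digit and digit != prev:
            result.append(digit)
        if ch not in "hw":            # h/w do not separate letters with the same code
            prev = digit
    # Pad with zeros and cut to exactly four characters.
    return "".join(result + ["0", "0", "0"])[:4]
```

For example, "Robert" and "Rupert" both encode to R163, which is why a phonetic key is useful for grouping spelling variants before a finer comparison.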
Know Your Data
Stop Words
 • The Viper Room           Viper Room

Stemming
 • av               aven           avenu
 • avenue           avn            avnue
Compression
 • county line      county rd      county road

Truncation
 • apt                      unit                 #
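A minimal sketch of three of the rules above applied to one field, assuming small hand-built lookup tables seeded from the slide's examples (the table names are hypothetical). "Truncation" is read here as cutting the value at a unit designator such as apt, unit or #; that reading is an assumption about what the slide meant.

```python
import re

# Hypothetical lookup tables seeded from the slide's examples; real tables
# would be built from domain knowledge and profiling of the actual data.
STOP_WORDS = {"the"}
AVENUE_VARIANTS = {"av", "aven", "avenu", "avenue", "avn", "avnue"}
UNIT_MARKERS = {"apt", "unit", "#"}


def normalize_field(text: str) -> list[str]:
    """Lowercase and tokenize, then apply truncation, stop words and stemming."""
    tokens = re.findall(r"[a-z0-9]+|#", text.lower())
    out: list[str] = []
    for tok in tokens:
        if tok in UNIT_MARKERS:
            break                  # truncation: drop the unit designator and what follows
        if tok in STOP_WORDS:
            continue               # stop words: "The Viper Room" -> "viper room"
        # stemming: collapse every spelling variant of "avenue" to one form
        out.append("ave" if tok in AVENUE_VARIANTS else tok)
    return out
```

With these tables, "The Viper Room" and "Viper Room" normalize to the same tokens, as do "10 Main Avnue Apt 4" and "10 Main av # 4".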
Normalizer
         123 Martin Luther King.n

           123 MartinLutherKing.

           123 martinlutherking.

    Martin Luther King | martinlutherking
                  canon column



          the | n | ave | (tokens)
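The slide suggests keeping, next to each raw value, a canonical column (for example "martinlutherking") and a token list. A minimal sketch of that idea, assuming a simple "lowercase and strip everything that is not a letter or digit" rule; the `NormalizedField` record and its field names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class NormalizedField:
    original: str       # value as received from the source
    canon: str          # lowercase, whitespace and punctuation removed
    tokens: list[str]   # lowercase word tokens, for token-based matching


def canonicalize(value: str) -> NormalizedField:
    lowered = value.lower()
    canon = re.sub(r"[^a-z0-9]", "", lowered)    # "Martin Luther King." -> "martinlutherking"
    tokens = re.findall(r"[a-z0-9]+", lowered)   # -> ["martin", "luther", "king"]
    return NormalizedField(value, canon, tokens)
```

Keeping both forms lets exact matching run on `canon` while fuzzier, token-based matching uses `tokens`.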
Matching Strategy




   Automate what you can, and complement it
       with manual steps.
Matching Strategy




Exact matching
            Set similarity joins
                                   Custom fuzzy matching
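A set similarity join pairs records whose token sets overlap above a threshold. The sketch below uses Jaccard similarity, which is an assumption (the slide does not name a measure), and compares every pair naively; at 21 million rows a real system would need candidate pruning such as blocking or prefix filtering.

```python
from itertools import product


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_join(left: dict[str, set[str]], right: dict[str, set[str]],
                    threshold: float) -> list[tuple[str, str, float]]:
    """Return every (left_id, right_id, score) whose Jaccard score >= threshold.

    Naive O(len(left) * len(right)) comparison, for illustration only.
    """
    matches = []
    for (lid, ltoks), (rid, rtoks) in product(left.items(), right.items()):
        score = jaccard(ltoks, rtoks)
        if score >= threshold:
            matches.append((lid, rid, score))
    return matches
```

For instance, {"viper", "room", "sunset"} and {"viper", "room"} score 2/3, so they pair up at a 0.5 threshold.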
Matching Strategy
• C-Support Vector Machine (C-SVM)

• Threshold: 0.996
  – Precision: 98.1%
  – Recall: 97.5%




        84% + manual -> % Match Rate
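The precision and recall figures above come from declaring a pair a match when its classifier score reaches the threshold; the talk does not show how they were computed. A minimal sketch of the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), assuming each scored pair carries a true/false label (for example from manual review). The toy numbers in the test are invented, not from the talk.

```python
def precision_recall(scores: list[float], labels: list[bool],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall when pairs scoring >= threshold are declared matches.

    labels[i] is True when pair i is a true match.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the threshold generally trades recall for precision, and the pairs it leaves below the threshold are the natural candidates for the manual step.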
Merger

Rules:
   Provider trustworthiness
   Voting rules
   New data vs Old data
   Super providers
                              History:
                                         Accepted
                                         Rejected
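The slide names the merge rules but gives no weights or precedence. A minimal sketch of one way to combine three of them for a single field, assuming hypothetical per-provider trust weights, a "super provider" that wins outright, and recency as the tie-breaker for new versus old data. The accepted/rejected history is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    provider: str
    value: str
    updated: int   # e.g. a Unix timestamp; newer data breaks ties


def merge_field(candidates: list[Candidate],
                trust: dict[str, float],
                super_providers: frozenset[str] = frozenset()) -> str:
    """Pick one value for a field from several providers (hypothetical rules)."""
    # Super providers: their value wins outright (the newest one if several).
    supers = [c for c in candidates if c.provider in super_providers]
    if supers:
        return max(supers, key=lambda c: c.updated).value

    # Voting: each provider adds its trust weight to the value it reports.
    votes: dict[str, float] = defaultdict(float)
    newest: dict[str, int] = {}
    for c in candidates:
        votes[c.value] += trust.get(c.provider, 0.0)
        newest[c.value] = max(newest.get(c.value, c.updated), c.updated)

    # Highest total weight wins; ties go to the most recently seen value.
    return max(votes, key=lambda v: (votes[v], newest[v]))
```

With trust weights 0.9, 0.6 and 0.7, two lower-trust providers that agree (total 1.3) outvote a single higher-trust one (0.9), unless that provider is designated a super provider.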
Example
123 M L K Road Ste 45                   123 Martin Luther King Rd               123 Martin L King Drive #45
123 m l k road ste 45                   123 martinluther king rd                123 martin l king drive #45
(123) (m) (l) (k) (road) (ste) (45)     (123) (martin) (luther) (king) (rd)     (123) (martin) (l) (king) (drive) (#) (45)
123 mlk road ste 45                     123 martinlutherkingrd                  123 martinlking drive # 45
123 mlkrdste 45                         123 mlkrd                               123 mlkdr #45
123 mlkrd                               123 mlkrd                               123 mlkdr
123 mlk                                 123 mlk                                 123 mlk


          MATCH!                     MATCH!                       MATCH!
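The example reduces three spellings of one address to the same key in several steps. A minimal one-pass sketch of that reduction, assuming hypothetical word lists for unit designators and street types: it keeps the house number, cuts at the unit, drops the street type, and keeps the initials of the remaining street-name words.

```python
import re

# Hypothetical vocabularies; the slide shows the effect, not the word lists.
UNIT_MARKERS = {"ste", "suite", "apt", "unit", "#"}
STREET_TYPES = {"road", "rd", "drive", "dr", "boulevard", "blvd",
                "avenue", "ave", "street", "st"}


def street_key(address: str) -> str:
    """Reduce a street address to 'house number + initials of the street name'."""
    tokens = re.findall(r"[a-z0-9]+|#", address.lower())
    if not tokens:
        return ""
    number, rest = (tokens[0], tokens[1:]) if tokens[0].isdigit() else ("", tokens)
    name: list[str] = []
    for tok in rest:
        if tok in UNIT_MARKERS:
            break                    # drop the unit designator and everything after it
        if tok not in STREET_TYPES:
            name.append(tok)         # drop street-type words such as "road" or "drive"
    initials = "".join(tok[0] for tok in name)
    return f"{number} {initials}".strip()
```

All three example addresses reduce to "123 mlk". A key this coarse will also collide for unrelated streets with the same initials, so it suits candidate generation better than a final match decision.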
Findings & Tips
• Domain Knowledge




                     • Automation
                     • Mechanical Turk
                     • Machine Learning

  Run every 2hrs -> Match Rate of %
Solution for Search APIs
Solution for Places API
Performance Results
Updates


          • Hours


          • Real Time
Places Detail – Demo Time!
• Details by ID

  – http://api.citygridmedia.com/content/places/v2/detail?listing_id=11280452&client_ip=123.4.56.78&publisher=test

  – http://api.citygridmedia.com/content/places/v2/detail?public_id=pinks-hot-dogs-los-angeles-2&client_ip=123.4.56.78&publisher=test
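The two demo requests differ only in how the place is identified. A minimal sketch that builds either URL with the standard library, using the endpoint and parameter names exactly as shown on the slide (2012). The service may no longer be available, so the sketch only builds the request URL; the helper name `detail_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint as shown on the slide.
BASE_URL = "http://api.citygridmedia.com/content/places/v2/detail"


def detail_url(client_ip: str, publisher: str, *,
               listing_id: Optional[int] = None,
               public_id: Optional[str] = None) -> str:
    """Build a Places detail URL identified by listing_id or by public_id."""
    if (listing_id is None) == (public_id is None):
        raise ValueError("pass exactly one of listing_id or public_id")
    params: dict[str, str] = {}
    if listing_id is not None:
        params["listing_id"] = str(listing_id)
    else:
        params["public_id"] = str(public_id)
    params["client_ip"] = client_ip
    params["publisher"] = publisher
    return f"{BASE_URL}?{urlencode(params)}"   # urlencode keeps insertion order
```

Calling `detail_url("123.4.56.78", "test", listing_id=11280452)` reproduces the first demo URL.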
Improvements
• Shard Listing and Content Data

• Integrate Mongo across all APIs
APIs
        Now we have a rich Places API

How do we make developers aware that it exists?

How do we get them to successfully integrate?
APIs – Supporting Developer Area
 Common Building Blocks

   • Getting Started
   • Publisher Overview
   • Documentation
   • FAQ
   • Terms of Use
APIs – Supporting Developer Area
 Developer Tools
   • Code Samples
   • Libraries
   • Mobile SDKs
   • Starter Kits
   • Hackathon Toolkits
   • Partner APIs
APIs – Evangelism - Online
 •   Blogging
 •   Twitter
 •   LinkedIn
 •   Facebook
 •   GitHub
 •   Stack Overflow
 •   Quora
 •   Hacker News
 •   StumbleUpon
 •   Reddit
APIs – Evangelism - Offline


 •   Conferences
 •   Hackathons
 •   Meetups
 •   Workshops
APIs – Easy Start + Engage Immediately

•   Testable APIs
•   Self-Service
•   Email After Registration
•   Follow on Twitter
•   Follow on LinkedIn
APIs – Feedback Loop + Voice

•   Email Support
•   Forum(s)
•   Twitter
•   LinkedIn
APIs – Monetization = Sustainability

•   Local Web Advertising
•   Local Mobile Advertising
•   Local Custom Ads
•   Places that Pay
APIs – Evangelize Internally

•   Developer Feedback
•   Roadmap Suggestions
•   Landscape Analysis
•   Technology Awareness
•   Trends
•   Internal Hackathons
APIs – Measure & Repeat


CityGrid Architecture + API Overview from O'Reilly Strata Conference

Speaker Notes

  1. Demo