Likes and Locations: Adventure in Social Data Mining
Gene Chuang, Exec Dir of Social Eng, ATTi
Masahji Stewart, Founder, Synctree
Q CTO Dinner, 4/6/11, Lawry’s Beverly Hills, CA
Dedication
Background
Social Local Mobile Loco
Why Mine Social and Local Data?
Signals to improve the user experience
Timely and “placely” engagement
Provide value: save time, save money
Opt-in and privacy
YP.com Infrastructure
Ruby on Rails for web, login, and API
Solr/Lucene for search
Hadoop for the data pipeline
Hive for ad hoc queries on Hadoop
Ruby ETL scripts
OAuth 2
OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing over their username and password; instead, they hand over tokens.
Think valet key: the token opens only what it was issued for, and the password stays with the user.
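A minimal Python sketch of the OAuth 2 authorization-code flow from RFC 6749, one common way to carry out this token handoff. The endpoint URLs, client ID, redirect URI, and the read_likes scope are hypothetical placeholders; a real provider such as Facebook documents its own endpoints and permission names. The sketch only builds the two requests and makes no network calls.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical provider endpoints; a real provider publishes its own.
AUTHORIZE_URL = "https://provider.example/oauth/authorize"
TOKEN_URL = "https://provider.example/oauth/token"


def authorization_url(client_id, redirect_uri, scope, state):
    """Step 1: send the user to the provider to approve access (RFC 6749, 4.1.1)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # anti-CSRF value the provider echoes back on redirect
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def token_request(code, client_id, client_secret, redirect_uri):
    """Step 2: form fields to POST to TOKEN_URL to swap the one-time code
    for an access token (RFC 6749, 4.1.3). The user's password never appears."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)
    # "read_likes" is a hypothetical scope name used for illustration only.
    print(authorization_url("my-app-id", "https://www.example.com/callback",
                            "read_likes", state))
```

The site only ever stores the resulting token, which is what makes the valet-key comparison apt: the token can be scoped and revoked without touching the password.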
YP.com Login/Registration
Login Layer A
OAuth 2 Dance
Semi-Social Search
Social Mining - Extract
Extract scripts pull data out of a database (such as Oracle), Hive, files, Facebook, or any other source, and write JSON to STDOUT, one object per line.
For example, to get the number of users signed up each day, along with the running total:
$ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
{"day":"2011-02-14","count":891,"total":1328636}
{"day":"2011-02-15","count":1088,"total":1329724}
{"day":"2011-02-16","count":1016,"total":1330740}
{"day":"2011-02-17","count":1359,"total":1332099}
{"day":"2011-02-18","count":1143,"total":1333242}
{"day":"2011-02-19","count":660,"total":1333902}
{"day":"2011-02-20","count":597,"total":1334499}
{"day":"2011-02-21","count":874,"total":1335373}
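A minimal Python sketch of an extract step shaped like total-users-by-day: it turns raw signup dates into one compact JSON object per day, where count is that day's signups and total is the running total (in the sample output above, each total equals the previous total plus that day's count). The list of signup dates and the prior_total argument are hypothetical stand-ins for whatever query the real sdm script runs against Oracle or Hive.

```python
import json
import sys
from collections import Counter
from datetime import date, timedelta


def total_users_by_day(signup_dates, start, prior_total):
    """Yield {"day", "count", "total"} for each day from `start` onward.

    signup_dates: iterable of datetime.date, one per new user (hypothetical
    stand-in for a database or Hive query).
    prior_total: number of users who signed up before `start`.
    """
    counts = Counter(d for d in signup_dates if d >= start)
    if not counts:
        return
    total = prior_total
    day, last = start, max(counts)
    while day <= last:
        total += counts[day]  # Counter returns 0 for days with no signups
        yield {"day": day.isoformat(), "count": counts[day], "total": total}
        day += timedelta(days=1)


if __name__ == "__main__":
    demo = [date(2011, 2, 14)] * 3 + [date(2011, 2, 15)] * 2
    for row in total_users_by_day(demo, date(2011, 2, 14), prior_total=100):
        # Compact separators match the one-object-per-line format above.
        sys.stdout.write(json.dumps(row, separators=(",", ":")) + "\n")
```

Emitting one self-contained JSON object per line is what lets the output be piped straight into a transform or load script.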
Social Mining - Transform
Transform scripts read JSON from STDIN and write JSON to STDOUT.
For example, to add YPIDs to existing Facebook likes and then keep only the name, phone, location, YPID-matching, and id fields:
$ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
{"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
{"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
{"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
{"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
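A minimal Python sketch of a filter-fields transform, assuming (as the sample output suggests) that it keeps only the named keys actually present in each record: the SnuggleBunnies record, which has no phone or best match, comes through without those keys. The function name and script layout are hypothetical; the real sdm transform is not shown in the slides.

```python
import json
import sys


def filter_fields(lines, fields):
    """Yield each JSON line's record reduced to the requested keys.

    Keys keep their original order; requested keys a record lacks are skipped.
    """
    wanted = set(fields)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        yield {k: v for k, v in record.items() if k in wanted}


if __name__ == "__main__":
    # Hypothetical usage: ... | python filter_fields.py name phone location id
    for record in filter_fields(sys.stdin, sys.argv[1:]):
        print(json.dumps(record, separators=(",", ":")))
```

Because it reads from STDIN and writes to STDOUT, a transform like this can sit anywhere in a pipeline, after add-ypid or before a load step.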
Social Mining - Load
Load scripts read data from STDIN and load it into another system, such as a dashboard.
For example, to load total Facebook accounts by day into the web dashboard:
$ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
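A minimal Python sketch of a load step, assuming the three arguments after dashboard name the metric, the field used as the x value, and the field used as the y value (here total_fb_accounts, day, and total). The SQLite table is a hypothetical stand-in for the real dashboard store.

```python
import json
import sqlite3
import sys


def load_series(lines, conn, metric, x_field, y_field):
    """Read JSON lines and upsert (metric, x, y) points; return the row count.

    The dashboard_points table is a hypothetical stand-in for the real store.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dashboard_points ("
        " metric TEXT, x TEXT, y REAL, PRIMARY KEY (metric, x))"
    )
    n = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Re-loading the same day replaces the earlier point.
        conn.execute(
            "INSERT OR REPLACE INTO dashboard_points VALUES (?, ?, ?)",
            (metric, str(record[x_field]), record[y_field]),
        )
        n += 1
    conn.commit()
    return n


if __name__ == "__main__":
    # Hypothetical usage: ... | python load.py total_fb_accounts day total
    metric, x_field, y_field = sys.argv[1:4]
    conn = sqlite3.connect("dashboard.db")
    print(load_series(sys.stdin, conn, metric, x_field, y_field), "points loaded")
    conn.close()
```

Keying points on (metric, x) means the same extract can be re-run and re-loaded without duplicating days on the dashboard.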
Location Real-Time Fuzzy Matcher
FP0 (exact match)
  Append LISTING_NAME + ADDRESS + CITY + PHONE
  Tokenize, normalize, strip punctuation, and stem
  Append the tokens
FP3 (fuzzy match)
  Append LISTING_NAME + ADDRESS + CITY + PHONE
  Tokenize, normalize, strip punctuation, and stem
  Remove tokens shorter than 2 characters
  Remove short upper-case tokens (e.g., MD, CPA, DDS)
  Remove short numerical tokens that are not part of the phone number
  Remove stopwords, taken from the 170 most frequent listing_name tokens
  Sort the tokens alphabetically
  Append the tokens
Example:
  Vijay K. Sammy CPA, LLC
  153 Orchard St
  Elmwood Park NJ - 07407
  (201) 218-0710
FP Method   Value
FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
FP3         0710201218elmwoodorchardparksammistvijai
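A minimal Python sketch of the two fingerprints, checked against the example above. Several details are assumptions, not the original implementation: a "short" upper-case token is taken to be at most 3 characters, a "short" numerical token at most 4 digits, the stopword set is a tiny hypothetical placeholder for the 170 mined listing_name tokens, and stem() implements only the Porter rule that turns a final y into i (vijay becomes vijai) in place of a full stemmer. Upper-case detection uses the raw token, since case is lost after normalization.

```python
import re

# Hypothetical placeholder for the top-170 listing_name stopwords.
STOPWORDS = {"inc", "llc", "the"}
SHORT_UPPER_MAX = 3    # assumption: short upper-case tokens have <= 3 chars
SHORT_NUMERIC_MAX = 4  # assumption: short numerical tokens have <= 4 digits


def stem(token):
    """Stand-in for a real stemmer: only Porter's final y -> i rule."""
    if token.endswith("y") and re.search(r"[aeiou]", token[:-1]):
        return token[:-1] + "i"
    return token


def tokens(name, address, city, phone):
    """Yield (raw, normalized, is_phone) for NAME + ADDRESS + CITY + PHONE."""
    fields = ((name, False), (address, False), (city, False), (phone, True))
    for text, is_phone in fields:
        for raw in re.findall(r"[A-Za-z0-9]+", text):  # drops punctuation
            yield raw, stem(raw.lower()), is_phone


def fp0(name, address, city, phone):
    """Exact-match fingerprint: every normalized token, in original order."""
    return "".join(norm for _, norm, _ in tokens(name, address, city, phone))


def fp3(name, address, city, phone, stopwords=STOPWORDS):
    """Fuzzy-match fingerprint: filtered tokens, sorted alphabetically."""
    kept = []
    for raw, norm, is_phone in tokens(name, address, city, phone):
        if len(raw) < 2:
            continue  # single characters such as middle initials
        if raw.isupper() and len(raw) <= SHORT_UPPER_MAX:
            continue  # titles and suffixes such as MD, CPA, DDS
        if raw.isdigit() and not is_phone and len(raw) <= SHORT_NUMERIC_MAX:
            continue  # e.g. street numbers
        if norm in stopwords:
            continue
        kept.append(norm)
    return "".join(sorted(kept))


if __name__ == "__main__":
    listing = ("Vijay K. Sammy CPA, LLC", "153 Orchard St",
               "Elmwood Park", "(201) 218-0710")
    print(fp0(*listing))
    print(fp3(*listing))
```

Because FP3 drops initials, titles, street numbers, and word order, a variant such as "Sammy, Vijay K" at the same address and phone yields the same FP3 key while its FP0 key differs.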
Social Data
Valid Facebook access tokens: 14K
Total unique likes: 300K
Likes with locations and/or phones: 19%
Likes mapped to a YPID: 38%
Total check-ins: 530
Social Mining Mother Lode
Social Search
Local Recommendation Engine
Discovery Wall
Top 10 List
Social e-Commerce
Online Presence Management – Social CRM
Questions?
genechuang@gmail.com
http://www.twitter.com/genechuang
http://www.quora.com/Gene-Chuang
http://www.linkedin.com/in/genechuang
