Likes and Locations - Adventure in Social Data Mining
Gene Chuang – Exec Dir of Social Eng, ATTi
Masahji Stewart – Founder, Synctree
Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA
Dedication
Background
Social Local Mobile Loco
Why Mine Social and Local Data?
  • Signals to improve user experience
  • Timely and “Placely” Engagement
  • Provide value – save time, save money
  • Opt In, Privacy
YP.com Infrastructure
  • Ruby on Rails for Web, Login and API
  • Solr/Lucene for Search
  • Hadoop for Data pipeline
  • Hive for Ad Hoc queries on Hadoop
  • Ruby ETL scripts
OAuth 2
OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens.
Think Valet Key
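The token hand-off described above can be sketched as the two requests of an authorization code grant, following the parameter names later standardized in RFC 6749. This is an illustration only: the endpoint URL, client ID, redirect URI, and scope are hypothetical placeholders, and the provider used in the talk may have differed in detail.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical provider endpoint, used for illustration only.
AUTHORIZE_URL = "https://provider.example/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope, state):
    """Step 1: send the user to the provider to approve access."""
    params = {
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,            # opaque value echoed back, guards against CSRF
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def token_request_body(client_id, client_secret, code, redirect_uri):
    """Step 2: form fields the site POSTs to trade the code for an access token."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }


# Illustrative values; none of them come from the slides.
url = build_authorize_url("my-client", "https://app.example/callback", "user_likes", "xyz")
body = token_request_body("my-client", "s3cret", "code-from-redirect", "https://app.example/callback")
```

The user's password never reaches the requesting site; it only ever sees the short-lived code and the resulting token, which is the valet-key idea.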
YP.com Login/Registration
Login Layer A
OAuth 2 Dance
Semi-Social Search
Social Mining - Extract
Extract Script
Pull data out of a database (like Oracle), Hive, files, Facebook, or any other source and output JSON data to STDOUT.
For example, to get the count of total users signed up by day:
    $ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
    {"day":"2011-02-14","count":891,"total":1328636}
    {"day":"2011-02-15","count":1088,"total":1329724}
    {"day":"2011-02-16","count":1016,"total":1330740}
    {"day":"2011-02-17","count":1359,"total":1332099}
    {"day":"2011-02-18","count":1143,"total":1333242}
    {"day":"2011-02-19","count":660,"total":1333902}
    {"day":"2011-02-20","count":597,"total":1334499}
    {"day":"2011-02-21","count":874,"total":1335373}
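A minimal sketch of what an extract step like this might do, written in Python rather than the Ruby the team used. The function name and the `total_before` parameter are hypothetical; the only thing taken from the slide is the output shape, where `total` is the running sum of `count`.

```python
import json


def total_users_by_day(daily_counts, total_before):
    """Yield one compact JSON line per day with a running total.

    daily_counts: iterable of (day, count) pairs, e.g. rows from a database query.
    total_before: users already signed up before the first day (an assumed input).
    """
    total = total_before
    for day, count in daily_counts:
        total += count
        yield json.dumps({"day": day, "count": count, "total": total},
                         separators=(",", ":"))


# The first two days from the slide; 1327745 is 1328636 - 891.
rows = [("2011-02-14", 891), ("2011-02-15", 1088)]
lines = list(total_users_by_day(rows, total_before=1327745))
for line in lines:
    print(line)
```

Emitting one JSON object per line keeps every step composable with ordinary shell pipes.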
Social Mining - Transform
Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT.
For example, to add ypids to existing Facebook likes, then filter down to the location and ypid matching fields:
    $ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
    {"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
    {"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
    {"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
    {"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
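A transform of the filter-fields kind can be sketched as below. This is an assumption about its behavior (keep only the named top-level keys), not the actual `sdm` implementation, and it does not cover add-ypid, which depends on the location matcher. The `category` key in the sample record is made up to show a field being dropped.

```python
import json


def filter_fields(record, fields):
    """Keep only the requested top-level keys, preserving their order."""
    return {key: value for key, value in record.items() if key in fields}


def transform(lines, fields):
    """Read JSON lines, write JSON lines: the STDIN-to-STDOUT contract of a transform."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(filter_fields(json.loads(line), fields),
                             separators=(",", ":"))


# In a real script `lines` would be sys.stdin; a list stands in for it here.
sample = ['{"name":"PHBistro","category":"Restaurant","phone":"(937)-743-0069"}']
out = list(transform(sample, {"name", "phone"}))
print(out[0])
```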
Social Mining - Load
Load
Load scripts read data in from STDIN and load it into another system (for example, a dashboard).
For example, loading total Facebook accounts by day into the web dashboard:
    $ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
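A load step might look roughly like the sketch below. It assumes, without confirmation from the slides, that the three arguments after `dashboard` name the metric, the x-axis field, and the y-axis field. The in-memory dictionary is a hypothetical stand-in for the real dashboard, and the sample lines are borrowed from the total-users extract output.

```python
import json


def load_series(lines, x_field, y_field):
    """Turn JSON lines into (x, y) points for one dashboard series."""
    points = []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            points.append((record[x_field], record[y_field]))
    return points


# Hypothetical stand-in for the dashboard: metric name -> list of points.
dashboard = {}
extracted = [
    '{"day":"2011-02-14","count":891,"total":1328636}',
    '{"day":"2011-02-15","count":1088,"total":1329724}',
]
dashboard["total_users"] = load_series(extracted, "day", "total")
```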
Location Real-Time Fuzzy Matcher
FP0 (exact match)
  • Append LISTING_NAME + ADDRESS + CITY + PHONE
  • Tokenize, normalize, strip punctuation, and stem
  • Append tokens
FP3 (fuzzy match)
  • Append LISTING_NAME + ADDRESS + CITY + PHONE
  • Tokenize, normalize, strip punctuation, and stem
  • Remove tokens that are less than 2 chars long
  • Remove upper-case short tokens (e.g., MD, CPA, DDS)
  • Remove non-phone, short, numerical tokens
  • Remove stopwords based on the top 170 most frequently occurring listing_name tokens
  • Order tokens alphabetically
  • Append tokens
Example:
    Vijay K. Sammy CPA, LLC
    153 Orchard St
    Elmwood Park NJ - 07407
    (201) 218-0710

    FP Method   Value
    FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
    FP3         0710201218elmwoodorchardparksammistvijai
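The two fingerprints can be sketched as follows. Several details are assumptions rather than facts from the slides: "short" is taken to mean at most 4 characters, the stemmer is a one-rule stand-in (a trailing "y" becomes "i", as in the Porter stemmer), and the stopword list defaults to empty because the top-170 list is not given. With those choices the sketch reproduces both values in the example.

```python
import re

SHORT = 4  # assumed cutoff for "short" tokens; the slides do not give one


def stem(token):
    """Stand-in stemmer: only the Porter-style trailing y -> i rule."""
    if len(token) > 2 and token.endswith("y") and re.search(r"[aeiou]", token[:-1]):
        return token[:-1] + "i"
    return token


def raw_tokens(text):
    """Tokenize and strip punctuation by keeping alphanumeric runs."""
    return re.findall(r"[A-Za-z0-9]+", text)


def fp0(name, address, city, phone):
    """Exact-match fingerprint: every token, lowercased and stemmed, in order."""
    tokens = raw_tokens(" ".join([name, address, city, phone]))
    return "".join(stem(t.lower()) for t in tokens)


def fp3(name, address, city, phone, stopwords=frozenset()):
    """Fuzzy fingerprint: drop noisy tokens, then sort what is left."""
    kept = []
    for field, is_phone in ((name, False), (address, False), (city, False), (phone, True)):
        for raw in raw_tokens(field):
            token = stem(raw.lower())
            if len(token) < 2:
                continue  # too short to be informative
            if raw.isalpha() and raw.isupper() and len(raw) <= SHORT:
                continue  # upper-case abbreviation such as MD, CPA, DDS
            if not is_phone and raw.isdigit() and len(raw) <= SHORT:
                continue  # short number outside the phone field
            if token in stopwords:
                continue
            kept.append(token)
    return "".join(sorted(kept))


listing = ("Vijay K. Sammy CPA, LLC", "153 Orchard St", "Elmwood Park", "(201) 218-0710")
print(fp0(*listing))
print(fp3(*listing))
```

Sorting the surviving tokens makes FP3 insensitive to word order, and dropping abbreviations and house numbers lets listings that differ only in those details collide on the same key.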
Social Data
  • Valid Facebook Access Tokens: 14K
  • Total Unique Likes: 300K
  • % Likes with Locations and/or Phones: 19%
  • % Likes mapped to YPID: 38%
  • Total Check-Ins: 530
Social Mining Mother Lode
  • Social Search
  • Local Recommendation Engine
  • Discovery Wall
  • Top 10 List
  • Social e-Commerce
  • Online Presence Management – Social CRM
Questions?
genechuang@gmail.com
http://www.twitter.com/genechuang
http://www.quora.com/Gene-Chuang
http://www.linkedin.com/in/genechuang
