Search + Big Data:
It’s (still) All About the User
Grant Ingersoll, Chief Scientist – Lucid Imagination
           grant@lucidimagination.com
                 October 19, 2011
Promise and Reality

  “Data is increasingly digital air: the oxygen we
 breathe and the carbon dioxide that we exhale. It
can be a source of both sustenance and pollution.”
            Six Provocations for Big Data
          by Danah Boyd and Kate Crawford



  “The truth is, I spend most of my time trying to
reduce the size of my data so it can be analyzed.”
      Hilary Mason, Chief Scientist, Bitly @ Strata
Pragmatism
Evolution
§  Documents
   •  Models
   •  Feature Selection
§  Content Relationships
   •  Page Rank, etc.
   •  Organization
§  User Interaction
   •  Clicks
   •  Ratings/Reviews
   •  Learning to Rank
   •  Social Graph
§  Queries
   •  Phrases
   •  NLP
Minding the Intersection


[Diagram: the intersection of Search, Analytics, and Discovery]
Benefits
§  End users
   •  Better relevance/conversion
   •  Serendipity
   •  Better/faster insight


§  Business:
   •    ROI
   •    Awareness across organization
   •    Enablement
   •    Agility
Needs
§  Fast, efficient, scalable search
§  Large scale, cost effective storage
§  Processing Power:
   •  Large scale distributed for whole data consumption
   •  Streaming/In Memory for real time needs
   •  Ability to learn


§  Willingness to ask questions
The Good News
Search
§  Good, scalable search is a given
   •  Talks: Chitouras, Sturlese, Binns, Miller


§  Custom Relevancy via function queries, boosts
§  Explore other relevance models
   •  Talks: Muir, Pugh
   •  Lucene/Solr trunk has pluggable scoring (BM25, etc.)


§  NRT for timeliness
   •  Talks: Busch
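
To make the "pluggable scoring (BM25, etc.)" bullet concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not Lucene's or Solr's implementation. The function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and the IDF uses one common smoothed variant.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF; the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "lucene is a search library".split(),
    "solr is a search server built on lucene".split(),
    "hadoop stores big data".split(),
]
scores = bm25_scores(["lucene", "search"], docs)
```

With this toy corpus, the first two documents match both query terms once each, so the shorter one scores higher, and the third document scores zero. Tuning `k1` and `b` changes how strongly term frequency and document length affect the ranking.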
Discovery
Facets
  •  Talks: Yonik
  •  Classification, Taxonomy
Clustering
  •  Talk: Frank S.
Suggestions
  •  Auto-suggest, Spelling, More Like This, Related Searches, search trails
Visualization
Analytics
Analytics for End Users
§  Offline
   •  Popularity/Click
   •  Link Analysis
   •  Search Trails
   •  Recommendations
   •  Spellchecking weights
   •  Collocations
§  Online
   •  Trends/Stats
   •  Social/Personal
   •  Location

[Logo: Storm]
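
As one illustration of the offline analytics on this slide, collocations can be mined from query logs or document text by scoring adjacent word pairs. The sketch below uses pointwise mutual information (PMI), which is only one of several possible measures. The function name `bigram_pmi`, the `min_count` threshold, and the sample tokens are assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Return PMI (log base 2) for adjacent word pairs seen at least `min_count` times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical token stream, e.g. concatenated search queries.
tokens = "big data search big data analytics open source search big data".split()
pmi = bigram_pmi(tokens)
```

A positive score means the pair occurs together more often than independence would predict. At scale the same counting would typically run as a distributed batch job, and the top-scoring pairs could feed phrase boosts or suggestions.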
Analytics for Internal Users
§  Offline
   •  Top X
   •  Zero results
   •  MRR, MAP
   •  User segmentation
   •  Location, conversions
   •  Ad hoc Analysis
§  Online
   •  Trends
   •  Operational alerts (QPS, DPS, etc.)

[Logo: Giraph]
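
The MRR and MAP bullet refers to two standard offline relevance metrics: mean reciprocal rank and mean average precision. The sketch below computes both from ranked result lists and sets of judged-relevant document ids. The function names and the sample judgments are hypothetical, and real judgments might come from click logs or manual review.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant_ids):
    """Average the precision at each rank where a relevant result appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical judged queries: (ranked result ids, set of relevant ids).
judged_queries = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant result at rank 2
    (["d2", "d5", "d9"], {"d2", "d9"}),  # relevant results at ranks 1 and 3
    (["d4", "d6", "d8"], {"d0"}),        # no relevant result retrieved
]
mrr = mean(reciprocal_rank(r, rel) for r, rel in judged_queries)
map_score = mean(average_precision(r, rel) for r, rel in judged_queries)
```

Tracking these numbers per release, alongside the zero-results rate, gives internal users a way to tell whether a relevance change helped or hurt.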
What’s Missing?
§  The glue is up to you (us?)
   •    Lucene Index -> Pig/Others
   •    Mahout -> Pig/Others
   •    Mahout -> Lucene/Solr
   •    Logs -> Pig/Others


§  Nice to have:
   •  More in-index functionality (that performs)
         §  Aggregations
         §  Arbitrary stats
         §  Complex Joins
What’s Next?

“I can have all the data I want to have – but I still
  have to communicate it to our players. It has to
  get into their minds. And they have to utilize it.”
        Brad Stevens, Head Basketball Coach,
     Butler University, in Oct. ’11 McKinsey Quarterly
Thanks!


§  http://www.lucidimagination.com

§  @gsingers

§  grant@lucidimagination.com

§  stump@lucene-eurocon.com
Lucene Ecosystem




[Diagram: Lucene ecosystem, with logos including Spark, Storm, and Giraph]
