SlideShare une entreprise Scribd logo
1  sur  23
Crowd Sourcing Reflected
Intelligence Using Search and Big
Data
Ted Dunning
Ivan Provalov

©MapR Technologies - Confidential   1
Ted’s Background

     Academia, Startups
       –   Aptex, MusicMatch, ID Analytics, Veoh
       –   Big data since before big
     Open source
       –   since the dark ages before the internet
       –   Mahout, Zookeeper, Drill
       –   bought the beer at first HUG
     MapR
       –   Chief Application Architect
     Founding member of Apache Drill




©MapR Technologies - Confidential             2
Ivan’s Background

     Sr. Research Engineer at LucidWorks
     NLP, IR
     Agile development




©MapR Technologies - Confidential    3
Agenda

     Intro
     Search Evolution and Search Revolution
     Reflected Intelligence Use Cases
     Building a Next Generation Search and Discovery Platform
       –   MapR
       –   LucidWorks
     1+1=3




©MapR Technologies - Confidential        4
User Interactions With Big Data

                                                    Command    System
                                Data     DFS          Line     Administrator




                                       Key Value      Query
                              Data                             Engineer
                                         Store      Language




                                                    Keyword
                                Data    Index        Search    End User




©MapR Technologies - Confidential               5
Is Search Enough?
    • Keyword search is a commodity
    • Holistic view of the data and the
      user interactions with that data
    • Search, Discovery and Analytics are
      the key to unlocking this view of
      users and data

                                            new pope

                                                       Search




©MapR Technologies - Confidential     6
User Interactions With Big Data

                                                    Command        System
                                Data     DFS          Line         Administrator




                                       Key Value       Query
                              Data                                 Engineer
                                         Store       Language



                                                     Keyword
                                                      Search
                                Data    Index                      End User
                                                     Reflected
                                                    Intelligence




©MapR Technologies - Confidential               7
Search (R)evolution

     Search use leads to search abuse
       –   denormalization frees your mind
       –   scoring is just a sparse matrix multiply

     Lucene/Solr evolution
       –   non free text usages abound
       –   many DB-like features
       –   noSQL before NoSQL was cool
       –   flexible indexing
       –   finite State Transducers FTW!

     Scale

     “This ain’t your father’s relevance anymore”


©MapR Technologies - Confidential                 8
Search, Discovery and Analytics

     Large-scale analysis is key to reflected intelligence
       –   correlation analysis
            • based on queries, clicks, mouse tracks,
                   even explicit feedback
            • produce clusters, trends, topics, SIP’s              Search
       –   start with engineered knowledge,
                 refine with user feedback
     Large-scale discovery features
        encourage experimentation

     Always test, always enrich!                      Analytics            Discovery




©MapR Technologies - Confidential                  9
Social Media Analysis in Telecom

     Correlate mobile traffic analysis with social media analysis
       –   events cause traffic micro-bursts
       –   participants tweet the events ahead of time
     Deploy operations faster to predict outages and better handle
      emergency situations
       –   high cost bandwidth augmentation can be marshaled as the traffic appears
       –   anticipation beats reaction




©MapR Technologies - Confidential            10
Provenance is 80% of Value

     Analysis of social media to determine advertising reach and
      response


     In one case the same untargeted advertising was worth 5x if sold
      with supporting data.




©MapR Technologies - Confidential    11
Claims Analysis

     Goal
       –   Insurance claims processing and analysis
       –   fraud analysis
     Method
       –   Combine free text search with metadata analysis to identify high risk
           activities across the country
       –   Integrate with corporate workflows to detect and fix outliers in customer
           relations
     Results
       –   Questions that took 24-48 hours now take seconds to answer




©MapR Technologies - Confidential            12
Virginia Tech - Help the World

     Grab data around crisis
     Search immediately
     Large-scale analysis enriches data to find
      ways to improve responses and
      understanding
     http://www.ctrnet.net




©MapR Technologies - Confidential      13
Bright Planet - Catch the Bad Guys

     Online Drug Counterfeit detection
     Identify commonly used language indicating counterfeits
       –   you know it when you see it
       –   and you know you have seen it
     Feed to analyst via search-driven application
       –   enrich based on analysts feedback




©MapR Technologies - Confidential              14
Veoh - Cross Recommendations

     Cross recommendation as search
       –   with search used to build cross recommendation!
     Recommend content to people who exhibit certain behaviors
      (clicks, query terms, other)
     (Ab)use of a search engine
       –   but not as a search engine for content
       –   more like a search engine for behavior




©MapR Technologies - Confidential            15
What Platform Do You Need?

     Fast, efficient, scalable search
       –   bulk and near real-time indexing
       –   handle billions of records with sub-second search and faceting


     Large scale, cost effective storage and processing capabilities

     NLP and machine learning tools that scale to enhance discovery
      and analysis


     Integrated log analysis workflows that close the loop between the
      raw data and user interactions


©MapR Technologies - Confidential            16
Reference Architecture
                                                                    Access APIs

                                                                                        View into
                                               Search View                Analytic      numeric/histor
                               1                                          Services      ic data
                                     2
                          Shards           3                N                                            Personalization &   Classification
                                                                                                         Machine Learning    Recommendation
                                                                                                             Services
                                                                         Document         •Documents
                                    Discovery &                                           •Users
                                    Enrichment                             Store
                                                                                          •Logs
                                    Clustering,
                                    classification, NLP,
                                    topic identification,                                                Classification Models
                                    search log analysis,
                                                                                                            In memory
                                    user behavior
                                                                Content Acquisition                         Replicated
                                                                   ETL, batch or near                       Multi-tenant
                                                                   real-time


                                    Data
                    • LucidWorks Search
                      connectors
                    • Push


©MapR Technologies - Confidential                                           17
MapR

     MapR provides the technology leading Hadoop distribution
       –   full eco-system distribution
       –   integrated data platform
       –   complete solution for data integrity
     MapR clusters also provide tight integration with search
      technologies like LucidWorks
       –   integration is key for effective ops




©MapR Technologies - Confidential                 18
LucidWorks

     LucidWorks provides the leading packaging of Apache Lucene and
      Solr
       –   build your own, we support
       –   founded by the most prominent Lucene/Solr experts
     LucidWorks Search
       –   “Solr++”
            •    UI, REST API, MapR connectors, relevance tools, much more
     LucidWorks Big Data
       –   Big Data as a Service
       –   Integrated LucidWorks Search, Hadoop, machine learning with prebuilt
           workflows for many of these tasks




©MapR Technologies - Confidential                  19
LucidWorks Big Data
                                                            API

                        Big Data                         LucidWorks                 Web HDFS

     Inputs                                     Search, Discovery, Analytics                   Mgmt
                                     Analytics Service                Document Service
                                                                                               Admin


                                                                                               Service
                                                    Processing & Storage                       Mgmt


                                                                                               Data
                                                                                               Mgmt


                                         Provisioning, Monitoring & Configuration



 ©MapR Technologies - Confidential                           20
Easy Wins

     Analyze logs from application stored in MapR
     Seamlessly store search indexes in MapR
       –   and feed to Pig, Mahout and others
       –   use mirrors + NFS to directly deploy indexes
     Snapshots make backups a snap
       –   A lot like version control for data
     LucidWorks 2.5.2 easily connects with MapR




©MapR Technologies - Confidential                21
1+1=3



©MapR Technologies - Confidential     22
Learn More

     Talk to Ted (we are hiring)
       @ted_dunning
       tdunning@maprtech.com
     Talk to Ivan (we are hiring)
       @iprovalov
     MapR and Lucid Works
       http://www.mapr.com
       http://www.lucidworks.com
       Hash Tags
       #mapr #hadoopsummit #lucidworks




©MapR Technologies - Confidential        23

Contenu connexe

Tendances

MapR lucidworks joint webinar
MapR lucidworks joint webinarMapR lucidworks joint webinar
MapR lucidworks joint webinarTed Dunning
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Ted Dunning
 
Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012Ted Dunning
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07Ted Dunning
 
predictive-analytics-san-diego-2013-02-21
predictive-analytics-san-diego-2013-02-21predictive-analytics-san-diego-2013-02-21
predictive-analytics-san-diego-2013-02-21Ted Dunning
 
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerChallenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerMapR Technologies
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR Technologies
 
Best Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data PlatformBest Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data PlatformMapR Technologies
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Mathieu Dumoulin
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsMapR Technologies
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 

Tendances (19)

MapR lucidworks joint webinar
MapR lucidworks joint webinarMapR lucidworks joint webinar
MapR lucidworks joint webinar
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12
 
Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07
 
Hcj 2013-01-21
Hcj 2013-01-21Hcj 2013-01-21
Hcj 2013-01-21
 
predictive-analytics-san-diego-2013-02-21
predictive-analytics-san-diego-2013-02-21predictive-analytics-san-diego-2013-02-21
predictive-analytics-san-diego-2013-02-21
 
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerChallenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
MapR 5.2 Product Update
MapR 5.2 Product UpdateMapR 5.2 Product Update
MapR 5.2 Product Update
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Best Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data PlatformBest Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data Platform
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 

Similaire à Hadoop summit EU - Crowd Sourcing Reflected Intelligence

Crowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopCrowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopDataWorks Summit
 
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrDataWorks Summit
 
MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR Technologies
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrGrant Ingersoll
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrGrant Ingersoll
 
Leveraging Solr and Mahout
Leveraging Solr and MahoutLeveraging Solr and Mahout
Leveraging Solr and MahoutGrant Ingersoll
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceMapR Technologies
 
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTIONDATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTIONivan provalov
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadooplucenerevolution
 
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011SEO CAMP
 
Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Umesh Ramalingachar
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Calpont Corporation
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Treparel
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOpenStorageSummit
 
Implementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessImplementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessDataWorks Summit
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data European Data Forum
 
Data mining & Decison Trees
Data mining & Decison TreesData mining & Decison Trees
Data mining & Decison TreesSelman Bozkır
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...lucenerevolution
 

Similaire à Hadoop summit EU - Crowd Sourcing Reflected Intelligence (20)

Crowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopCrowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over Hadoop
 
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
 
MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
 
Leveraging Solr and Mahout
Leveraging Solr and MahoutLeveraging Solr and Mahout
Leveraging Solr and Mahout
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
 
Data mining
Data miningData mining
Data mining
 
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTIONDATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoop
 
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011
Jean-Marc Lazard d'Exalead - Pioneering hypermedia - SEO Campus 2011
 
Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
 
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal Stern
 
Implementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessImplementing Big Data at the Speed of Business
Implementing Big Data at the Speed of Business
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
 
Data mining & Decison Trees
Data mining & Decison TreesData mining & Decison Trees
Data mining & Decison Trees
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 

Plus de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 

Plus de Ted Dunning (20)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 

Dernier

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Hadoop summit EU - Crowd Sourcing Reflected Intelligence

  • 1. Crowd Sourcing Reflected Intelligence Using Search and Big Data Ted Dunning Ivan Provalov ©MapR Technologies - Confidential 1
  • 2. Ted’s Background  Academia, Startups – Aptex, MusicMatch, ID Analytics, Veoh – Big data since before big  Open source – since the dark ages before the internet – Mahout, Zookeeper, Drill – bought the beer at first HUG  MapR – Chief Application Architect  Founding member of Apache Drill ©MapR Technologies - Confidential 2
  • 3. Ivan’s Background  Sr. Research Engineer at LucidWorks  NLP, IR  Agile development ©MapR Technologies - Confidential 3
  • 4. Agenda  Intro  Search Evolution and Search Revolution  Reflected Intelligence Use Cases  Building a Next Generation Search and Discovery Platform – MapR – LucidWorks  1+1=3 ©MapR Technologies - Confidential 4
  • 5. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Data Index Search End User ©MapR Technologies - Confidential 5
  • 6. Is Search Enough? • Keyword search is a commodity • Holistic view of the data and the user interactions with that data • Search, Discovery and Analytics are the key to unlocking this view of users and data new pope Search ©MapR Technologies - Confidential 6
  • 7. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Search Data Index End User Reflected Intelligence ©MapR Technologies - Confidential 7
  • 8. Search (R)evolution  Search use leads to search abuse – denormalization frees your mind – scoring is just a sparse matrix multiply  Lucene/Solr evolution – non free text usages abound – many DB-like features – noSQL before NoSQL was cool – flexible indexing – finite State Transducers FTW!  Scale  “This ain’t your father’s relevance anymore” ©MapR Technologies - Confidential 8
  • 9. Search, Discovery and Analytics  Large-scale analysis is key to reflected intelligence – correlation analysis • based on queries, clicks, mouse tracks, even explicit feedback • produce clusters, trends, topics, SIP’s Search – start with engineered knowledge, refine with user feedback  Large-scale discovery features encourage experimentation  Always test, always enrich! Analytics Discovery ©MapR Technologies - Confidential 9
  • 10. Social Media Analysis in Telecom  Correlate mobile traffic analysis with social media analysis – events cause traffic micro-bursts – participants tweet the events ahead of time  Deploy operations faster to predict outages and better handle emergency situations – high cost bandwidth augmentation can be marshaled as the traffic appears – anticipation beats reaction ©MapR Technologies - Confidential 10
  • 11. Provenance is 80% of Value  Analysis of social media to determine advertising reach and response  In one case the same untargeted advertising was worth 5x if sold with supporting data. ©MapR Technologies - Confidential 11
  • 12. Claims Analysis  Goal – Insurance claims processing and analysis – fraud analysis  Method – Combine free text search with metadata analysis to identify high risk activities across the country – Integrate with corporate workflows to detect and fix outliers in customer relations  Results – Questions that took 24-48 hours now take seconds to answer ©MapR Technologies - Confidential 12
  • 13. Virginia Tech - Help the World  Grab data around crisis  Search immediately  Large-scale analysis enriches data to find ways to improve responses and understanding  http://www.ctrnet.net ©MapR Technologies - Confidential 13
  • 14. Bright Planet - Catch the Bad Guys  Online Drug Counterfeit detection  Identify commonly used language indicating counterfeits – you know it when you see it – and you know you have seen it  Feed to analyst via search-driven application – enrich based on analysts feedback ©MapR Technologies - Confidential 14
  • 15. Veoh - Cross Recommendations  Cross recommendation as search – with search used to build cross recommendation!  Recommend content to people who exhibit certain behaviors (clicks, query terms, other)  (Ab)use of a search engine – but not as a search engine for content – more like a search engine for behavior ©MapR Technologies - Confidential 15
  • 16. What Platform Do You Need?  Fast, efficient, scalable search – bulk and near real-time indexing – handle billions of records with sub-second search and faceting  Large scale, cost effective storage and processing capabilities  NLP and machine learning tools that scale to enhance discovery and analysis  Integrated log analysis workflows that close the loop between the raw data and user interactions ©MapR Technologies - Confidential 16
  • 17. Reference Architecture Access APIs View into Search View Analytic numeric/histor 1 Services ic data 2 Shards 3 N Personalization & Classification Machine Learning Recommendation Services Document •Documents Discovery & •Users Enrichment Store •Logs Clustering, classification, NLP, topic identification, Classification Models search log analysis, In memory user behavior Content Acquisition Replicated ETL, batch or near Multi-tenant real-time Data • LucidWorks Search connectors • Push ©MapR Technologies - Confidential 17
  • 18. MapR  MapR provides the technology leading Hadoop distribution – full eco-system distribution – integrated data platform – complete solution for data integrity  MapR clusters also provide tight integration with search technologies like LucidWorks – integration is key for effective ops ©MapR Technologies - Confidential 18
  • 19. LucidWorks  LucidWorks provides the leading packaging of Apache Lucene and Solr – build your own, we support – founded by the most prominent Lucene/Solr experts  LucidWorks Search – “Solr++” • UI, REST API, MapR connectors, relevance tools, much more  LucidWorks Big Data – Big Data as a Service – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks ©MapR Technologies - Confidential 19
  • 20. LucidWorks Big Data API Big Data LucidWorks Web HDFS Inputs Search, Discovery, Analytics Mgmt Analytics Service Document Service Admin Service Processing & Storage Mgmt Data Mgmt Provisioning, Monitoring & Configuration ©MapR Technologies - Confidential 20
  • 21. Easy Wins  Analyze logs from application stored in MapR  Seamlessly store search indexes in MapR – and feed to Pig, Mahout and others – use mirrors + NFS to directly deploy indexes  Snapshots make backups a snap – A lot like version control for data  LucidWorks 2.5.2 easily connects with MapR ©MapR Technologies - Confidential 21
  • 22. 1+1=3 ©MapR Technologies - Confidential 22
  • 23. Learn More  Talk to Ted (we are hiring) @ted_dunning tdunning@maprtech.com  Talk to Ivan (we are hiring) @iprovalov  MapR and Lucid Works http://www.mapr.com http://www.lucidworks.com Hash Tags #mapr #hadoopsummit #lucidworks ©MapR Technologies - Confidential 23

Notes de l'éditeur

  1. TED: We can tighten or loosen as necessary.
  2. TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  3. Search Abuse Can discuss how I started just doing free text, but then a curious thing happened, started to see people using the engine for things like: key/value, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much moreSearch has added more DB features over the yearsTED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  4. All that revolution is good, but what the heck does this have to do w/ Big Data?
  5. GSI: needs a bit more meat
  6. Service-Oriented ArchitectureStatelessFailover/Fault TolerantLightweight Coordination and MessagingSmart about UpdatesDocument store isDistributedScalableAnalysisBatchNear Real-Time