SlideShare une entreprise Scribd logo
1  sur  49
© 2014 MapR Techno©lo 2g0ie1s4 MapR Technologies 1
© 2014 MapR Techno©lo 2g0ie1s4 MapR Technologies 2
© 2014 MapR Technologies 3 
Who I am 
Ted Dunning, Chief Applications Architect, MapR Technologies 
Email tdunning@mapr.com tdunning@apache.org 
Twitter @Ted_Dunning 
Apache Mahout https://mahout.apache.org/ 
Twitter @ApacheMahout
© 2014 MapR Technologies 4 
The outline 
• The first open source, big data project 
• Another big data project 
• Conclusions
© 2014 MapR Technologies 5 
First: 
An apology for going 
off-script
© 2014 MapR Technologies 6 
Now, the story
© 2014 MapR Technologies 7
In 1866, the top finishers 
in the tea race reached 
London in 99 days, within 
2 hours of each other 
© 2014 MapR Technologies 8
© 2014 MapR Technologies 9
But in 1851, the record 
had been set at 89 days 
by the Flying Cloud 
© 2014 MapR Technologies 10
© 2014 MapR Technologies 11 
The difference was due 
(in part) 
to big data
© 2014 MapR Technologies 12
© 2014 MapR Technologies 13
© 2014 MapR Technologies 14
These charts were free … 
If you donated your data 
© 2014 MapR Technologies 15
But how does this apply 
today? 
© 2014 MapR Technologies 16
© 2014 MapR Technologies 17 
Key Points of Maury’s Work 
• Give to get 
– Give the Abstract Log to captains, get data 
• Data consortium wins 
– Merging data gives pictures nobody else can see 
• Give back 
– Them that gives, also gets 
• But this is just what every data driven web site does! 
– Just 150 years before everybody else
© 2014 MapR Technologies 18 
The Real News in Behavioral Analysis 
• Everybody knows that: 
• You need ensembles of many models to do recommendations 
• You need to use factorization models 
• You predict what you observe 
• (You should predict ratings)
© 2014 MapR Technologies 19 
But … 
none of this is really true
© 2014 MapR Technologies 20 
In fact, 
• Fancy models are rarely useful expenditures of time 
• Factorization can be good, but not much better (if at all) 
• Ratings are disastrously bad data 
• Cross-recommendation and multi-modal recommendations are 
much more interesting 
– Multiple kinds of input are far better than multiple models 
• The UI has a far larger impact than the models 
• The best algorithms combine simplicity with accuracy 
– So simple you can embed them in a search engine
© 2014 MapR Technologies 21 
Here’s how
© 2014 MapR Technologies 22 
CooccurrenCcoeoAcncaulryrseisn ce Analysis
© 2014 MapR Technologies 23 
How Often Do Items Co-occur 
How often do items co-occur?
© 2014 MapR Technologies 24 
Which Co-occurrences are Interesting? 
Which cooccurences are interesting? 
Each row of indicators becomes a field in a 
search engine document
© 2014 MapR Technologies 25 
Recommendations 
Alice got an apple and 
Alice a puppy
© 2014 MapR Technologies 26 
Recommendations 
Alice got an apple and 
Alice a puppy 
Charles Charles got a bicycle
© 2014 MapR Technologies 27 
Recommendations 
Alice got an apple and 
Alice a puppy 
Bob Bob got an apple 
Charles Charles got a bicycle
© 2014 MapR Technologies 28 
Recommendations 
Alice got an apple and 
Alice a puppy 
Bob What else would Bob like? 
Charles Charles got a bicycle
© 2014 MapR Technologies 29 
Recommendations 
Alice got an apple and 
Alice a puppy 
Bob A puppy! 
Charles Charles got a bicycle
© 2014 MapR Technologies 30 
You get the idea of how 
recommenders can work…
By the way, like me, Bob also 
wants a pony… 
© 2014 MapR Technologies 31
© 2014 MapR Technologies 32 
Recommendations 
? 
Alice 
Bob 
Amelia 
Charles 
What if everybody gets a 
pony? 
What else would you recommend for 
new user Amelia?
© 2014 MapR Technologies 33 
Recommendations 
? 
Alice 
Bob 
Amelia 
Charles 
If everybody gets a pony, it’s 
not a very good indicator of 
what to else predict...
© 2014 MapR Technologies 34 
Problems with Raw Co-occurrence 
• Very popular items co-occur with everything or why it’s not very 
helpful to know that everybody wants a pony… 
– Examples: Welcome document; Elevator music 
• Very widespread occurrence is not interesting to generate indicators 
for recommendation 
– Unless you want to offer an item that is constantly desired, such as 
razor blades (or ponies) 
• What we want is anomalous co-occurrence 
– This is the source of interesting indicators of preference on which to 
base recommendation
Overview: Get Useful Indicators from Behaviors 
© 2014 MapR Technologies 35 
1. Use log files to build history matrix of users x items 
– Remember: this history of interactions will be sparse compared to all 
potential combinations 
2. Transform to a co-occurrence matrix of items x items 
3. Look for useful indicators by identifying anomalous co-occurrences to 
make an indicator matrix 
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences 
can with confidence be used as indicators of preference 
– ItemSimilarityJob in Apache Mahout uses LLR
Which one is the anomalous co-occurrence? 
A not A 
B 1 0 
not B 0 2 
© 2014 MapR Technologies 36 
A not A 
B 13 1000 
not B 1000 100,000 
A not A 
B 1 0 
not B 0 10,000 
A not A 
B 10 0 
not B 0 100,000
Which one is the anomalous co-occurrence? 
A not A 
0.90 1.95 
B 1 0 
not B 0 2 
© 2014 MapR Technologies 37 
A not A 
B 13 1000 
not B 1000 100,000 
A not A 
4.52 14.3 
B 1 0 
not B 0 10,000 
A not A 
B 10 0 
not B 0 100,000
Collection of Documents: Insert Meta-Data 
© 2014 MapR Technologies 38 
Search 
Technology 
Item 
meta-data 
Ingest easily via NFS 
Document for 
“puppy” id: t4 
title: puppy 
desc: The sweetest little puppy 
ever. 
keywords: puppy, dog, pet
From Indicator Matrix to New Indicator Field 
© 2014 MapR Technologies 39 
✔ 
id: t4 
title: puppy 
desc: The sweetest little puppy 
ever. 
keywords: puppy, dog, pet 
indicators: (t1) 
Solr document 
for “puppy” 
Note: data for the indicator field is added directly to meta-data for a document in Apache 
Solr or Elastic Search index. You don’t need to create a separate index for the indicators.
Going Further: Multi-Modal Recommendation 
© 2014 MapR Technologies 40
Going Further: Multi-Modal Recommendation 
© 2014 MapR Technologies 41
© 2014 MapR Technologies 42 
For example 
• Users enter queries (A) 
– (actor = user, item=query) 
• Users view videos (B) 
– (actor = user, item=video) 
• ATA gives query recommendation 
– “did you mean to ask for” 
• BTB gives video recommendation 
– “you might like these videos”
© 2014 MapR Technologies 43 
The punch-line 
• BTA recommends videos in response to a query 
– (isn’t that a search engine?) 
– (not quite, it doesn’t look at content or meta-data)
© 2014 MapR Technologies 44 
Real-life example 
• Query: “Paco de Lucia” 
• Conventional meta-data search results: 
– “hombres de paco” times 400 
– not much else 
• Recommendation based search: 
– Flamenco guitar and dancers 
– Spanish and classical guitar 
– Van Halen doing a classical/flamenco riff
© 2014 MapR Technologies 45 
Real-life example
© 2014 MapR Technologies 46 
Hypothetical Example 
• Want a navigational ontology? 
• Just put labels on a web page with traffic 
– This gives A = users x label clicks 
• Remember viewing history 
– This gives B = users x items 
• Cross recommend 
– B’A = label to item mapping 
• After several users click, results are whatever users think they 
should be
available for free at 
http://www.mapr.com/practical-machine-learning 
© 2014 MapR Technologies 47 
More Details Available 
available for free at 
http://www.mapr.com/practical-machine-learning
© 2014 MapR Technologies 48 
Who I am 
Ted Dunning, Chief Applications Architect, MapR Technologies 
Email tdunning@mapr.com tdunning@apache.org 
Twitter @Ted_Dunning 
Apache Mahout https://mahout.apache.org/ 
Twitter @ApacheMahout
© 2014 MapR Technologies 49 
Q & A 
Engage with us! 
@mapr maprtech 
jbates@mapr.com 
MapR 
maprtech 
mapr-technologies

Contenu connexe

Tendances

Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkMapR Technologies
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseAllen Day, PhD
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsMapR Technologies
 

Tendances (20)

Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 

Similaire à Cognitive computing with big data, high tech and low tech approaches

Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation WorkshopMapR Technologies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matterDataWorks Summit
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterDataWorks Summit
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopDataWorks Summit
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15MLconf
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsMapR Technologies
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationTed Dunning
 
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...Frederic Kaplan
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadooplucenerevolution
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and RecommendationsTed Dunning
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Data science workshop
Data science workshopData science workshop
Data science workshopHortonworks
 

Similaire à Cognitive computing with big data, high tech and low tech approaches (20)

Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation Workshop
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matter
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really Matter
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Dunning ml-conf-2014
Dunning ml-conf-2014Dunning ml-conf-2014
Dunning ml-conf-2014
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendation
 
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Stree...
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoop
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and Recommendations
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout Recommenders
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Data science workshop
Data science workshopData science workshop
Data science workshop
 

Plus de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 

Plus de Ted Dunning (10)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Cognitive computing with big data, high tech and low tech approaches

  • 1. © 2014 MapR Techno©lo 2g0ie1s4 MapR Technologies 1
  • 2. © 2014 MapR Techno©lo 2g0ie1s4 MapR Technologies 2
  • 3. © 2014 MapR Technologies 3 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout
  • 4. © 2014 MapR Technologies 4 The outline • The first open source, big data project • Another big data project • Conclusions
  • 5. © 2014 MapR Technologies 5 First: An apology for going off-script
  • 6. © 2014 MapR Technologies 6 Now, the story
  • 7. © 2014 MapR Technologies 7
  • 8. In 1866, the top finishers in the tea race reached London in 99 days, within 2 hours of each other © 2014 MapR Technologies 8
  • 9. © 2014 MapR Technologies 9
  • 10. But in 1851, the record had been set at 89 days by the Flying Cloud © 2014 MapR Technologies 10
  • 11. © 2014 MapR Technologies 11 The difference was due (in part) to big data
  • 12. © 2014 MapR Technologies 12
  • 13. © 2014 MapR Technologies 13
  • 14. © 2014 MapR Technologies 14
  • 15. These charts were free … If you donated your data © 2014 MapR Technologies 15
  • 16. But how does this apply today? © 2014 MapR Technologies 16
  • 17. © 2014 MapR Technologies 17 Key Points of Maury’s Work • Give to get – Give the Abstract Log to captains, get data • Data consortium wins – Merging data gives pictures nobody else can see • Give back – Them that gives, also gets • But this is just what every data driven web site does! – Just 150 years before everybody else
  • 18. © 2014 MapR Technologies 18 The Real News in Behavioral Analysis • Everybody knows that: • You need ensembles of many models to do recommendations • You need to use factorization models • You predict what you observe • (You should predict ratings)
  • 19. © 2014 MapR Technologies 19 But … none of this is really true
  • 20. © 2014 MapR Technologies 20 In fact, • Fancy models are rarely useful expenditures of time • Factorization can be good, but not much better (if at all) • Ratings are disastrously bad data • Cross-recommendation and multi-modal recommendations are much more interesting – Multiple kinds of input are far better than multiple models • The UI has a far larger impact than the models • The best algorithms combine simplicity with accuracy – So simple you can embed them in a search engine
  • 21. © 2014 MapR Technologies 21 Here’s how
  • 22. © 2014 MapR Technologies 22 CooccurrenCcoeoAcncaulryrseisn ce Analysis
  • 23. © 2014 MapR Technologies 23 How Often Do Items Co-occur How often do items co-occur?
  • 24. © 2014 MapR Technologies 24 Which Co-occurrences are Interesting? Which cooccurences are interesting? Each row of indicators becomes a field in a search engine document
  • 25. © 2014 MapR Technologies 25 Recommendations Alice got an apple and Alice a puppy
  • 26. © 2014 MapR Technologies 26 Recommendations Alice got an apple and Alice a puppy Charles Charles got a bicycle
  • 27. © 2014 MapR Technologies 27 Recommendations Alice got an apple and Alice a puppy Bob Bob got an apple Charles Charles got a bicycle
  • 28. © 2014 MapR Technologies 28 Recommendations Alice got an apple and Alice a puppy Bob What else would Bob like? Charles Charles got a bicycle
  • 29. © 2014 MapR Technologies 29 Recommendations Alice got an apple and Alice a puppy Bob A puppy! Charles Charles got a bicycle
  • 30. © 2014 MapR Technologies 30 You get the idea of how recommenders can work…
  • 31. By the way, like me, Bob also wants a pony… © 2014 MapR Technologies 31
  • 32. © 2014 MapR Technologies 32 Recommendations ? Alice Bob Amelia Charles What if everybody gets a pony? What else would you recommend for new user Amelia?
  • 33. © 2014 MapR Technologies 33 Recommendations ? Alice Bob Amelia Charles If everybody gets a pony, it’s not a very good indicator of what to else predict...
  • 34. © 2014 MapR Technologies 34 Problems with Raw Co-occurrence • Very popular items co-occur with everything or why it’s not very helpful to know that everybody wants a pony… – Examples: Welcome document; Elevator music • Very widespread occurrence is not interesting to generate indicators for recommendation – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies) • What we want is anomalous co-occurrence – This is the source of interesting indicators of preference on which to base recommendation
  • 35. Overview: Get Useful Indicators from Behaviors © 2014 MapR Technologies 35 1. Use log files to build history matrix of users x items – Remember: this history of interactions will be sparse compared to all potential combinations 2. Transform to a co-occurrence matrix of items x items 3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix – Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference – ItemSimilarityJob in Apache Mahout uses LLR
  • 36. Which one is the anomalous co-occurrence? A not A B 1 0 not B 0 2 © 2014 MapR Technologies 36 A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000
  • 37. Which one is the anomalous co-occurrence? A not A 0.90 1.95 B 1 0 not B 0 2 © 2014 MapR Technologies 37 A not A B 13 1000 not B 1000 100,000 A not A 4.52 14.3 B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000
  • 38. Collection of Documents: Insert Meta-Data © 2014 MapR Technologies 38 Search Technology Item meta-data Ingest easily via NFS Document for “puppy” id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet
  • 39. From Indicator Matrix to New Indicator Field © 2014 MapR Technologies 39 ✔ id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet indicators: (t1) Solr document for “puppy” Note: data for the indicator field is added directly to meta-data for a document in Apache Solr or Elastic Search index. You don’t need to create a separate index for the indicators.
  • 40. Going Further: Multi-Modal Recommendation © 2014 MapR Technologies 40
  • 41. Going Further: Multi-Modal Recommendation © 2014 MapR Technologies 41
  • 42. © 2014 MapR Technologies 42 For example • Users enter queries (A) – (actor = user, item=query) • Users view videos (B) – (actor = user, item=video) • ATA gives query recommendation – “did you mean to ask for” • BTB gives video recommendation – “you might like these videos”
  • 43. © 2014 MapR Technologies 43 The punch-line • BTA recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
  • 44. © 2014 MapR Technologies 44 Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres de paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 45. © 2014 MapR Technologies 45 Real-life example
  • 46. © 2014 MapR Technologies 46 Hypothetical Example • Want a navigational ontology? • Just put labels on a web page with traffic – This gives A = users x label clicks • Remember viewing history – This gives B = users x items • Cross recommend – B’A = label to item mapping • After several users click, results are whatever users think they should be
  • 47. available for free at http://www.mapr.com/practical-machine-learning © 2014 MapR Technologies 47 More Details Available available for free at http://www.mapr.com/practical-machine-learning
  • 48. © 2014 MapR Technologies 48 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning Apache Mahout https://mahout.apache.org/ Twitter @ApacheMahout
  • 49. © 2014 MapR Technologies 49 Q & A Engage with us! @mapr maprtech jbates@mapr.com MapR maprtech mapr-technologies

Notes de l'éditeur

  1. Mention that the Pony book said “RowSimilarityJob”…
  2. Problem starts here…