SlideShare une entreprise Scribd logo
1  sur  61
Télécharger pour lire hors ligne
1©MapR Technologies 2013- Confidential
Introduction to Mahout
And How To Build a Recommender
2©MapR Technologies 2013- Confidential
Me, Us
 Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
 MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard API’s
 Tonight
Hash tag - #dfwbd #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
3©MapR Technologies 2013- Confidential
Requested Topic For Tonight
 What is Mahout?
 What makes it different?
 How can big data technology solve impossible problems?
 How is big data affecting the world?
4©MapR Technologies 2013- Confidential
Also
 What is MapR?
 What is MapR doing?
 How does MapR’s technology work?
 How are customers making use of MapR?
 How can anyone make use of MapR to solve problems?
5©MapR Technologies 2013- Confidential
Oh … Also This
 Detailed break-down of a live machine learning system running
with Mahout on MapR
 With code examples
6©MapR Technologies 2013- Confidential
I may have to
summarize
7©MapR Technologies 2013- Confidential
I may have to
summarize
just a bit
8©MapR Technologies 2013- Confidential
Part 1:
5 minutes of math
9©MapR Technologies 2013- Confidential
Part 2:
12 minutes: I want a pony
10©MapR Technologies 2013- Confidential
Part 3:
A working example
11©MapR Technologies 2013- Confidential
What Does Machine Learning Look Like?
12©MapR Technologies 2013- Confidential
What Does Machine Learning Look Like?
A1 A2
é
ë
ù
û
T
A1 A2
é
ë
ù
û=
A1
T
A2
T
é
ë
ê
ê
ù
û
ú
ú
A1 A2
é
ë
ù
û
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
r1
r2
é
ë
ê
ê
ù
û
ú
ú
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
r1 = A1
T
A1 A1
T
A2
é
ëê
ù
ûú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
O(κ k d + k3 d) = O(k2 d log n + k3 d) for small
k, high quality
O(κ d log k) or O(d log κ log k) for larger
k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
13©MapR Technologies 2013- Confidential
Comparison of Three Main ML Topics
 Recommendation:
– Involves observation of interactions between people taking action (users)
and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices;
suggesting sale items for e-stores or via cash-register receipts
14©MapR Technologies 2013- Confidential
15©MapR Technologies 2013- Confidential
16©MapR Technologies 2013- Confidential
Part 1:
A bit of math
(the math of bits)
17©MapR Technologies 2013- Confidential
Mahout Math
 Goals are
– basic linear algebra,
– and statistical sampling,
– and good clustering,
– decent speed,
– extensibility,
– especially for sparse data
 But not
– totally badass speed
– comprehensive set of algorithms
– optimization, root finders, quadrature
18©MapR Technologies 2013- Confidential
Matrices and Vectors
 At the core:
– DenseVector, RandomAccessSparseVector
– DenseMatrix, SparseRowMatrix
 Highly composable API
 Important ideas:
– view*, assign and aggregate
– iteration
m.viewDiagonal().assign(v)
19©MapR Technologies 2013- Confidential
Assign? View?
 Why assign?
– Copying is the major cost for naïve matrix packages
– In-place operations critical to reasonable performance
– Many kinds of updates required, so functional style very helpful
 Why view?
– In-place operations often required for blocks, rows, columns or diagonals
– With views, we need #assign + #views methods
– Without views, we need #assign x #views methods
 Synergies
– With both views and assign, many loops become single line
24©MapR Technologies 2013- Confidential
Examples
A =a
A =aB+ b
double alpha; a.assign(alpha);
a.assign(b, Functions.chain(
Functions.plus(beta),
Functions.times(alpha));
26©MapR Technologies 2013- Confidential
More Examples
 The trace of a matrix
 Set diagonal to zero
 Set diagonal to negative of row sums
27©MapR Technologies 2013- Confidential
Examples
 The trace of a matrix
 Set diagonal to zero
 Set diagonal to negative of row sums
m.viewDiagonal().zSum()
28©MapR Technologies 2013- Confidential
Examples
 The trace of a matrix
 Set diagonal to zero
 Set diagonal to negative of row sums
m.viewDiagonal().zSum()
m.viewDiagonal().assign(0)
29©MapR Technologies 2013- Confidential
Examples
 The trace of a matrix
 Set diagonal to zero
 Set diagonal to negative of row sums excluding the diagonal
m.viewDiagonal().zSum()
m.viewDiagonal().assign(0)
Vector diag = m.viewDiagonal().assign(0);
diag.assign(m.rowSums().assign(Functions.MINUS));
32©MapR Technologies 2013- Confidential
Clustering and Such
 Streaming k-means and ball k-means
– streaming reduces very large data to a cluster sketch
– ball k-means is a high quality k-means implementation
– the cluster sketch is also usable for other applications
– single machine threaded and map-reduce versions available
 SVD and friends
– stochastic SVD has in-memory, single machine out-of-core and map-reduce
versions
– good for reducing very large sparse matrices to tall skinny dense ones
 Spectral clustering
– based on SVD, allows massive dimensional clustering
33©MapR Technologies 2013- Confidential
Mahout Math Summary
 Matrices, Vectors
– views
– in-place assignment
– aggregations
– iterations
 Functions
– lots built-in
– cooperate with sparse vector optimizations
 Sampling
– abstract samplers
– samplers as functions
 Other stuff … clustering, SVD
34©MapR Technologies 2013- Confidential
Part 2:
How recommenders work
(I still want a pony)
35©MapR Technologies 2013- Confidential
Recommendations
Behavior of a
crowd helps us
understand what
individuals will do
36©MapR Technologies 2013- Confidential
Recommendations
Alice got an apple and a
puppy
Charles got a bicycle
Alice
Charles
37©MapR Technologies 2013- Confidential
Recommendations
Alice got an apple and a
puppy
Charles got a bicycle
Bob got an apple
Alice
Bob
Charles
38©MapR Technologies 2013- Confidential
Recommendations
What else would Bob like??
Alice
Bob
Charles
39©MapR Technologies 2013- Confidential
Recommendations
What if everybody gets a
pony?
Now what does Bob want?
?
Alice
Bob
Charles
40©MapR Technologies 2013- Confidential
Log Files
Alice
Bob
Charles
Alice
Bob
Charles
Alice
41©MapR Technologies 2013- Confidential
Log Files
u1
u3
u2
u1
u3
u2
u1
t1
t2
t3
t4
t3
t3
t1
42©MapR Technologies 2013- Confidential
Log Files and Dimensions
u1
u3
u2
u1
u3
u2
u1
t1
t2
t3
t4
t3
t3
t1
t1
t2
t3
t4
Things
u1 Alice
Bob
Charles
u3
u2
Users
43©MapR Technologies 2013- Confidential
History Matrix
Alice
Bob
Charles
✔ ✔ ✔
✔ ✔
✔ ✔
44©MapR Technologies 2013- Confidential
Cooccurrence Matrix
1 2
1 1
1
1
2 1
45©MapR Technologies 2013- Confidential
Indicator Matrix
✔
46©MapR Technologies 2013- Confidential
Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
47©MapR Technologies 2013- Confidential
Problems with Raw Cooccurrence
 Very popular items co-occur with everything
– Welcome document
– Elevator music
 That isn’t interesting
– We want anomalous cooccurrence
48©MapR Technologies 2013- Confidential
Recommendation Basics
 Coocurrence
t3 not t3
t1 2 1
not t1 1 1
49©MapR Technologies 2013- Confidential
Spot the Anomaly
 Root LLR is roughly like standard deviations
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 2
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
0.44 0.98
2.26 7.15
50©MapR Technologies 2013- Confidential
A Quick Simplification
 Users who do h (a vector of things a user has done)
 Also do r
Ah
AT
Ah( )
AT
A( )h
User-centric recommendations
(transpose translates back to things)
Item-centric recommendations
(change the order of operations)
A translates things into users
51©MapR Technologies 2013- Confidential
Symmetry Gives Cross Recommentations
AT
A( )h
BT
A( )h
Conventional recommendations
with off-line learning
Cross recommendations
52©MapR Technologies 2013- Confidential
For example
 Users enter queries (A)
– (actor = user, item=query)
 Users view videos (B)
– (actor = user, item=video)
 ATA gives query recommendation
– “did you mean to ask for”
 BTB gives video recommendation
– “you might like these videos”
53©MapR Technologies 2013- Confidential
The punch-line
 BTA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
54©MapR Technologies 2013- Confidential
Real-life example
 Query: “Paco de Lucia”
 Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
 Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
55©MapR Technologies 2013- Confidential
Real-life example
56©MapR Technologies 2013- Confidential
Hypothetical Example
 Want a navigational ontology?
 Just put labels on a web page with traffic
– This gives A = users x label clicks
 Remember viewing history
– This gives B = users x items
 Cross recommend
– B’A = label to item mapping
 After several users click, results are whatever users think they
should be
57©MapR Technologies 2013- Confidential
Nice. But we
can do better?
58©MapR Technologies 2013- Confidential
Ausers
things
59©MapR Technologies 2013- Confidential
A1 A2
é
ë
ù
û
users
thing
type 1
thing
type 2
60©MapR Technologies 2013- Confidential
A1 A2
é
ë
ù
û
T
A1 A2
é
ë
ù
û=
A1
T
A2
T
é
ë
ê
ê
ù
û
ú
ú
A1 A2
é
ë
ù
û
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
r1
r2
é
ë
ê
ê
ù
û
ú
ú
=
A1
T
A1 A1
T
A2
AT
2A1 AT
2A2
é
ë
ê
ê
ù
û
ú
ú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
r1 = A1
T
A1 A1
T
A2
é
ëê
ù
ûú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
61©MapR Technologies 2013- Confidential
Part 3:
What about that worked
example?
62©MapR Technologies 2013- Confidential
Metrics and
logs (5)
Cooccurrence
analysis (7)
Post to
search
engine (8)
Search
engine (4)
Presentation
tier (2)
User behavior
generator (1)
Session
collector
(3)
History collector
(6)
Diagnostic
browsing (9)
http://bit.ly/18vbbaT
63©MapR Technologies 2013- Confidential
SolR
Indexer
SolR
Indexer
Solr
indexing
Cooccurrence
(Mahout)
Item meta-
data
Index
shards
Complete
history
Analyze with Map-Reduce
64©MapR Technologies 2013- Confidential
SolR
Indexer
SolR
Indexer
Solr
search
Web tier
Item meta-
data
Index
shards
User
history
Deploy with Conventional Search System
65©MapR Technologies 2013- Confidential
Objective Results
 At a very large credit card company
 History is all transactions
 Development time to minimal viable product about 4 months
 General release 2-3 months later
 Search-based recs at or equal in quality to other techniques
66©MapR Technologies 2013- Confidential
Summary
 Input: Multiple kinds of behavior on one set of things
 Output: Recommendations for one kind of behavior with a
different set of things
 Cross recommendation is a special case
67©MapR Technologies 2013- Confidential
Objective Results
 At a very large credit card company
 History is all transactions
 Development time to minimal viable product about 4 months
 General release 2-3 months later
 Search-based recs at or equal in quality to other techniques
68©MapR Technologies 2013- Confidential
Me, Us
 Ted Dunning, Chief Application Architect, MapR
Committer PMC member, Mahout, Zookeeper, Drill
Bought the beer at the first HUG
tdunning@{apache.org,maprtech.com} ted.dunning@gmail.com
 MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard API’s
 Tonight
Hash tag - #dfwbd #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR

Contenu connexe

Tendances

Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseAllen Day, PhD
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemChris Huang
 

Tendances (20)

Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Dunning ml-conf-2014
Dunning ml-conf-2014Dunning ml-conf-2014
Dunning ml-conf-2014
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
GoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 SkinnedGoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 Skinned
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Real time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystemReal time big data applications with hadoop ecosystem
Real time big data applications with hadoop ecosystem
 

En vedette

Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at MendeleyMahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at MendeleyKris Jack
 
Big Data Mining with Mahout
Big Data Mining with MahoutBig Data Mining with Mahout
Big Data Mining with MahoutSolidQ
 
Mahout Introduction BarCampDC
Mahout Introduction BarCampDCMahout Introduction BarCampDC
Mahout Introduction BarCampDCDrew Farris
 
Recommendation systems with Mahout: introduction
Recommendation systems with Mahout: introductionRecommendation systems with Mahout: introduction
Recommendation systems with Mahout: introductionAdam Warski
 
Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101John Ternent
 
Machine Learning with Mahout
Machine Learning with MahoutMachine Learning with Mahout
Machine Learning with Mahoutbigdatasyd
 
Apache Mahout Algorithms
Apache Mahout AlgorithmsApache Mahout Algorithms
Apache Mahout Algorithmsmozgkarakaya
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!OSCON Byrum
 
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Cloudera, Inc.
 
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)Jee Vang, Ph.D.
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Cataldo Musto
 
Machine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionMachine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionVarad Meru
 
Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Cataldo Musto
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 

En vedette (20)

Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at MendeleyMahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
 
Big Data Mining with Mahout
Big Data Mining with MahoutBig Data Mining with Mahout
Big Data Mining with Mahout
 
Mahout Introduction BarCampDC
Mahout Introduction BarCampDCMahout Introduction BarCampDC
Mahout Introduction BarCampDC
 
Recommendation systems with Mahout: introduction
Recommendation systems with Mahout: introductionRecommendation systems with Mahout: introduction
Recommendation systems with Mahout: introduction
 
Mahout part1
Mahout part1Mahout part1
Mahout part1
 
Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101
 
Apache mahout
Apache mahoutApache mahout
Apache mahout
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
mahout introduction
mahout  introductionmahout  introduction
mahout introduction
 
Machine Learning with Mahout
Machine Learning with MahoutMachine Learning with Mahout
Machine Learning with Mahout
 
Mahout classifier tour
Mahout classifier tourMahout classifier tour
Mahout classifier tour
 
Apache Mahout Algorithms
Apache Mahout AlgorithmsApache Mahout Algorithms
Apache Mahout Algorithms
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!
 
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
 
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)
A Quick Tutorial on Mahout’s Recommendation Engine (v 0.4)
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)
 
Intro to Apache Mahout
Intro to Apache MahoutIntro to Apache Mahout
Intro to Apache Mahout
 
Machine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An IntroductionMachine Learning and Apache Mahout : An Introduction
Machine Learning and Apache Mahout : An Introduction
 
Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
 

Similaire à DFW Big Data talk on Mahout Recommenders

Whats Right and Wrong with Apache Mahout
Whats Right and Wrong with Apache MahoutWhats Right and Wrong with Apache Mahout
Whats Right and Wrong with Apache MahoutTed Dunning
 
What's Right and Wrong with Apache Mahout
What's Right and Wrong with Apache MahoutWhat's Right and Wrong with Apache Mahout
What's Right and Wrong with Apache MahoutMapR Technologies
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to MahoutTed Dunning
 
Introduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGIntroduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGMapR Technologies
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsMapR Technologies
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationTed Dunning
 
Recommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryRecommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryMapR Technologies
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceMapR Technologies
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in businessMapR Technologies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San DiegoMapR Technologies
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012MapR Technologies
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time TogetherMapR Technologies
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
 
Search as recommendation
Search as recommendationSearch as recommendation
Search as recommendationTed Dunning
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 

Similaire à DFW Big Data talk on Mahout Recommenders (20)

Whats Right and Wrong with Apache Mahout
Whats Right and Wrong with Apache MahoutWhats Right and Wrong with Apache Mahout
Whats Right and Wrong with Apache Mahout
 
What's Right and Wrong with Apache Mahout
What's Right and Wrong with Apache MahoutWhat's Right and Wrong with Apache Mahout
What's Right and Wrong with Apache Mahout
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to Mahout
 
Introduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGIntroduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUG
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendation
 
Recommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryRecommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on Symmetry
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time Together
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
News From Mahout
News From MahoutNews From Mahout
News From Mahout
 
New directions for mahout
New directions for mahoutNew directions for mahout
New directions for mahout
 
Search as recommendation
Search as recommendationSearch as recommendation
Search as recommendation
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 

Plus de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 

Plus de Ted Dunning (11)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 

Dernier

Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 

Dernier (20)

Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 

DFW Big Data talk on Mahout Recommenders

  • 1. 1©MapR Technologies 2013- Confidential Introduction to Mahout And How To Build a Recommender
  • 2. 2©MapR Technologies 2013- Confidential Me, Us  Ted Dunning, Chief Application Architect, MapR Committer PMC member, Mahout, Zookeeper, Drill Bought the beer at the first HUG  MapR Distributes more open source components for Hadoop Adds major technology for performance, HA, industry standard API’s  Tonight Hash tag - #dfwbd #mapr See also - @ApacheMahout @ApacheDrill @ted_dunning and @mapR
  • 3. 3©MapR Technologies 2013- Confidential Requested Topic For Tonight  What is Mahout?  What makes it different?  How can big data technology solve impossible problems?  How is big data affecting the world?
  • 4. 4©MapR Technologies 2013- Confidential Also  What is MapR?  What is MapR doing?  How does MapR’s technology work?  How are customers making use of MapR?  How can anyone make use of MapR to solve problems?
  • 5. 5©MapR Technologies 2013- Confidential Oh … Also This  Detailed break-down of a live machine learning system running with Mahout on MapR  With code examples
  • 6. 6©MapR Technologies 2013- Confidential I may have to summarize
  • 7. 7©MapR Technologies 2013- Confidential I may have to summarize just a bit
  • 8. 8©MapR Technologies 2013- Confidential Part 1: 5 minutes of math
  • 9. 9©MapR Technologies 2013- Confidential Part 2: 12 minutes: I want a pony
  • 10. 10©MapR Technologies 2013- Confidential Part 3: A working example
  • 11. 11©MapR Technologies 2013- Confidential What Does Machine Learning Look Like?
  • 12. 12©MapR Technologies 2013- Confidential What Does Machine Learning Look Like? A1 A2 é ë ù û T A1 A2 é ë ù û= A1 T A2 T é ë ê ê ù û ú ú A1 A2 é ë ù û = A1 T A1 A1 T A2 AT 2A1 AT 2A2 é ë ê ê ù û ú ú r1 r2 é ë ê ê ù û ú ú = A1 T A1 A1 T A2 AT 2A1 AT 2A2 é ë ê ê ù û ú ú h1 h2 é ë ê ê ù û ú ú r1 = A1 T A1 A1 T A2 é ëê ù ûú h1 h2 é ë ê ê ù û ú ú O(κ k d + k3 d) = O(k2 d log n + k3 d) for small k, high quality O(κ d log k) or O(d log κ log k) for larger k, looser quality But tonight we’re going to show you how to keep it simple yet powerful…
  • 13. 13©MapR Technologies 2013- Confidential Comparison of Three Main ML Topics  Recommendation: – Involves observation of interactions between people taking action (users) and items for input data to the recommender model – Goal is to suggest additional appropriate or desirable interactions – Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
  • 16. 16©MapR Technologies 2013- Confidential Part 1: A bit of math (the math of bits)
  • 17. 17©MapR Technologies 2013- Confidential Mahout Math  Goals are – basic linear algebra, – and statistical sampling, – and good clustering, – decent speed, – extensibility, – especially for sparse data  But not – totally badass speed – comprehensive set of algorithms – optimization, root finders, quadrature
  • 18. 18©MapR Technologies 2013- Confidential Matrices and Vectors  At the core: – DenseVector, RandomAccessSparseVector – DenseMatrix, SparseRowMatrix  Highly composable API  Important ideas: – view*, assign and aggregate – iteration m.viewDiagonal().assign(v)
  • 19. 19©MapR Technologies 2013- Confidential Assign? View?  Why assign? – Copying is the major cost for naïve matrix packages – In-place operations critical to reasonable performance – Many kinds of updates required, so functional style very helpful  Why view? – In-place operations often required for blocks, rows, columns or diagonals – With views, we need #assign + #views methods – Without views, we need #assign x #views methods  Synergies – With both views and assign, many loops become single line
  • 20. 24©MapR Technologies 2013- Confidential Examples A =a A =aB+ b double alpha; a.assign(alpha); a.assign(b, Functions.chain( Functions.plus(beta), Functions.times(alpha));
  • 21. 26©MapR Technologies 2013- Confidential More Examples  The trace of a matrix  Set diagonal to zero  Set diagonal to negative of row sums
  • 22. 27©MapR Technologies 2013- Confidential Examples  The trace of a matrix  Set diagonal to zero  Set diagonal to negative of row sums m.viewDiagonal().zSum()
  • 23. 28©MapR Technologies 2013- Confidential Examples  The trace of a matrix  Set diagonal to zero  Set diagonal to negative of row sums m.viewDiagonal().zSum() m.viewDiagonal().assign(0)
  • 24. 29©MapR Technologies 2013- Confidential Examples  The trace of a matrix  Set diagonal to zero  Set diagonal to negative of row sums excluding the diagonal m.viewDiagonal().zSum() m.viewDiagonal().assign(0) Vector diag = m.viewDiagonal().assign(0); diag.assign(m.rowSums().assign(Functions.MINUS));
  • 25. 32©MapR Technologies 2013- Confidential Clustering and Such  Streaming k-means and ball k-means – streaming reduces very large data to a cluster sketch – ball k-means is a high quality k-means implementation – the cluster sketch is also usable for other applications – single machine threaded and map-reduce versions available  SVD and friends – stochastic SVD has in-memory, single machine out-of-core and map-reduce versions – good for reducing very large sparse matrices to tall skinny dense ones  Spectral clustering – based on SVD, allows massive dimensional clustering
  • 26. 33©MapR Technologies 2013- Confidential Mahout Math Summary  Matrices, Vectors – views – in-place assignment – aggregations – iterations  Functions – lots built-in – cooperate with sparse vector optimizations  Sampling – abstract samplers – samplers as functions  Other stuff … clustering, SVD
  • 27. 34©MapR Technologies 2013- Confidential Part 2: How recommenders work (I still want a pony)
  • 28. 35©MapR Technologies 2013- Confidential Recommendations Behavior of a crowd helps us understand what individuals will do
  • 29. 36©MapR Technologies 2013- Confidential Recommendations Alice got an apple and a puppy Charles got a bicycle Alice Charles
  • 30. 37©MapR Technologies 2013- Confidential Recommendations Alice got an apple and a puppy Charles got a bicycle Bob got an apple Alice Bob Charles
  • 31. 38©MapR Technologies 2013- Confidential Recommendations What else would Bob like?? Alice Bob Charles
  • 32. 39©MapR Technologies 2013- Confidential Recommendations What if everybody gets a pony? Now what does Bob want? ? Alice Bob Charles
  • 33. 40©MapR Technologies 2013- Confidential Log Files Alice Bob Charles Alice Bob Charles Alice
  • 34. 41©MapR Technologies 2013- Confidential Log Files u1 u3 u2 u1 u3 u2 u1 t1 t2 t3 t4 t3 t3 t1
  • 35. 42©MapR Technologies 2013- Confidential Log Files and Dimensions u1 u3 u2 u1 u3 u2 u1 t1 t2 t3 t4 t3 t3 t1 t1 t2 t3 t4 Things u1 Alice Bob Charles u3 u2 Users
  • 36. 43©MapR Technologies 2013- Confidential History Matrix Alice Bob Charles ✔ ✔ ✔ ✔ ✔ ✔ ✔
  • 37. 44©MapR Technologies 2013- Confidential Cooccurrence Matrix 1 2 1 1 1 1 2 1
  • 38. 45©MapR Technologies 2013- Confidential Indicator Matrix ✔
  • 39. 46©MapR Technologies 2013- Confidential Indicator Matrix ✔ id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet indicators: (t1)
  • 40. 47©MapR Technologies 2013- Confidential Problems with Raw Cooccurrence  Very popular items co-occur with everything – Welcome document – Elevator music  That isn’t interesting – We want anomalous cooccurrence
  • 41. 48©MapR Technologies 2013- Confidential Recommendation Basics  Coocurrence t3 not t3 t1 2 1 not t1 1 1
  • 42. 49©MapR Technologies 2013- Confidential Spot the Anomaly  Root LLR is roughly like standard deviations A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 2 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000 0.44 0.98 2.26 7.15
  • 43. 50©MapR Technologies 2013- Confidential A Quick Simplification  Users who do h (a vector of things a user has done)  Also do r Ah AT Ah( ) AT A( )h User-centric recommendations (transpose translates back to things) Item-centric recommendations (change the order of operations) A translates things into users
  • 44. 51©MapR Technologies 2013- Confidential Symmetry Gives Cross Recommentations AT A( )h BT A( )h Conventional recommendations with off-line learning Cross recommendations
  • 45. 52©MapR Technologies 2013- Confidential For example  Users enter queries (A) – (actor = user, item=query)  Users view videos (B) – (actor = user, item=video)  ATA gives query recommendation – “did you mean to ask for”  BTB gives video recommendation – “you might like these videos”
  • 46. 53©MapR Technologies 2013- Confidential The punch-line  BTA recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
  • 47. 54©MapR Technologies 2013- Confidential Real-life example  Query: “Paco de Lucia”  Conventional meta-data search results: – “hombres del paco” times 400 – not much else  Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 48. 55©MapR Technologies 2013- Confidential Real-life example
  • 49. 56©MapR Technologies 2013- Confidential Hypothetical Example  Want a navigational ontology?  Just put labels on a web page with traffic – This gives A = users x label clicks  Remember viewing history – This gives B = users x items  Cross recommend – B’A = label to item mapping  After several users click, results are whatever users think they should be
  • 50. 57©MapR Technologies 2013- Confidential Nice. But we can do better?
  • 51. 58©MapR Technologies 2013- Confidential Ausers things
  • 52. 59©MapR Technologies 2013- Confidential A1 A2 é ë ù û users thing type 1 thing type 2
  • 53. 60©MapR Technologies 2013- Confidential A1 A2 é ë ù û T A1 A2 é ë ù û= A1 T A2 T é ë ê ê ù û ú ú A1 A2 é ë ù û = A1 T A1 A1 T A2 AT 2A1 AT 2A2 é ë ê ê ù û ú ú r1 r2 é ë ê ê ù û ú ú = A1 T A1 A1 T A2 AT 2A1 AT 2A2 é ë ê ê ù û ú ú h1 h2 é ë ê ê ù û ú ú r1 = A1 T A1 A1 T A2 é ëê ù ûú h1 h2 é ë ê ê ù û ú ú
  • 54. 61©MapR Technologies 2013- Confidential Part 3: What about that worked example?
  • 55. 62©MapR Technologies 2013- Confidential Metrics and logs (5) Cooccurrence analysis (7) Post to search engine (8) Search engine (4) Presentation tier (2) User behavior generator (1) Session collector (3) History collector (6) Diagnostic browsing (9) http://bit.ly/18vbbaT
  • 56. 63©MapR Technologies 2013- Confidential SolR Indexer SolR Indexer Solr indexing Cooccurrence (Mahout) Item meta- data Index shards Complete history Analyze with Map-Reduce
  • 57. 64©MapR Technologies 2013- Confidential SolR Indexer SolR Indexer Solr search Web tier Item meta- data Index shards User history Deploy with Conventional Search System
  • 58. 65©MapR Technologies 2013- Confidential Objective Results  At a very large credit card company  History is all transactions  Development time to minimal viable product about 4 months  General release 2-3 months later  Search-based recs at or equal in quality to other techniques
  • 59. 66©MapR Technologies 2013- Confidential Summary  Input: Multiple kinds of behavior on one set of things  Output: Recommendations for one kind of behavior with a different set of things  Cross recommendation is a special case
  • 60. 67©MapR Technologies 2013- Confidential Objective Results  At a very large credit card company  History is all transactions  Development time to minimal viable product about 4 months  General release 2-3 months later  Search-based recs at or equal in quality to other techniques
  • 61. 68©MapR Technologies 2013- Confidential Me, Us  Ted Dunning, Chief Application Architect, MapR Committer PMC member, Mahout, Zookeeper, Drill Bought the beer at the first HUG tdunning@{apache.org,maprtech.com} ted.dunning@gmail.com  MapR Distributes more open source components for Hadoop Adds major technology for performance, HA, industry standard API’s  Tonight Hash tag - #dfwbd #mapr See also - @ApacheMahout @ApacheDrill @ted_dunning and @mapR

Notes de l'éditeur

  1. Note to speaker: Move quickly through 1st two slides just to set the tone of familiar use cases but somewhat complicated under-the-covers math and algorithms… You don’t need to explain or discuss these examples at this point… just mention one or twoTalk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email….
  2. Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.