This session will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system is built using Mahout to do off-line analysis and Solr to provide real-time recommendations. The presentation will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system, including a data generator, off-line analysis scripts, Solr configurations and sample web pages, will be made available on GitHub for attendees to modify as they like.
4. What does Machine Learning look like?
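\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d) \quad \text{for small } k \text{, high quality}
\]
\[
O(\kappa d \log k) \ \text{or} \ O(d \log \kappa \log k) \quad \text{for larger } k \text{, looser quality}
\]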
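To make the block algebra on this slide concrete, here is a minimal sketch in plain Python. It assumes that A1 and A2 are user-by-item history matrices for two kinds of behavior and that h1 and h2 are one user's history of each kind; that reading comes from the session abstract, not from the slide itself. The toy data and the helper names (`transpose`, `matmul`, `matvec`) are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical toy data: 3 users, 2 items for behavior 1, 3 items for behavior 2.
A1 = [[1, 0], [1, 1], [0, 1]]
A2 = [[1, 0, 1], [0, 1, 1], [0, 1, 0]]
h1 = [1, 0]        # one user's history for behavior 1
h2 = [0, 0, 1]     # the same user's history for behavior 2

# r1 computed block by block: A1^T A1 h1 + A1^T A2 h2
A1t = transpose(A1)
same_kind = matvec(matmul(A1t, A1), h1)     # recommendation from behavior 1
cross_kind = matvec(matmul(A1t, A2), h2)    # cross recommendation from behavior 2
r1 = [s + c for s, c in zip(same_kind, cross_kind)]

# The same result from the combined matrix [A1 A2] and combined history [h1; h2]
A = [row1 + row2 for row1, row2 in zip(A1, A2)]
h = h1 + h2
r = matvec(matmul(transpose(A), A), h)

print(r1)       # scores for the behavior-1 items
print(r[:2])    # the first block of the combined result
```

The block-by-block route and the combined route give the same scores for the first item block, which is the identity the slide expresses.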
5. Recommendations as Machine Learning
• Input data for the recommender model: observations of interactions between users taking actions and items
• Goal: suggest additional appropriate or desirable interactions
• Example applications:
– similar movies, music, books (topic, style, etc.)
– map-based restaurant choices
– suggested sale items for e-stores or cash-register receipts
14. Problems with Raw Co-occurrence
• Very popular items co-occur with everything
– Examples: welcome document; elevator music
• Very widespread occurrence is not interesting as a way to generate indicators
– Unless you want to offer an item that is constantly desired, such as razor blades
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation
15. Get Useful Indicators from Behaviors
1. Use log files to build a history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
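Steps 1 and 2 above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Mahout implements it; the log entries and the function names `build_history` and `cooccurrence` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log entries: (user, item) pairs extracted from log files.
log = [
    ("u1", "t1"), ("u1", "t4"),
    ("u2", "t1"), ("u2", "t4"), ("u2", "t3"),
    ("u3", "t2"), ("u3", "t3"),
]

def build_history(log):
    """Step 1: sparse users x items history, stored as user -> set of items."""
    history = defaultdict(set)
    for user, item in log:
        history[user].add(item)
    return history

def cooccurrence(history):
    """Step 2: items x items counts of how many users interacted with both items."""
    counts = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1   # keep the matrix symmetric
    return counts

history = build_history(log)
counts = cooccurrence(history)
print(counts[("t1", "t4")])   # number of users who interacted with both t1 and t4
```

Storing only the observed pairs keeps the representation sparse, which matters because most user and item combinations never occur.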
20. Co-occurrence Matrix: Items by Items
How do you tell which co-occurrences are useful?
[Figure: items-by-items co-occurrence matrix of counts]
Use LLR test to turn co-occurrence into indicators…
22. Spot the Anomaly
          A       not A
B         13      1000
not B     1000    100,000

          A       not A
B         1       0
not B     0       10,000

          A       not A
B         1       0
not B     0       2

          A       not A
B         10      0
not B     0       100,000

What conclusion do you draw from each situation?
23. Spot the Anomaly
          A       not A
B         13      1000
not B     1000    100,000
Root LLR: 0.90

          A       not A
B         1       0
not B     0       10,000
Root LLR: 4.52

          A       not A
B         1       0
not B     0       2
Root LLR: 1.95

          A       not A
B         10      0
not B     0       100,000
Root LLR: 14.3

• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR
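The four scores on this slide can be reproduced with a short sketch. It uses the entropy form of the log-likelihood ratio test on a 2x2 table of counts and takes the square root of the result. The function names are ours and this is not Mahout's code; it is only meant to show where the numbers come from.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy of a list of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: A and B together, k12: B without A,
    k21: A without B,      k22: neither A nor B.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))   # clamp tiny negative rounding errors

def root_llr(k11, k12, k21, k22):
    """Square root of the LLR score, roughly comparable to standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))

tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 10000),
    (1, 0, 0, 2),
    (10, 0, 0, 100000),
]
for t in tables:
    print(t, round(root_llr(*t), 2))
```

Note how the first table scores lowest despite its 13 co-occurrences: both A and B are common there, so 13 joint events are close to what chance alone would produce.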
24. Indicator Matrix: Anomalous Co-occurrence
[Figure: indicator matrix with the significant co-occurrences marked ✔; these become the indicators]
Result: The marked row will be added to the indicator field in the item document…
25. Indicator Matrix
[Figure: indicator matrix with one row marked ✔]
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
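As a rough illustration of that note, the sketch below merges one row of the indicator matrix into the item's existing meta-data to form a single document. The dictionaries, the `to_solr_doc` helper and the choice of a list for the `indicators` field are assumptions made for this example; the actual schema of the demo may differ.

```python
import json

# Hypothetical indicator matrix rows: item id -> ids of items whose
# co-occurrence with it was judged significant.
indicator_rows = {"t4": ["t1"]}

# Item meta-data, using the values shown on the slide.
items = {
    "t4": {
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
    }
}

def to_solr_doc(item_id, meta, indicator_rows):
    """Combine item meta-data and its indicator row into one document."""
    doc = {"id": item_id}
    doc.update(meta)
    doc["indicators"] = indicator_rows.get(item_id, [])
    return doc

doc = to_solr_doc("t4", items["t4"], indicator_rows)

# A JSON array of documents like this could then be sent to the index.
payload = json.dumps([doc], indent=2)
print(payload)
```

Because the indicators live in the same document as the title and description, one index serves both ordinary search and recommendation.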
32. Search-based recommendation
• Sample Document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
(the fields above are original data and meta-data)
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local Top40
(the indicator fields are derived from co-occurrence analysis)
• Sample Query (the recommendation query)
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local Top40
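A recommendation query of this kind is an ordinary search in which the user's recent behavior is matched against the indicator fields. The sketch below only builds such a request and does not send it. The field names, the core name `merchants`, the ids and the `build_query` helper are all hypothetical; the `field:(term term)` form and the `q`, `rows` and `wt` parameters are standard Solr query syntax.

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint; the core name is an assumption.
SOLR_SELECT = "http://localhost:8983/solr/merchants/select"

def build_query(recent):
    """Turn recent user behavior into one clause per indicator field.

    Fields with no recent values are skipped. Values are assumed to be
    plain ids that need no query escaping.
    """
    clauses = []
    for field, values in recent.items():
        if values:
            clauses.append("%s:(%s)" % (field, " ".join(values)))
    return " ".join(clauses)

# Hypothetical recent history for one user.
recent = {
    "indicator_merchant_ids": ["m12", "m47"],
    "indicator_sic_ids": ["5812"],
    "indicator_offers": [],
}

q = build_query(recent)
url = SOLR_SELECT + "?" + urlencode({"q": q, "rows": 10, "wt": "json"})
print(q)
print(url)
```

Documents that match more of the user's recent ids in their indicator fields score higher, so the search ranking itself serves as the recommendation ranking.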
33. Analyze with MapReduce
[Diagram: the complete history feeds co-occurrence analysis (Mahout); its output and the item meta-data go through the Solr indexers (indexing) into the index shards]
34. Deploy with Conventional Search System
[Diagram: the user history goes through the web tier to Solr (search) over the index shards; the item meta-data goes through the Solr indexers into the index shards]
35. Outro
• Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo!
• Get in touch: Twitter @mhausenblas, @MapR
• Ah, and, btw: we’re hiring ;)