Search Engines for Machine Learning
Joseph Blue, Data Scientist, MapR (jblue@mapr.com)
Presented at Lucene/Solr Revolution 2014

  1. Search Engines for Machine Learning. Joseph Blue, Data Scientist, MapR (jblue@mapr.com)
  2. Roadmap: The Deployment Challenge (WANT); All About Recommenders (BUILD); Search Engine Delivers Results (DEPLOY); Improving Those Results (IMPROVE)
  3. Recommendations. Data: interactions between people taking action (users) and items. Used to train a recommendation model. Goal is to suggest additional interactions. Example applications: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts.
  4. Spend your Cycles Wisely. [Figure: time axis comparing DATA, DEVELOP, DEPLOY with DATA, D & D.] Take more time to understand your data and deploy a good recommender quickly.
  5. Of bikes and ponies. [Figure: users Alice, Bob, Amelia and Charles, with a "?" for the missing recommendation.] What if everybody gets a pony? What else would you recommend for new user Amelia?
  6. Three Matrices: user-item interaction, item co-occurrence, indicators. [Figure: check-mark and count matrices for Alice, Bob and Charles; cell values as extracted: 1 2 0 1 1 1 1 1 0 0 0 2.] But we need a method for identifying anomalous co-occurrence…
  7. Log Likelihood Two Ways. Size = # users who interact with that item. Overlap = # users who have two items in common. LL = f(size, overlap, number of users). Items will be shared by users, but how much is too much? $LL = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} y_{ij} \log(y_{ij} / \mu_{ij})$ [Figure: users diagram and two 2x2 tables with "not" labels; values as extracted: 10; 10,000; 0; 0; 13; 14.3; 100,000; 1,000; 1,000; 0.90.]
  8. Updating the metadata = deployment. Solr document for “puppy”: id: t4; title: puppy; desc: The sweetest little puppy ever.; keywords: puppy, dog, pet; indicators: (t1). Note: data for the indicator field is added directly to the metadata for a document in an Apache Solr collection. You don’t need to create a separate index for the indicators. [Figure: complete indicator matrix from log-likelihood…]
  9. Example Workflow. [Figure: workflow diagram on a MapR cluster, split into offline and online parts with steps numbered 1 to 3. Components: log files, item metadata, Pig, Python, Mahout analysis, Solr collection, new user history, web tier, recommendations. Annotations: "Ingest easily via NFS", "via NFS", "Use Python directly via NFS".]
  10. But we can do better… Solr document: id: t4; title: puppy; desc: The sweetest little puppy ever.; keywords: puppy, dog, pet; indicators: (t1). The indicated items are returned when we query the collection based on user history, but not all user behaviors are created equal. Items with opposite polarity may turn your recommendations into a spam generator. Example: consider the difference in future purchases after viewing or purchasing razor blades vs. a Blu-ray movie…
  11. Knowing your Data moves the Needle. [Figure: separate click and purchase matrices, each feeding its own indicator field.] Solr document: id: t4; title: puppy; desc: The sweetest little puppy ever.; keywords: puppy, dog, pet; purchase indicators: (t1); click indicators: (t2).
  12. More information is available… https://www.mapr.com/products/mapr-sandbox-hadoop https://www.mapr.com/resources/white-papers#e-books
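
Slides 6 and 7 describe counting item co-occurrence and then scoring it with a log-likelihood statistic. The sketch below is one way to read those slides, not the presenter's code: the user histories are hypothetical, and `mu` is taken to be the expected count under independence (row total times column total over the grand total), which the slide does not spell out.

```python
import math
from itertools import combinations

# Hypothetical histories; "dana" and all item choices are illustrative only.
history = {
    "alice": {"pony", "apple", "puppy"},
    "bob": {"pony", "apple", "puppy"},
    "charles": {"pony", "bike"},
    "dana": {"pony", "bike"},
}

def llr(k11, k12, k21, k22):
    """Return 2 * sum(y_ij * log(y_ij / mu_ij)) over a 2x2 count table."""
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    total = 0.0
    for i in range(2):
        for j in range(2):
            y = table[i][j]
            if y > 0:  # a zero cell contributes nothing to the sum
                mu = rows[i] * cols[j] / n  # assumed: expected count under independence
                total += y * math.log(y / mu)
    return 2.0 * total

def cooccurrence_llr(history):
    """Score every co-occurring item pair from size, overlap and number of users."""
    users = len(history)
    size = {}     # item -> number of users who interacted with it
    overlap = {}  # (item_a, item_b) -> number of users who have both
    for items in history.values():
        for item in items:
            size[item] = size.get(item, 0) + 1
        for a, b in combinations(sorted(items), 2):
            overlap[(a, b)] = overlap.get((a, b), 0) + 1
    scores = {}
    for (a, b), both in overlap.items():
        only_a = size[a] - both
        only_b = size[b] - both
        neither = users - size[a] - size[b] + both
        scores[(a, b)] = llr(both, only_a, only_b, neither)
    return scores

scores = cooccurrence_llr(history)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy data every user has a pony, so any pair involving the pony scores zero, while apple and puppy, which always appear together and nowhere else, score highest. A threshold on the score would then decide which pairs become indicators.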
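
Slides 8 and 10 say that indicators live in a field of the item's Solr document and that recommendations come back when the collection is queried with the user's history. A minimal sketch of both halves follows. The document mirrors the slide; the helper `history_query`, the history IDs `t1` and `t3`, and the choice of `rows` are assumptions for illustration, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Item document as shown on the slide, with the indicator field alongside the metadata.
doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],
}

def history_query(item_ids, field="indicators", rows=10):
    """Build Solr query parameters that match recent item IDs against one field.

    Hypothetical helper: it only assembles parameters, it does not call Solr.
    """
    if not item_ids:
        raise ValueError("user history is empty")
    clause = " OR ".join(item_ids)
    return {"q": f"{field}:({clause})", "rows": rows, "wt": "json"}

params = history_query(["t1", "t3"])  # assumed recent history of one user
print(json.dumps(doc, indent=2))
print("/select?" + urlencode(params))
```

Because the user's history contains `t1`, the puppy document would match this query; the search engine's own relevance ranking then orders the matches.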
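
Slide 11 splits indicators by behavior, so that purchases and clicks are matched separately. The in-memory sketch below imitates that idea without a search engine. The field names with underscores, the second catalog item, and the weights are all assumptions; a real deployment would rely on the search engine's scoring instead of this hand-rolled count.

```python
# Hypothetical catalog: one indicator field per behavior type.
catalog = [
    {"id": "t4", "title": "puppy",
     "purchase_indicators": ["t1"], "click_indicators": ["t2"]},
    {"id": "t5", "title": "pony",
     "purchase_indicators": ["t3"], "click_indicators": ["t1"]},
]

def recommend(catalog, purchased, clicked, purchase_weight=2.0, click_weight=1.0):
    """Rank items by how many indicators match each kind of user history."""
    ranked = []
    for item in catalog:
        purchase_hits = len(set(item["purchase_indicators"]) & purchased)
        click_hits = len(set(item["click_indicators"]) & clicked)
        score = purchase_weight * purchase_hits + click_weight * click_hits
        if score > 0:
            ranked.append((score, item["id"]))
    # Highest score first; ties broken by item ID for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in ranked]

result = recommend(catalog, purchased={"t1"}, clicked={"t1"})
print(result)
```

Keeping the two histories apart means a purchase of `t1` and a click on `t1` can point to different items, which is the distinction the razor blades vs. Blu-ray example is making.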
