How to tell which algorithms really matter

Ted Dunning
MapR Technologies

© 2014 MapR Technologies
00:01 1.65TB WITH 298 SERVERS

00:02 129K RECOMMENDATIONS

Advertising Automation Cloud, Sellers Cloud, Buyers Cloud
00:03 63M AD AUCTIONS

00:04 422.2K GENETIC SEQUENCES

Largest Biometric Database
00:05 4.73M AUTHENTICATIONS

But How is This Done?
What really matters?

Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples

What is Important?
• Deployable
– Clever prototypes don’t count if they can’t be standardized
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Skillset and mindset matched?
– How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate
– Where is the highest value per minute of effort?

Academic Goals vs Pragmatics
• Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
• Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves; exploration and exploitation are both important
– Engineering constraints on budget and schedule

Example 1:
Making Recommendations Better

Recommendation Advances
• What are the most important algorithmic advances in
recommendations over the last 10 years?
• Cooccurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?

The Winner – None of the Above
• What are the most important algorithmic advances in
recommendations over the last 10 years?
1. Result dithering (random noise)
2. Anti-flood (don’t repeat yourself)
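The slide names anti-flood only as “don’t repeat yourself”. One simple reading, assumed here, is to cap how many results from the same category (author, brand, series, and so on) may appear before other categories get a turn, and push the surplus further down. The function and its parameters are hypothetical.

```python
from collections import Counter


def anti_flood(ranked, category_of, max_per_category=2):
    """Push repeats of an already well-represented category down the list.

    ranked: results in their original order.
    category_of: hypothetical function mapping a result to its category.
    max_per_category: hypothetical cap on results per category before
    the surplus is deferred to the end (in original relative order).
    """
    seen = Counter()
    kept, deferred = [], []
    for item in ranked:
        category = category_of(item)
        if seen[category] < max_per_category:
            kept.append(item)
            seen[category] += 1
        else:
            deferred.append(item)
    return kept + deferred


# a1, a2 and a3 share category "a"; only two of them stay near the top.
print(anti_flood(["a1", "a2", "a3", "b1", "c1"], lambda s: s[0]))
# ['a1', 'a2', 'b1', 'c1', 'a3']
```

A hard cap is the bluntest version; a softer variant would lower each repeat's score rather than moving it to the end.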

The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent

Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
“Made more difference than any other change”
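The slides do not give the dithering formula. This sketch assumes each result at 1-based rank r is re-scored as log(r) plus Gaussian noise whose standard deviation is the ε shown in the example tables, then re-sorted. That reading is consistent with the tables (ε = 0.69 shuffles more than ε = 0.5), but it is an assumption.

```python
import math
import random


def dither(ranked, epsilon, rng=None):
    """Randomly re-order a ranked list of results.

    Assumed scheme: score = log(rank) + N(0, epsilon), sorted ascending.
    The log compresses deep ranks together, so noise easily swaps items
    far down the list but rarely displaces the very top results.
    """
    if rng is None:
        rng = random.Random()
    scored = [(math.log(r) + rng.gauss(0.0, epsilon), item)
              for r, item in enumerate(ranked, start=1)]
    scored.sort(key=lambda pair: pair[0])  # sort on score only
    return [item for _, item in scored]


# One dithered ordering of the top 20 results, shown 8 at a time.
print(dither(list(range(1, 21)), 0.5, random.Random(42))[:8])
```

Dithering only shuffles what the recommender already produced, so it can be bolted onto any existing ranking without touching the model.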

Example … ε = 0.5
1 2 6 5 3 4 13 16
1 2 3 8 5 7 6 34
1 4 3 2 6 7 11 10
1 2 4 3 15 7 13 19
1 6 2 3 4 16 9 5
1 2 3 5 24 7 17 13
1 2 3 4 6 12 5 14
2 1 3 5 7 6 4 17
4 1 2 7 3 9 8 5
2 1 5 3 4 7 13 6
3 1 5 4 2 7 8 6
2 1 3 4 7 12 17 16

Example … ε = log 2 = 0.69
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33

Exploring The Second Page
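To see how dithering explores the second page, one can estimate how often results originally ranked just past the first page land on it. This sketch assumes the dithered score is log(rank) plus Gaussian noise with standard deviation ε, and a page of 8 results to match the eight columns in the example tables; both are assumptions.

```python
import math
import random


def second_page_share(epsilon, trials=2000, page_size=8, seed=0):
    """Fraction of first-page slots taken by second-page items after dithering.

    Assumed scheme: score = log(rank) + N(0, epsilon); page_size is assumed.
    """
    rng = random.Random(seed)
    ranks = range(1, 2 * page_size + 1)  # first two pages of results
    hits = 0
    for _ in range(trials):
        shuffled = sorted(ranks, key=lambda r: math.log(r) + rng.gauss(0.0, epsilon))
        hits += sum(1 for r in shuffled[:page_size] if r > page_size)
    return hits / (trials * page_size)


for eps in (0.5, math.log(2)):
    print(f"epsilon={eps:.2f}: {second_page_share(eps):.1%} of page one from page two")
```

Without dithering that share is zero, so items on page two never collect the feedback that could promote them.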

Lesson 1:
Exploration is good

Example 2:
Bayesian Bandits

Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near optimal regret
• Trades off exploration and exploitation
• Possibly the best known solution for exploration/exploitation
• Incredibly simple
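Thompson sampling really does fit in a few lines. This sketch uses a Beta-Bernoulli model (each impression is a click or not), not the Gamma-Normal model in the convergence plot; the arm rates are made-up simulation inputs. Each arm keeps a Beta(successes + 1, failures + 1) posterior, and on every step the arm with the largest posterior sample is played.

```python
import random


def thompson_bandit(true_rates, steps, seed=0):
    """Beta-Bernoulli Thompson sampling; returns (pulls, successes) per arm.

    true_rates are hypothetical reward probabilities used only to simulate
    outcomes; the algorithm itself never looks at them.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(steps):
        # One posterior draw per arm; uncertain arms sometimes draw high,
        # which is where the exploration comes from.
        samples = [rng.betavariate(s + 1, f + 1)
                   for s, f in zip(successes, failures)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes


pulls, wins = thompson_bandit([0.02, 0.05, 0.10], steps=5000)
print(pulls, wins)
```

No exploration rate needs tuning: as an arm's posterior narrows, it stops drawing lucky samples and is played less.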

Fast Convergence
[Figure: regret (0 to 0.12) versus n (0 to 1000) for ε-greedy with ε = 0.05 and a Bayesian Bandit with Gamma-Normal]
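The ε-greedy baseline in the convergence plot can be sketched as follows, measuring regret as the gap between the best arm's rate and the chosen arm's rate, averaged over steps. The Bernoulli arms and their rates are simulation assumptions; ε = 0.05 matches the plot legend.

```python
import random


def epsilon_greedy_regret(true_rates, steps, epsilon=0.05, seed=0):
    """Average per-step regret of epsilon-greedy on Bernoulli arms."""
    rng = random.Random(seed)
    best = max(true_rates)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    total_regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: best observed mean; untried arms count as infinitely good
            # so that each one is tried at least once.
            means = [w / p if p else float("inf") for w, p in zip(wins, pulls)]
            arm = max(range(len(means)), key=means.__getitem__)
        pulls[arm] += 1
        wins[arm] += int(rng.random() < true_rates[arm])
        total_regret += best - true_rates[arm]
    return total_regret / steps


print(epsilon_greedy_regret([0.02, 0.05, 0.10], steps=1000))
```

Because ε-greedy keeps exploring at a fixed rate, its expected regret per step never drops below ε times the average gap between the best arm and a uniformly chosen arm. Thompson sampling, by contrast, explores less as its posteriors sharpen.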

Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011

Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior
sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible

Lesson 2:
Exploration is easy to do and
pays big benefits.

Example 3:
On-line Clustering

The Problem
• K-means clustering is useful for feature extraction or
compression
• At scale and at high dimension, the desirable number of clusters
increases
• A very large number of clusters may require more passes through
the data
• Super-linear scaling is generally infeasible

The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this
sketch in the form of many weighted centroids which
approximate the original distribution
• The size of the sketch grows very slowly with increasing data
size
• Many operations such as clustering are well behaved on
sketches
Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
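A one-pass sketch in the spirit of adaptive dp-means can be written as follows. It is a simplified illustration, not the exact procedure from the cited papers: a point farther than a distance threshold from every centroid starts a new centroid, otherwise it is folded into its nearest centroid as a weighted mean; when the sketch exceeds a size budget, the threshold is raised and the existing centroids are re-sketched. The brute-force nearest search, the starting threshold and the growth factor are assumptions.

```python
import math


def _nearest(centroids, x):
    """Index and distance of the centroid closest to x (brute force)."""
    best_i, best_d = -1, math.inf
    for i, (_, c) in enumerate(centroids):
        d = math.dist(c, x)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def _absorb(centroids, weight, x, threshold):
    """dp-means rule: new centroid if x is far away, else weighted merge."""
    i, d = _nearest(centroids, x)
    if i < 0 or d > threshold:
        centroids.append((weight, tuple(x)))
    else:
        w, c = centroids[i]
        total = w + weight
        centroids[i] = (total, tuple((w * ci + weight * xi) / total
                                     for ci, xi in zip(c, x)))


def streaming_sketch(points, max_centroids, threshold=0.1, growth=1.5):
    """One pass over points; returns (weighted centroids, final threshold)."""
    if threshold <= 0 or growth <= 1:
        raise ValueError("threshold must be > 0 and growth > 1")
    centroids = []
    for x in points:
        _absorb(centroids, 1.0, x, threshold)
        # Adaptive step: too many centroids means the threshold is too small,
        # so raise it and collapse the current centroids into a smaller sketch.
        while len(centroids) > max_centroids:
            threshold *= growth
            old, centroids = centroids, []
            for w, c in old:
                _absorb(centroids, w, c, threshold)
    return centroids, threshold
```

Each centroid's weight counts the points it stands for, so the sketch still approximates the original distribution even though it holds far fewer points.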

An Example

Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid
pathological cases
• We still get a very good result if the sketch is created
– in one pass
– with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …
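Once the sketch exists, ordinary clustering runs on the weighted centroids instead of the raw data. This sketch uses weighted Lloyd iterations (a swapped-in choice; the slide does not name the final clustering step) and reads the slide’s k log N budget with a natural logarithm, which is also an assumption.

```python
import math
import random


def sketch_size(k, n):
    """Rough sketch budget of k * log(n) centroids (natural log assumed)."""
    return max(k, math.ceil(k * math.log(n)))


def weighted_kmeans(sketch, k, iterations=20, seed=0):
    """Lloyd's algorithm on (weight, point) pairs from a sketch.

    Each iteration costs O(len(sketch) * k) instead of O(N * k).
    """
    rng = random.Random(seed)
    centers = [c for _, c in rng.sample(sketch, k)]
    dim = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for w, p in sketch:
            j = min(range(k), key=lambda i: math.dist(centers[i], p))
            totals[j] += w
            for d, v in enumerate(p):
                sums[j][d] += w * v
        new_centers = []
        for j in range(k):
            if totals[j] > 0:
                new_centers.append(tuple(s / totals[j] for s in sums[j]))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        centers = new_centers
    return centers


print(sketch_size(5, 1_000_000))  # 70 centroids for k = 5, N = one million
```

With k = 5 and a million points, the clustering step touches about 70 weighted centroids rather than a million rows.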

Lesson 3:
Sketches make big data small.

Example 4:
Search Abuse

Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
Alice
Bob
Charles

Recommendations
What else would Bob like?
Alice
Bob
Charles

Log Files
[Figure: log entries recording item interactions by Alice, Bob, and Charles]

History Matrix: Users by Items
Alice
Bob
Charles
✔ ✔ ✔
✔ ✔
✔ ✔

Co-occurrence Matrix: Items by Items
[Figure: item-by-item counts of how many users share each pair of items]
How do you tell which co-occurrences are useful?
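The co-occurrence matrix counts, for every pair of items, how many users have both. In matrix terms it is AᵀA for the user-by-item history matrix A. This sketch uses only the purchases the slides spell out (Alice: apple and puppy; Bob: apple; Charles: bicycle) and keeps the upper triangle, without the diagonal.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(histories):
    """Count how many users interacted with each pair of items.

    histories maps user -> set of items (one row of the history matrix).
    Keys are sorted item pairs; the diagonal (item with itself) is omitted.
    """
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


histories = {
    "Alice": {"apple", "puppy"},
    "Bob": {"apple"},
    "Charles": {"bicycle"},
}
print(dict(cooccurrence(histories)))  # {('apple', 'puppy'): 1}
```

Raw counts favor popular items, which is why the next step asks which co-occurrences are actually anomalous.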

Indicator Matrix: Anomalous Co-Occurrence
✔
✔
Result: The marked row will be added to the indicator field in the
item document…
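The slide says only “anomalous co-occurrence”. This sketch assumes the test is a log-likelihood ratio (G²) score computed on each item pair’s 2×2 table of user counts, keeping pairs whose score passes a cut-off. The `min_score` cut-off and the function names are hypothetical.

```python
import math
from itertools import combinations


def _x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    """Unnormalized Shannon entropy: total count times entropy in nats."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 score for a 2x2 table.

    k11: users with both items, k12: first item only,
    k21: second item only, k22: neither.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp round-off below zero


def indicators(histories, min_score):
    """For each item, the other items whose co-occurrence scores >= min_score."""
    n = len(histories)
    users_of = {}
    for user, items in histories.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    result = {item: [] for item in users_of}
    for a, b in combinations(sorted(users_of), 2):
        both = len(users_of[a] & users_of[b])
        only_a = len(users_of[a]) - both
        only_b = len(users_of[b]) - both
        neither = n - both - only_a - only_b
        if both and llr(both, only_a, only_b, neither) >= min_score:
            result[a].append(b)
            result[b].append(a)
    return result
```

On three users every score is small; on real logs, the cut-off (or keeping only the top few scores per item) decides how sparse each indicator row ends up.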

Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the
Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the metadata for a document in the Solr
index. You don’t need to create a separate index for the indicators.
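The deployment step can be sketched as plain data: the indicator row becomes one more field on the item document, and a recommendation is a search against that field. Field names follow the example document on the slide. The helper names are hypothetical, and querying with a user’s recent item ids is an assumption about how the engine is driven. The query string uses standard Lucene/Solr `field:(a b)` syntax; the HTTP call to Solr is omitted.

```python
import json


def indicator_document(item_id, title, desc, keywords, indicators):
    """Item document with the indicator row stored as an ordinary field."""
    return {
        "id": item_id,
        "title": title,
        "desc": desc,
        "keywords": keywords,
        "indicators": indicators,
    }


def update_payload(docs):
    """JSON list of documents, the shape a Solr JSON update request carries."""
    return json.dumps(docs)


def recommendation_query(recent_items):
    """Search the indicators field with a user's recent item ids."""
    return "indicators:(" + " ".join(recent_items) + ")"


puppy = indicator_document("t4", "puppy", "The sweetest little puppy ever.",
                           ["puppy", "dog", "pet"], ["t1"])
print(update_payload([puppy]))
print(recommendation_query(["t1"]))  # indicators:(t1)
```

Because the query is ordinary search, it can be combined with text or keyword clauses in the same request, which is the search/recommendation overlap the next slides describe.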

Internals of the Recommender Engine

Real-life example

Lesson 4:
Recursive search abuse pays
Search can implement recs
Which can implement search

How Does This Apply?

How Can I Start?

Q&A
@ted_dunning @mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
