How Does Google Google?

David F. Gleich
Computer Science
Purdue University

A journey into the wondrous mathematics
behind your favorite websites
1
Mathematics underlies an
enormous number of the
websites we use every day!
2
1.  Google's PageRank

2.  Multi-armed bandits and
internet experiments
3
4
Larry Page
Sergey Brin

•  Created a web-search algorithm
called “BackRub”
•  Spun off a company “Googol”
based on the paper

•  The importance of a page is
determined by the importance of
pages that link to it.
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
Winograd. “The PageRank Citation Ranking: Bringing
Order to the Web.” Technical report, Stanford InfoLab, 1999.

5
A websearch primer
1.  Crawl webpages
2.  Analyze webpage text (information retrieval)
3.  Analyze webpage links
4.  Fit over 200 measures to human evaluations
5.  Produce rankings
6.  Continuously update
6
Pages, nodes, incoming links,
outgoing links, and “importance”
7
“Important” pages
that link to me!
c
b
a
“Important”
pages that
link to
Purdue!
8
Tim Davis and Yifan Hu

Sparse Matrix Gallery
http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
1000 vertices on
8.5-by-11 paper
1,000,000,000,000
vertices (one trillion)

Paper the size of
Manhattan island
(23 sq miles)?
The web
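The paper-size comparison can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the vertex counts come from the slide, while the conversion constants and variable names are introduced here for illustration.

```python
# Back-of-the-envelope check of the "paper the size of Manhattan" claim.
VERTICES = 1_000_000_000_000          # one trillion vertices (from the slide)
VERTICES_PER_SHEET = 1000             # vertices drawn per sheet (from the slide)
SHEET_SQ_IN = 8.5 * 11                # area of one 8.5-by-11 inch sheet
SQ_IN_PER_SQ_MILE = (5280 * 12) ** 2  # square inches in one square mile

sheets = VERTICES / VERTICES_PER_SHEET
area_sq_miles = sheets * SHEET_SQ_IN / SQ_IN_PER_SQ_MILE
print(f"{sheets:.0f} sheets cover about {area_sq_miles:.1f} square miles")
```

A billion sheets laid edge to edge come out at roughly 23 square miles, in line with the figure quoted on the slide.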
10
We need something better!
11
A wee web-graph: link
counting is too easy to game!
[Figure: a web graph with six nodes, numbered 1 to 6; the directed links carry weights 1/3, 1/3, 1/3, 1/2, and 1/2.]
12
A wee web-graph: link
counting is too easy to game!
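[Figure: a web graph with six nodes, numbered 1 to 6; the directed links carry weights 1/3, 1/3, 1/3, 1/2, and 1/2.]

The importance of a page is determined by the importance of pages that link to it.

\[
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2
\end{aligned}
\]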
13
The importance of a page is determined
by the importance of pages that link to it
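\[
x_i = \sum_{j \in B_i} \frac{1}{d_j} x_j
\]

x_i: “importance” of page i
x_j: “importance” of page j
B_i: the back-links of page i, that is, the pages that link to it (why it was called BackRub!)
d_j: number of links page j uses (out-degree in graph theory)

Example:

\[
x_3 = \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2
\]

[Figure: node 1 links to node 3 with weight 1/3; node 2 links to node 3 with weight 1/2.]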
14
We can rewrite this equation in a more
mathematically convenient way
\[
\begin{aligned}
x_1 &= 0 x_1 + 0 x_2 + 0 x_3 + 0 x_4 + 0 x_5 + 0 x_6 \\
x_2 &= \tfrac{1}{3} x_1 + 0 x_2 + 0 x_3 + 0 x_4 + 0 x_5 + 0 x_6 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 + 0 x_3 + 0 x_4 + 0 x_5 + 0 x_6 \\
x_4 &= \tfrac{1}{3} x_1 + 0 x_2 + 1 x_3 + 0 x_4 + 1 x_5 + 0 x_6 \\
x_5 &= 0 x_1 + 0 x_2 + 0 x_3 + 1 x_4 + 0 x_5 + 0 x_6 \\
x_6 &= 0 x_1 + \tfrac{1}{2} x_2 + 0 x_3 + 0 x_4 + 0 x_5 + 0 x_6
\end{aligned}
\]
15
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
\]

And even more conveniently!

\[
x = P x
\]

The element in row k, column m = "probability" of
going from node m to node k
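As a rough illustration of how the matrix can be assembled, the sketch below builds P from a list of out-links. The `links` dictionary is read off the six equations above and is an assumption of this example, as are all variable names; exact fractions are used so the entries print as on the slide.

```python
from fractions import Fraction

# Out-links of the six-node example (page -> pages it links to).
# Read off the equations above; page 6 links to nothing.
links = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}
n = len(links)

# P[k][m] holds the weight that page m+1 passes to page k+1,
# which is 1/d for each of the d out-links of page m+1.
P = [[Fraction(0)] * n for _ in range(n)]
for m, targets in links.items():
    for k in targets:
        P[k - 1][m - 1] = Fraction(1, len(targets))

for row in P:
    print("  ".join(f"{str(v):>3}" for v in row))
```

Each column that belongs to a page with at least one out-link sums to 1, which is what makes the "probability" reading of the entries sensible; the column for page 6 is all zeros.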
16
The matrix P for websites
shows a lot of structure
Every dot is a non-zero element indicating a link
The matrices are sparse and generally have block structure.
Block structure can be exploited to speed up the ranking algorithm.
17
But this idea doesn’t work for
the wee web-graph
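[Figure: the six-node wee web-graph with link weights 1/3, 1/3, 1/3, 1/2, and 1/2.]

\[
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2
\end{aligned}
\]

Nodes 1, 4 and 5
determine everything!

\[
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 = 0 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 = 0 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 = x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2 = 0
\end{aligned}
\]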
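One way to see why this is a problem is to check candidate solutions numerically. The sketch below is an illustration only (the helper name `apply` and the test vectors are chosen here): any vector with equal weight on nodes 4 and 5 and zero elsewhere satisfies x = Px, so the equations cannot rank the remaining pages or even fix a scale.

```python
# The matrix P of the wee web-graph, column m = out-links of node m.
P = [
    [0,   0,   0, 0, 0, 0],
    [1/3, 0,   0, 0, 0, 0],
    [1/3, 1/2, 0, 0, 0, 0],
    [1/3, 0,   1, 0, 1, 0],
    [0,   0,   0, 1, 0, 0],
    [0,   1/2, 0, 0, 0, 0],
]

def apply(P, x):
    """Return the matrix-vector product Px as a list."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

# Every vector of the form (0, 0, 0, c, c, 0) is a solution of x = Px.
for c in (1.0, 5.0):
    x = [0, 0, 0, c, c, 0]
    print(c, apply(P, x) == x)

# A vector with weight on the other nodes is not a solution.
uniform = [1 / 6] * 6
print(apply(P, uniform) == uniform)
```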
18
But this idea doesn’t work for
the wee web-graph
[Figure: the six-node wee web-graph with link weights 1/3, 1/3, 1/3, 1/2, and 1/2, annotated as follows.]

Node 1: “lonely”
Nodes 4 and 5: “mutual admiration societies”
Node 6: “anti-social”
These nodes need to be “fixed” to get a
reliable and useful ranking!
19
The gang of four to the rescue
Andrei Markov
Oskar Perron
Georg Frobenius
Richard von Mises
20
Let’s fix it up and force node 6 to
choose, or link to everyone
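\[
P =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

becomes

\[
P =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 1 & 0 & 1 & 1/6 \\
0 & 0 & 0 & 1 & 0 & 1/6 \\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
\]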
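The fix-up can be sketched in code as "replace every all-zero column with a uniform column". The function name `fix_dangling` is made up for this illustration; the slide itself only shows the before and after matrices.

```python
# The original matrix: column 6 is all zeros because node 6 has no out-links.
P = [
    [0,   0,   0, 0, 0, 0],
    [1/3, 0,   0, 0, 0, 0],
    [1/3, 1/2, 0, 0, 0, 0],
    [1/3, 0,   1, 0, 1, 0],
    [0,   0,   0, 1, 0, 0],
    [0,   1/2, 0, 0, 0, 0],
]

def fix_dangling(P):
    """Return a copy of P in which each all-zero column becomes uniform (1/n)."""
    n = len(P)
    fixed = [row[:] for row in P]
    for m in range(n):
        if all(P[k][m] == 0 for k in range(n)):
            for k in range(n):
                fixed[k][m] = 1 / n
    return fixed

fixed = fix_dangling(P)
column_sums = [sum(fixed[k][m] for k in range(6)) for m in range(6)]
print(column_sums)
```

After the fix every column sums to 1 (up to floating-point rounding), so node 6 behaves as if it linked to everyone, itself included.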
21
Taxation is the way to
representation!
[Figure: pages a, b, and c link to a page.]

If this is a good page, then
it’ll still be a good page if
we “tax” the importance
from a, b, and c.

We can redistribute the
taxed amounts to all nodes,
including lonely nodes!
22
The importance of a page is determined
by the importance of pages that link to it*
* After tax and any benefits
\[
x_i = \sum_{j \in B_i} \alpha \frac{x_j}{d_j} + (1 - \alpha) b_i
\]

\alpha x_j / d_j: the total importance that page j contributes to page i
b_i: benefits to page i
\alpha: sets the taxation rate for all pages
23
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \alpha
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 1 & 0 & 1 & 1/6 \\
0 & 0 & 0 & 1 & 0 & 1/6 \\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
+ (1 - \alpha)
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}
\]

Perron and Frobenius showed the new
equation always has a unique solution

\[
x = \alpha P x + (1 - \alpha) b
\]
24
[Figure: the six-node wee web-graph with link weights 1/3, 1/3, 1/3, 1/2, and 1/2.]

What von Mises and Richardson showed
is that guess, check, and correct works!

\[
x^{(\text{new})} = \alpha P x^{(\text{old})} + (1 - \alpha) b
\]

\[
x^{(\text{start})} = \begin{bmatrix} 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \end{bmatrix}
\quad
x^{(1)} = \begin{bmatrix} 0.05 \\ 0.10 \\ 0.17 \\ 0.38 \\ 0.19 \\ 0.12 \end{bmatrix}
\quad
x^{(2)} = \begin{bmatrix} 0.04 \\ 0.06 \\ 0.10 \\ 0.36 \\ 0.36 \\ 0.08 \end{bmatrix}
\quad
x^{(\text{converged})} = \begin{bmatrix} 0.03 \\ 0.04 \\ 0.06 \\ 0.43 \\ 0.39 \\ 0.05 \end{bmatrix}
\]
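The guess, check, and correct loop is short enough to sketch directly. The slide does not state the parameters behind its numbers; the values α = 0.85 and a uniform b (every entry 1/6) used below are assumptions, chosen because they reproduce the printed vectors to two decimals. Names such as `step` and `history` are illustrative.

```python
ALPHA = 0.85      # assumed taxation parameter; not stated on the slide
N = 6
b = [1 / N] * N   # assumed uniform "benefits" vector

# The fixed-up matrix P, in which node 6 links to everyone.
P = [
    [0,   0,   0, 0, 0, 1/6],
    [1/3, 0,   0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0,   1, 0, 1, 1/6],
    [0,   0,   0, 1, 0, 1/6],
    [0,   1/2, 0, 0, 0, 1/6],
]

def step(x):
    """One update: x_new = ALPHA * P x_old + (1 - ALPHA) * b."""
    return [
        ALPHA * sum(p * v for p, v in zip(row, x)) + (1 - ALPHA) * bi
        for row, bi in zip(P, b)
    ]

x = b[:]              # start from the uniform guess
history = [x]
for _ in range(200):  # far more steps than needed for two decimals
    x = step(x)
    history.append(x)

for k in (0, 1, 2, 200):
    print(k, [round(v, 2) for v in history[k]])
```

Under these assumptions the start vector, the first two iterates, and the long-run limit agree, after rounding, with the four vectors listed on the slide.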
25
26
There’s still a lot of work left to
do to make a search engine
Make it fast!
Watch out for spam
Watch out for manipulation
Personalize

Experiment!
27
1.  Google's PageRank

2.  Multi-armed bandits and
internet experiments
28
http://adamlofting.com/736/drawn-multi-armed-bandit-experiments/multi-armed-bandit/
Not this!
29
http://upload.wikimedia.org/wikipedia/en/8/82/Las_Vegas_slot_machines.jpg
This!
Pays out $0.92/dollar
Pays out $0.98/dollar
Pays out $0.95/dollar
Pays out $0.99/dollar
30
What in the heck does a multi-armed
bandit have to do with Google?
31
What in the heck does a multi-armed
bandit have to do with Google?
Pays out $0.92/view
Pays out $0.66/view
Pays out $0.91/view to show ads
Pays out -$0.02/view to hide ads
32
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
33
This field has some of the
best terminology

Explore

Exploit

Regret
34
This field has some of the
best terminology

Explore – Visiting Las Vegas!

Exploit – Your new winning strategy!

Regret – That you didn’t quit after
winning the first round
35
This field has some of the
best terminology

Explore – Testing slot machines/experiments for their reward
Exploit – Playing the best reward you’ve found so far
Regret – How much you lost due to exploration
36
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
Pure
exploration!
We only exploit our findings at the end!
37
How to optimize your website
exploiting the bandits
Pure exploration:
Try condition A 5 times, find 4 wins
Try condition B 5 times, find 4 wins
Try condition C 5 times, find 2 wins

Exploit our knowledge:
Try condition A 7 times, find 3 wins
Try condition B 7 times, find 5 wins
Try condition C 1 time, find 0 wins

Condition    Est. Return
A            0.58
B            0.75
C            0.33
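The estimated returns follow from pooling both rounds of trials. A small sketch of that bookkeeping, with the data structure and names chosen here for illustration:

```python
# (tries, wins) per condition, in the two rounds listed above.
rounds = [
    {"A": (5, 4), "B": (5, 4), "C": (5, 2)},  # first round: explore evenly
    {"A": (7, 3), "B": (7, 5), "C": (1, 0)},  # second round: favor A and B
]

# Pool tries and wins per condition across rounds.
totals = {}
for rnd in rounds:
    for cond, (tries, wins) in rnd.items():
        t, w = totals.get(cond, (0, 0))
        totals[cond] = (t + tries, w + wins)

# Estimated return = pooled wins / pooled tries.
estimates = {cond: w / t for cond, (t, w) in totals.items()}
for cond, est in estimates.items():
    print(cond, round(est, 2))
```

Condition A pools to 7 wins in 12 tries, B to 9 in 12, and C to 2 in 6, which gives the 0.58, 0.75, and 0.33 shown in the table.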
38
The goal of these problems is to construct
optimal strategies to minimize regret
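Regret: how much you left “on the table” by exploring

\[
\text{regret} = \mathbb{E}[\text{play best always}] - \mathbb{E}[\text{plays made based on data}]
\]

A zero-regret strategy is one where regret(T trials) is sublinear in T as the number of plays T → ∞.

\[
\text{regret}_{\text{100-each}} = \frac{255}{300} - \frac{140}{300} = 0.38
\]

\[
\text{regret}_{\text{30-mixed}} = \frac{25.5}{30} - \frac{0.45 \times 12 + 0.85 \times 12 + 0.1 \times 6}{30} = 0.31
\]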
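The two regret figures can be reproduced with a short calculation. This sketch assumes the win rates 0.45, 0.85, and 0.10 from the 100-trial experiment are the true rates of conditions A, B, and C; the function name `regret_per_play` is made up for the example.

```python
def regret_per_play(best_rate, plays):
    """Average regret per play.

    plays is a list of (win_rate, number_of_plays) pairs; the result is
    (expected wins when always playing the best arm
     - expected wins of the plays actually made) / total plays.
    """
    total = sum(n for _, n in plays)
    expected = sum(rate * n for rate, n in plays)
    return (best_rate * total - expected) / total

BEST = 0.85  # condition B, the best arm under the stated assumption

# 100 plays of each condition (pure exploration).
uniform = regret_per_play(BEST, [(0.45, 100), (0.85, 100), (0.10, 100)])

# 30 plays split 12 / 12 / 6 (explore, then exploit).
mixed = regret_per_play(BEST, [(0.45, 12), (0.85, 12), (0.10, 6)])

print(round(uniform, 2), round(mixed, 2))
```

Shifting plays away from the weakest condition lowers the per-play regret from about 0.38 to about 0.31.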
39
[The bandit problem] was formulated during the [second
world] war, and efforts to solve it so sapped the energies
and minds of Allied analysts that the suggestion was
made that the problem be dropped over Germany, as the
ultimate instrument of intellectual sabotage.	

Peter Whittle (Whittle, 1979)
Discussion of “Bandit processes and dynamical allocation indices”
Their importance to website optimization,
advertising, and recommendation has
rejuvenated research on these problems
with fascinating new questions. 
40
Math is everywhere and
especially your favorite
websites!
Matrices and probability are
key ingredients.
41
PageRank on Wikipedia
Top 10 articles on Wikipedia with highest PageRank

α = 0.50          α = 0.85                α = 0.99
United States     United States           C:Contents
C:Living people   C:Main topic classif.   C:Main topic classif.
France            C:Contents              C:Fundamental
Germany           C:Living people         United States
England           C:Ctgs. by country      C:Wikipedia admin.
United Kingdom    United Kingdom          P:List of portals
Canada            C:Fundamental           P:Contents/Portals
Japan             C:Ctgs. by topic        C:Portals
Poland            C:Wikipedia admin.      C:Society
Australia         France                  C:Ctgs. by topic
42

"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

How does Google Google: A journey into the wondrous mathematics behind your favorite websites

  • 1. How Does Google Google? David F. Gleich, Computer Science, Purdue University. A journey into the wondrous mathematics behind your favorite websites
  • 2. Mathematics underlies an enormous number of the websites we use every day!
  • 3. 1. Google’s PageRank 2. Multi-armed bandits and internet experiments
  • 4. 4
  • 5. Larry Page and Sergey Brin • Created a web-search algorithm called “backrub” • Spun off a company, “Googol”, based on the paper • The importance of a page is determined by the importance of pages that link to it. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, “The PageRank Citation Ranking: Bringing Order to the Web”, TR, Stanford InfoLab, 1999
  • 6. A websearch primer 1. Crawl webpages 2. Analyze webpage text (information retrieval) 3. Analyze webpage links 4. Fit over 200 measures to human evaluations 5. Produce rankings 6. Continuously update
  • 7. Pages, nodes, incoming links, outgoing links, and “importance” [Figure: “important” pages a, b, and c that link to me; “important” pages that link to Purdue]
  • 8. 8
  • 9. Tim Davis and Yifan Hu, Sparse Matrix Gallery
  • 10. The web: 1000 vertices on 8.5-by-11 paper; 1,000,000,000,000 vertices (one trillion): paper the size of Manhattan island (23 sq miles)? Source: http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
  • 11. We need something better!
  • 12. A wee web-graph: link counting is too easy to game! [Figure: six-node graph, nodes 1 to 6, with edge weights 1/3, 1/3, 1/3, 1/2, 1/2]
  • 13. A wee web-graph: link counting is too easy to game! The importance of a page is determined by the importance of pages that link to it.
      $$x_1 = 0, \quad x_2 = \tfrac{1}{3}x_1, \quad x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2, \quad x_4 = \tfrac{1}{3}x_1 + x_3 + x_5, \quad x_5 = x_4, \quad x_6 = \tfrac{1}{2}x_2$$
  • 14. The importance of a page is determined by the importance of pages that link to it:
      $$x_i = \sum_{j \in B_i} \frac{1}{d_j}\, x_j$$
      Here $x_i$ is the “importance” of page i, $x_j$ is the “importance” of page j, $B_i$ is the set of “back-links from page i” (why it was called Backrub!), and $d_j$ is the number of links page j uses (its out-degree in graph theory). Example from the graph (node 1 → 3 with weight 1/3, node 2 → 3 with weight 1/2): $x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2$
  • 15. We can rewrite this equation in a more mathematically convenient way:
      $$\begin{aligned}
      x_1 &= 0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 \\
      x_2 &= \tfrac{1}{3}x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 \\
      x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 \\
      x_4 &= \tfrac{1}{3}x_1 + 0x_2 + 1x_3 + 0x_4 + 1x_5 + 0x_6 \\
      x_5 &= 0x_1 + 0x_2 + 0x_3 + 1x_4 + 0x_5 + 0x_6 \\
      x_6 &= 0x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6
      \end{aligned}$$
  • 16. And even more conveniently!
      $$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} \quad \text{or} \quad x = Px$$
      Element k in column m = "probability" of going from node m to node k
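As an illustration of how the matrix P on slide 16 comes out of the link structure, here is a minimal sketch in Python. The `links` dictionary is read off the slide's equations, and the function name `link_matrix` is a hypothetical helper, not something from the talk.

```python
from fractions import Fraction

# Out-links of the six-node "wee web-graph", read off the slide's equations.
# Node 6 has no out-links.
links = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def link_matrix(links):
    """Return P as a list of rows, with P[k][m] = 1/d_m when node m links to node k.

    Indices are zero-based in the returned matrix, so node 1 is row/column 0.
    """
    n = len(links)
    P = [[Fraction(0)] * n for _ in range(n)]
    for m, targets in links.items():
        for k in targets:
            # Page m splits its importance evenly over its d_m out-links.
            P[k - 1][m - 1] = Fraction(1, len(targets))
    return P


P = link_matrix(links)
for row in P:
    print([str(v) for v in row])
```

Exact fractions are used so the printed rows can be compared directly with the 1/3 and 1/2 entries on the slide.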
  • 17. The matrix P for websites shows a lot of structure. Every dot is a non-zero element indicating a link. The matrices are sparse and generally have block structure; the block structure can be exploited to speed up the ranking algorithm.
  • 18. But this idea doesn’t work for the wee web-graph. Substituting $x_1 = 0$ through the equations:
      $$x_1 = 0, \quad x_2 = \tfrac{1}{3}x_1 = 0, \quad x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 = 0, \quad x_4 = \tfrac{1}{3}x_1 + x_3 + x_5 = x_5, \quad x_5 = x_4, \quad x_6 = \tfrac{1}{2}x_2 = 0$$
      Nodes 1, 4 and 5 determine everything!
  • 19. But this idea doesn’t work for the wee web-graph. Node 1: “lonely”. Nodes 4 and 5: “mutual admiration societies”. Node 6: “anti-social”. These nodes need to be “fixed” to get a reliable and useful ranking!
  • 20. The gang of four to the rescue: Andrei Markov, Oskar Perron, Georg Frobenius, Richard von Mises
  • 21. Let’s fix it up and force node 6 to choose, or link to everyone. Before the fix:
      $$P = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \end{bmatrix}$$
      After the fix:
      $$P = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 1 & 0 & 1 & 1/6 \\ 0 & 0 & 0 & 1 & 0 & 1/6 \\ 0 & 1/2 & 0 & 0 & 0 & 1/6 \end{bmatrix}$$
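The fix on slide 21, where a page with no out-links is treated as linking to everyone, can be sketched as a small column operation. This is only an illustration under that reading of the slide; `fix_dangling` is a hypothetical helper name.

```python
from fractions import Fraction as F

# The link matrix from the slide before the fix (column m holds page m's out-links).
P = [
    [0,       0,       0, 0, 0, 0],
    [F(1, 3), 0,       0, 0, 0, 0],
    [F(1, 3), F(1, 2), 0, 0, 0, 0],
    [F(1, 3), 0,       1, 0, 1, 0],
    [0,       0,       0, 1, 0, 0],
    [0,       F(1, 2), 0, 0, 0, 0],
]


def fix_dangling(P):
    """Replace every all-zero column with the uniform column 1/n."""
    n = len(P)
    fixed = [list(row) for row in P]  # copy, so the input is left untouched
    for m in range(n):
        if all(P[k][m] == 0 for k in range(n)):
            for k in range(n):
                fixed[k][m] = F(1, n)
    return fixed


P_fixed = fix_dangling(P)
column_sums = [sum(P_fixed[k][m] for k in range(6)) for m in range(6)]
print(column_sums)
```

After the fix every column sums to 1, so each column can be read as a probability distribution over the next page.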
  • 22. Taxation is the way to representation! If a page is a good page, then it’ll still be a good page if we “tax” the importance from a, b, and c. We can redistribute the taxed amounts to all, including lonely nodes!
  • 23. The importance of a page is determined by the importance of pages that link to it* (* after tax and any benefits):
      $$x_i = \sum_{j \in B_i} \alpha \frac{x_j}{d_j} + (1 - \alpha)\, b_i$$
      Here $\alpha$ is the taxation rate of all, $\alpha\, x_j / d_j$ is the total importance that page j contributes to page i, and $(1 - \alpha)\, b_i$ is the benefits to page i.
  • 24. Perron and Frobenius showed the new equation always has a unique solution:
      $$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \alpha \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 1 & 0 & 1 & 1/6 \\ 0 & 0 & 0 & 1 & 0 & 1/6 \\ 0 & 1/2 & 0 & 0 & 0 & 1/6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} + (1 - \alpha) \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}, \qquad x = \alpha P x + (1 - \alpha)\, b$$
  • 25. What von Mises and Richardson showed is that guess, check, and correct works!
      $$x^{(\text{new})} = \alpha P x^{(\text{old})} + (1 - \alpha)\, b$$
      $$x^{(\text{start})} = \begin{bmatrix} 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \end{bmatrix}, \quad x^{(1)} = \begin{bmatrix} 0.05 \\ 0.10 \\ 0.17 \\ 0.38 \\ 0.19 \\ 0.12 \end{bmatrix}, \quad x^{(2)} = \begin{bmatrix} 0.04 \\ 0.06 \\ 0.10 \\ 0.36 \\ 0.36 \\ 0.08 \end{bmatrix}, \quad x^{(\infty)} = \begin{bmatrix} 0.03 \\ 0.04 \\ 0.06 \\ 0.43 \\ 0.39 \\ 0.05 \end{bmatrix}$$
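The “guess, check, and correct” iteration on slide 25 can be sketched in a few lines of Python. The slide does not state the taxation rate or the benefit vector, so α = 0.85 and a uniform b = 1/6 are assumptions here; they were chosen because they reproduce the slide's rounded vectors. The function names are hypothetical.

```python
ALPHA = 0.85  # assumed taxation rate; not stated on this slide

# The fixed-up link matrix (node 6 links to everyone).
P = [
    [0,   0,   0, 0, 0, 1/6],
    [1/3, 0,   0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0,   1, 0, 1, 1/6],
    [0,   0,   0, 1, 0, 1/6],
    [0,   1/2, 0, 0, 0, 1/6],
]


def pagerank_step(P, x, b, alpha=ALPHA):
    """One correction step: x_new = alpha * P x + (1 - alpha) * b."""
    n = len(x)
    return [
        alpha * sum(P[i][j] * x[j] for j in range(n)) + (1 - alpha) * b[i]
        for i in range(n)
    ]


def pagerank(P, b, alpha=ALPHA, tol=1e-12, max_iter=1000):
    """Repeat the step from x = b until successive iterates stop changing."""
    x = list(b)
    for _ in range(max_iter):
        x_new = pagerank_step(P, x, b, alpha)
        if sum(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


b = [1 / 6] * 6               # assumed uniform benefits, also used as the start
x1 = pagerank_step(P, b, b)   # first iterate
x2 = pagerank_step(P, x1, b)  # second iterate
x_inf = pagerank(P, b)        # converged ranking

for name, vec in [("x(1)", x1), ("x(2)", x2), ("x(inf)", x_inf)]:
    print(name, [round(v, 2) for v in vec])
```

Because each step shrinks the error by roughly a factor of α, a simple change-based stopping rule is enough for a graph this small.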
  • 26. 26
  • 27. There’s still a lot of work left to do to make a search engine: Make it fast! Watch out for spam. Watch out for manipulation. Personalize. Experiment!
  • 28. 1. Google’s PageRank 2. Multi-armed bandits and internet experiments
  • 30. This! [Figure: Las Vegas slot machines, with payouts of $0.92, $0.98, $0.95, and $0.99 per dollar] Image: http://upload.wikimedia.org/wikipedia/en/8/82/Las_Vegas_slot_machines.jpg
  • 31. What in the heck does a multi-armed bandit have to do with Google?
  • 32. What in the heck does a multi-armed bandit have to do with Google? Pays out $0.92/view. Pays out $0.66/view. Pays out $0.91/view to show ads. Pays out -$0.02/view (hide ads).
  • 33. How to optimize your website without exploiting the bandits: Try condition A 100 times, find 45 “wins”. Try condition B 100 times, find 85 “wins”. Try condition C 100 times, find 10 “wins”. … Choose the best!
  • 34. This field has some of the best terminology: Explore, Exploit, Regret
  • 35. This field has some of the best terminology. Explore – Visiting Las Vegas! Exploit – Your new winning strategy! Regret – That you didn’t quit after winning the first round
  • 36. This field has some of the best terminology. Explore – Testing slot machines/experiments for their reward. Exploit – Playing the best reward you’ve found so far. Regret – How much you lost due to exploration
  • 37. How to optimize your website without exploiting the bandits: Try condition A 100 times, find 45 “wins”. Try condition B 100 times, find 85 “wins”. Try condition C 100 times, find 10 “wins”. … Choose the best! Pure exploration! We only exploit our findings at the end!
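The pure-exploration procedure on slide 37 (try every condition the same number of times, then choose the best) can be simulated with a short sketch. The true win rates below are hypothetical: they simply reuse the slide's observed rates (45, 85, and 10 wins out of 100) as if they were the underlying probabilities, and `explore_then_choose` is an illustrative name.

```python
import random


def explore_then_choose(win_rates, tries_per_condition=100, seed=0):
    """Try every condition equally often, then pick the one with the most wins.

    win_rates maps a condition name to its (normally unknown) win probability.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    wins = {}
    for name, p in win_rates.items():
        # Each try is an independent win with probability p.
        wins[name] = sum(rng.random() < p for _ in range(tries_per_condition))
    best = max(wins, key=wins.get)
    return wins, best


# Hypothetical true rates, borrowed from the slide's observed counts.
rates = {"A": 0.45, "B": 0.85, "C": 0.10}
wins, best = explore_then_choose(rates)
print(wins, "-> choose", best)
```

Every play spent on A or C during the trial phase is a play that could have gone to B, which is the cost the talk goes on to call regret.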
  • 38. How to optimize your website exploiting the bandits. Pure exploration: Try condition A 5 times, find 4 wins. Try condition B 5 times, find 4 wins. Try condition C 5 times, find 2 wins. Exploit our knowledge: Try condition A 7 times, find 3 wins. Try condition B 7 times, find 5 wins. Try condition C 1 time, find 0 wins. Est. return: A 0.58, B 0.75, C 0.33
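The estimated returns on slide 38 are wins divided by tries, pooled over both rounds. A minimal sketch of that arithmetic follows; the data layout and the name `estimated_returns` are illustrative choices.

```python
# (tries, wins) per condition for the two rounds on the slide.
rounds = [
    {"A": (5, 4), "B": (5, 4), "C": (5, 2)},  # exploration round
    {"A": (7, 3), "B": (7, 5), "C": (1, 0)},  # round that favors the leaders
]


def estimated_returns(rounds):
    """Pool tries and wins over all rounds and return wins / tries per condition."""
    tries, wins = {}, {}
    for r in rounds:
        for name, (t, w) in r.items():
            tries[name] = tries.get(name, 0) + t
            wins[name] = wins.get(name, 0) + w
    return {name: wins[name] / tries[name] for name in tries}


est = estimated_returns(rounds)
print({name: round(value, 2) for name, value in est.items()})
```

The pooled counts are 7/12 for A, 9/12 for B, and 2/6 for C, which round to the 0.58, 0.75, and 0.33 shown on the slide.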
  • 39. The goal of these problems is to construct optimal strategies to minimize regret. Regret: how much you left “on the table” by exploring, $\text{regret} = E[\text{play best always} - \text{plays made based on data}]$. A zero-regret strategy is one where regret(T trials) is sublinear in T as the number of plays $T \to \infty$. Regret per play, 100-each: $255/300 - 140/300 = 0.38$. Regret per play, 30-mixed: $25.5/30 - (0.45 \times 12 + 0.85 \times 12 + 0.1 \times 6)/30 = 0.31$
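The two regret figures on slide 39 can be checked with a short calculation. As on the slide, this sketch treats the observed rates 0.45, 0.85, and 0.10 as the expected reward of each condition, which is an assumption; `average_regret` is a hypothetical helper name.

```python
def average_regret(win_rates, plays):
    """Per-play regret: reward of always playing the best arm minus the expected
    reward of the plays actually made, divided by the number of plays."""
    total = sum(plays.values())
    best_rate = max(win_rates.values())
    earned = sum(win_rates[name] * count for name, count in plays.items())
    return (best_rate * total - earned) / total


rates = {"A": 0.45, "B": 0.85, "C": 0.10}  # taken from the slide's observed rates

r_equal = average_regret(rates, {"A": 100, "B": 100, "C": 100})  # 100-each
r_mixed = average_regret(rates, {"A": 12, "B": 12, "C": 6})      # 30-mixed

print(round(r_equal, 2), round(r_mixed, 2))
```

The mixed schedule gives up less per play (0.31 versus 0.38) because it moves plays away from condition C once C looks poor.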
  • 40. “[The bandit problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.” Peter Whittle (Whittle, 1979), discussion of “Bandit processes and dynamic allocation indices”. Their importance to website optimization, advertising, and recommendation has rejuvenated research on these problems with fascinating new questions.
  • 41. Math is everywhere, and especially in your favorite websites! Matrices and probability are key ingredients.
  • 42. PageRank on Wikipedia. Note: Top 10 articles on Wikipedia with highest PageRank.
      α = 0.50: United States, C:Living people, France, Germany, England, United Kingdom, Canada, Japan, Poland, Australia
      α = 0.85: United States, C:Main topic classif., C:Contents, C:Living people, C:Ctgs. by country, United Kingdom, C:Fundamental, C:Ctgs. by topic, C:Wikipedia admin., France
      α = 0.99: C:Contents, C:Main topic classif., C:Fundamental, United States, C:Wikipedia admin., P:List of portals, P:Contents/Portals, C:Portals, C:Society, C:Ctgs. by topic
      (Embedded slide footer: David F. Gleich (Sandia), Sensitivity, Purdue)