Computer Science and Engineering, UCSD                                                October 12, 1999
Tail Inequalities                                                                     Author: Bellare


                                     Tail Inequalities



1 The problem
We have a collection $X_1, \ldots, X_n$ of random variables, each ranging between 0 and 1. We let $p_i = \mathbf{E}[X_i]$ for $i = 1, \ldots, n$ and we let $X = X_1 + \cdots + X_n$. We let $\mu = \mathbf{E}[X]$. Linearity of expectation tells us that $\mu = p_1 + \cdots + p_n$. We fix some parameter $A > 0$ and are interested in the probability that $X - \mu \ge A$, namely that $X$ exceeds its expectation by some amount $A$.
A particular form in which we wish to study this probability is the following. We let $A = x\mu$ for some $x > 0$. Then $\Pr[X - \mu \ge A] = \Pr[X \ge (1+x)\mu]$. We are interested in how this behaves as a function of $x$, with all other quantities being fixed. Mostly we want good upper bounds.
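Before any theory, it helps to see the quantity we are bounding. The sketch below estimates $\Pr[X \ge (1+x)\mu]$ by simulation for the simplest case treated later, independent 0/1 variables with a common expectation $p$; the function name and parameter choices are illustrative, not from the notes.

```python
import random

def tail_prob(n, p, x, trials=20_000):
    """Estimate Pr[X >= (1+x)*mu] for X = X_1 + ... + X_n,
    the X_i being independent 0/1 variables with expectation p."""
    mu = n * p                      # linearity of expectation: mu = np
    threshold = (1 + x) * mu
    hits = sum(
        sum(random.random() < p for _ in range(n)) >= threshold
        for _ in range(trials)
    )
    return hits / trials

print(tail_prob(n=100, p=0.5, x=0.2))   # Pr[X >= 60] for 100 fair coins, ~0.03
```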
This situation arises extremely often. Typically, something is known about the amount of independence of the random variables $X_1, \ldots, X_n$. The simplest case is that they are actually independent. Another case common in computer science is that they satisfy some limited form of independence, for example pairwise independence or, more generally, $t$-wise independence, where $t \ge 2$ is some integer. When $t = n$ we have full independence. Alternatively, they may satisfy some form of almost independence. Tail inequalities deal with these situations.
In mathematics courses on introductory probability theory, these problems are typically treated
via the laws of large numbers and the central limit theorem. These provide a qualitative understanding of how the probabilities in question behave as a function of $n$. Tail inequalities are the
quantitative analogue.
We will begin with some background and then go to the most common case, the one where the
random variables are fully independent. Then we address limited independence.

2 Basic inequalities
The most basic inequality is Markov's.
Proposition 1 (Markov's Inequality) For any non-negative random variable $X$ and any real number $a > 0$ we have
$$\Pr[X \ge a] \ \le\ \frac{\mathbf{E}[X]}{a}\,.$$
As an example let $a = 2\,\mathbf{E}[X]$. Then the above says $\Pr[X \ge 2\,\mathbf{E}[X]] \le 1/2$. Namely, if you move out to twice the expectation, you can have only half the area under the curve to your right. This is quite intuitive.
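As a quick sanity check, here is a small simulation of Markov's inequality; the uniform distribution on $\{0, 1, \ldots, 9\}$ is an arbitrary illustrative choice.

```python
import random

# Markov check: X uniform on {0, 1, ..., 9}, so E[X] = 4.5; take a = 2*E[X].
samples = [random.randrange(10) for _ in range(100_000)]
mean = sum(samples) / len(samples)              # close to 4.5
a = 2 * mean
frac = sum(s >= a for s in samples) / len(samples)
print(frac, "<=", mean / a)                     # roughly 0.1 <= 0.5
```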



Proof of Proposition 1: This is a simple computation:
$$\begin{aligned}
a \cdot \Pr[X \ge a] \;&=\; a \sum_{x\,:\,x \ge a} \Pr[X = x] \\
&\le\; \sum_{x\,:\,x \ge a} x \cdot \Pr[X = x] \\
&\le\; \sum_{x} x \cdot \Pr[X = x] \\
&=\; \mathbf{E}[X]\,.
\end{aligned}$$
Where in this proof did we use that $X \ge 0$? In the third line: extending the sum to all values $x$ can only increase it because the added terms $x \cdot \Pr[X = x]$ with $x < a$ are all non-negative.
Markov's inequality is rather weak. The curious thing is that it is nonetheless, in the end, the root of the most powerful tail inequalities around. You just have to find the right way to use it.
One step up from Markov's inequality is Chebyshev's inequality. To state it we first recall that if $X$ is a random variable then its variance is $\mathbf{Var}[X] = \mathbf{E}[(X-\mu)^2] = \mathbf{E}[X^2] - \mu^2$, where $\mu = \mathbf{E}[X]$ is the expectation of $X$.
Proposition 2 (Chebyshev's inequality) Let $X$ be a random variable with expectation $\mu$, and let $A > 0$. Then
$$\Pr[\,|X - \mu| \ge A\,] \ \le\ \frac{\mathbf{Var}[X]}{A^2}\,.$$
Proof of Proposition 2: Let $\mu = \mathbf{E}[X]$. Let $Y$ be the random variable defined by $Y = (X - \mu)^2 = X^2 - 2\mu X + \mu^2$. Then
$$\mathbf{E}[Y] \;=\; \mathbf{E}[X^2] - 2\mu\,\mathbf{E}[X] + \mu^2 \;=\; \mathbf{E}[X^2] - \mu^2 \;=\; \mathbf{Var}[X]\,.$$
Since $Y \ge 0$ we can apply Proposition 1 to it. We have
$$\Pr[\,|X - \mu| \ge A\,] \;=\; \Pr[\,Y \ge A^2\,] \;\le\; \frac{\mathbf{E}[Y]}{A^2} \;=\; \frac{\mathbf{Var}[X]}{A^2}$$
as desired.
This will come in useful for tail inequalities on pairwise independent random variables.
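Here is a small numerical illustration of Chebyshev's inequality, with a sum of 100 fair coin flips as an arbitrary test case.

```python
import random

# Chebyshev check: X = number of heads in n = 100 fair flips,
# so mu = 50 and Var[X] = n * (1/2) * (1/2) = 25.  Take A = 10.
n, mu, var, A = 100, 50, 25, 10
samples = [sum(random.random() < 0.5 for _ in range(n))
           for _ in range(50_000)]
frac = sum(abs(s - mu) >= A for s in samples) / len(samples)
print(frac, "<=", var / A**2)    # roughly 0.057 <= 0.25
```

Note how loose the bound is here; the Chernoff-type bounds of the next section do much better for sums of independent variables.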

3 Tail inequalities for independent random variables
We have independent random variables $X_1, \ldots, X_n$ ranging between 0 and 1. For simplicity let's assume they are actually boolean, meaning they assume only the values 0 and 1. This turns out to be the worst case for the situations we deal with. In that case, $p_i \stackrel{\mathrm{def}}{=} \mathbf{E}[X_i] = \Pr[X_i = 1]$ for $i = 1, \ldots, n$. As usual set $X = X_1 + \cdots + X_n$ and $\mu = \mathbf{E}[X]$. We are interested in upper bounding $\Pr[X - \mu \ge A]$ where $A > 0$ is some given real number.
The law of large numbers says that if we fix $A$ and the expectations (say all $p_i = p$) then
$$\lim_{n\to\infty} \Pr\left[\,\frac{1}{n}\sum_{i=1}^{n} X_i - p \ \ge\ A\,\right] \ =\ 0\,.$$
That is, the probability that $X$ deviates from its expectation gets smaller and smaller as the number of samples $n$ grows. In computer science we want more precise information: our interest is in how this probability tails off as a function of $n$.
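To see this tailing off concretely, one can compute the upper tail of the binomial distribution exactly for growing $n$. The following small computation (parameter choices ours) does this for $p = 1/2$ and a 20% relative deviation.

```python
from math import comb

def upper_tail(n, p, x):
    """Exact Pr[X >= (1+x)*n*p] for X ~ Binomial(n, p)."""
    t = (1 + x) * n * p
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k >= t)

for n in (10, 100, 1000):
    print(n, upper_tail(n, p=0.5, x=0.2))
# 10 -> ~0.38, 100 -> ~0.028, 1000 -> ~1e-10
```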
Nomenclature in this area is not uniform, but the bounds we will now discuss sometimes go under the name of Chernoff-type bounds. A good reference is [1].
The first bound we specify is the simplest, yet good enough in many of the applications.

Proposition 3 Let $X_1, \ldots, X_n$ be independent, 0/1-valued random variables, and let $p_i = \mathbf{E}[X_i]$ for $i = 1, \ldots, n$. Let $X = X_1 + \cdots + X_n$ and let $\mu = \mathbf{E}[X]$. Let $A > 0$ be a real number. Then
$$\Pr[\,X - \mu \ge A\,] \ \le\ e^{-A^2/2n}\,.$$
We won't prove this because below we will prove something stronger. But let's discuss it. To get an understanding of the bound we consider the case where $p_1 = p_2 = \cdots = p_n$.

Corollary 4 Let $X_1, \ldots, X_n$ be independent, 0/1-valued random variables, all having the same expectation $p$. Let $X = X_1 + \cdots + X_n$ and let $\mu = \mathbf{E}[X]$. Let $x > 0$ be a real number. Then
$$\Pr[\,X \ge (1+x)\mu\,] \ \le\ e^{-x^2 p^2 n/2} \;=\; e^{-x^2 p \mu/2}\,.$$

Proof: Set $A = x\mu$ in Proposition 3 and use the fact that $\mu = pn$.

This shows us the punch-line: roughly, $\Pr[X \ge (1+x)\mu]$ decreases exponentially with $n$ for fixed $x, p$. In other words, the probability that the sum of independent random variables deviates significantly from its expectation drops very quickly as the number of random variables grows. For example, if you toss many fair coins, the probability of getting significantly more than 50% heads is very small.
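To make the coin example concrete, the sketch below evaluates the Corollary 4 bound on the probability of getting at least 60% heads; the parameter values are illustrative.

```python
from math import exp

def cor4_bound(n, p, x):
    """Corollary 4: Pr[X >= (1+x)*mu] <= exp(-x^2 * p * mu / 2), mu = np."""
    mu = n * p
    return exp(-x**2 * p * mu / 2)

for n in (100, 1_000, 10_000):
    print(n, cor4_bound(n, p=0.5, x=0.2))   # Pr[ at least 60% heads ]
# 100 -> ~0.61, 1000 -> ~0.0067, 10000 -> ~2e-22: exponential decay in n
```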
However, you have to be careful with the above bound. The intuition that $\Pr[X \ge (1+x)\mu]$ decreases exponentially with $n$ is quite sensitive to the values of $x, p$, and there are many common situations in which the bounds obtained from the above are not good enough. One way to see why is to consider the second line in the bound of Corollary 4, which we got just by substituting $\mu$ for $pn$ in the first line. It shows us that if $p$ is small, the bound gets worse even for a fixed value of the expectation $\mu$ of the sum $X$. This is actually a weakness in the bound, not necessarily a reflection of reality.
Here now is a stronger Chernoff-type bound. This is pretty much tight, meaning as good as you can get. It may at first be hard to interpret, but we'll elucidate it later.


Theorem 5 Let $X_1, \ldots, X_n$ be independent, 0/1-valued random variables, and let $p_i = \mathbf{E}[X_i]$ for $i = 1, \ldots, n$. Let $X = X_1 + \cdots + X_n$ and let $\mu = \mathbf{E}[X]$. Let $\beta \ge 1$ be a real number. Then
$$\Pr[\,X \ge \beta\mu\,] \ \le\ e^{-g(\beta)\mu}$$
where we define the function $g$ by $g(\beta) = \beta\ln\beta + 1 - \beta$.
To visualize this it is again useful to set $\beta = 1 + x$ for $x > 0$ and see what happens to the bound viewed as a function of $x$.
Corollary 6 Let $X_1, \ldots, X_n$ be independent, 0/1-valued random variables, and let $p_i = \mathbf{E}[X_i]$ for $i = 1, \ldots, n$. Let $X = X_1 + \cdots + X_n$ and let $\mu = \mathbf{E}[X]$. Let $0 < x \le 2$ be a real number. Then
$$\Pr[\,X \ge (1+x)\mu\,] \ \le\ e^{-3x^2\mu/10}\,.$$
Notice the improvement over Corollary 4: the factor of $p$ in the exponent has vanished. That is quite a change; the new bound is much better.
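The following comparison makes the point numerically: fix the expectation $\mu$ and the relative deviation $x$, and let $p$ shrink. The parameter values are arbitrary illustrations.

```python
from math import exp

def cor4(mu, p, x):    # exp(-x^2 * p * mu / 2): the exponent carries p
    return exp(-x**2 * p * mu / 2)

def cor6(mu, x):       # exp(-3 * x^2 * mu / 10): p has vanished
    return exp(-3 * x**2 * mu / 10)

for p in (0.5, 0.01, 0.0001):
    print(p, cor4(mu=20, p=p, x=1), cor6(mu=20, x=1))
# Corollary 4 degrades toward 1 as p shrinks; Corollary 6 stays at e^{-6}.
```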
Be careful when you apply this bound to note that it only holds for $x \le 2$. If $x$ is larger, our intuition is that the probability in question should be even lower, yet the bound above does not apply. If you want a bound that works for large $x$ you will need to go back to the proofs of Theorem 5 and Corollary 6 and try to extend them. It would be nice to do this and get a clean bound for larger $x$, actually. We might explore these issues later, but right now I want to look more at the proofs. Let's begin by seeing why Corollary 6 follows from Theorem 5.
Proof of Corollary 6: Theorem 5 tells us that
$$\Pr[\,X \ge (1+x)\mu\,] \ \le\ e^{-g(1+x)\mu}$$
where $g$ is the function defined in the statement of Theorem 5. So it suffices to show that $g(1+x) \ge 3x^2/10$ for $0 < x \le 2$. We do this using Taylor series approximations. We have
$$\begin{aligned}
g(1+x) &= (1+x)\ln(1+x) + 1 - (1+x)\\
&= (1+x)\ln(1+x) - x\\
&= -x + (1+x)\sum_{i\ge 1} (-1)^{i-1}\,\frac{x^i}{i}\\
&= -x + \sum_{i\ge 1} (-1)^{i-1}\,\frac{x^i}{i} + \sum_{i\ge 2} (-1)^{i}\,\frac{x^i}{i-1}\\
&= \sum_{i\ge 2} \left[\,(-1)^{i-1}\,\frac{x^i}{i} + (-1)^{i}\,\frac{x^i}{i-1}\,\right]\\
&= \sum_{i\ge 2} (-1)^{i}\,\frac{x^i}{i(i-1)}\\
&= \frac{x^2}{2} - \sum_{i\ge 3} (-1)^{i-1}\,\frac{x^i}{i(i-1)}\\
&\ge \frac{x^2}{2} - \left(\frac{x^3}{6} - \frac{x^4}{12} + \frac{x^5}{20}\right).
\end{aligned}$$
We now want to upper bound the expression in parentheses in the last line above, which equals $x^3 f(x)$ for the function $f(x) = 1/6 - x/12 + x^2/20$. This function attains its minimum at $x = 5/6$, so for $0 \le x \le 2$ the maximum value of $f$ is attained at the endpoint $x = 2$, where $f(2) = 1/5$. Thus the above is
$$\ge\ \frac{x^2}{2} - \frac{x^2}{2}\cdot 2\,f(2) \ =\ x^2\left(\frac{1}{2} - \frac{1}{5}\right) \ =\ \frac{3x^2}{10}\,.$$
That concludes the proof.
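Since constant-chasing of this kind is easy to get wrong, here is a quick grid spot-check of the inequality $g(1+x) \ge 3x^2/10$ on $(0, 2]$ (a numerical check, not a proof):

```python
from math import log

def g(b):
    return b * log(b) + 1 - b

xs = [k / 1000 for k in range(1, 2001)]            # grid over (0, 2]
print(all(g(1 + x) >= 0.3 * x * x for x in xs))    # True
print(min(g(1 + x) / (x * x) for x in xs))         # ~0.324, attained at x = 2
```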
Now we come to the interesting part, namely the proof of Theorem 5. It introduces the idea of
exponential generating functions. It is quite neat, illustrating many simple but powerful techniques.

Proof of Theorem 5: We introduce a parameter $\lambda > 0$ whose value will be set later. Recall that $\mu = p_1 + \cdots + p_n = \mathbf{E}[X]$. We use the monotonicity of the exponential function and then apply Markov's inequality to get
$$\Pr[\,X \ge \beta\mu\,] \;=\; \Pr\!\left[\, e^{\lambda X} \ge e^{\lambda\beta\mu} \,\right] \;\le\; \frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda\beta\mu}}\,. \qquad (1)$$
This is the exponential generating function trick. We do something that looks really trivial. We note that the probability is unchanged if we exponentiate the terms involved, and then we use, of all things, the weakest of the inequalities around, namely Markov's inequality. Yet as we will now see, rather strong bounds emerge.
The next thing we do is bound the expectation in Equation (1). We start with the following:
$$\mathbf{E}[e^{\lambda X}] \;=\; \mathbf{E}\!\left[ e^{\lambda(X_1 + X_2 + \cdots + X_n)} \right] \;=\; \mathbf{E}\!\left[ e^{\lambda X_1} \cdot e^{\lambda X_2} \cdots e^{\lambda X_n} \right].$$
The independence of $X_1, \ldots, X_n$ (this is the one and only place we use it) implies that the above equals
$$\mathbf{E}[e^{\lambda X_1}] \cdot \mathbf{E}[e^{\lambda X_2}] \cdots \mathbf{E}[e^{\lambda X_n}]\,.$$
Now we compute these individual expectations: for any $i = 1, \ldots, n$ we have
$$\mathbf{E}[e^{\lambda X_i}] \;=\; 1 \cdot \Pr[X_i = 0] + e^{\lambda} \cdot \Pr[X_i = 1] \;=\; (1 - p_i) + e^{\lambda} p_i \;=\; 1 + (e^{\lambda} - 1)\,p_i\,.$$
So at this point we have
$$\mathbf{E}[e^{\lambda X}] \;=\; \left(1 + (e^{\lambda}-1)p_1\right)\left(1 + (e^{\lambda}-1)p_2\right) \cdots \left(1 + (e^{\lambda}-1)p_n\right).$$


Notice that so far we have done no bounding; we have equalities.
At this point, what can we do? We are looking at a complex bound, a product of many terms. If we expand it we will start seeing terms involving products of the values $p_1, \ldots, p_n$, which is not something we know much about. What we do know something about is the sum of $p_1, \ldots, p_n$, because this is exactly $\mu$, the expectation of $X$. We'd like to work this in. This is done by applying a very common and useful little inequality, namely that $1 + y \le e^y$ for any real number $y$. Set $y_i = (e^{\lambda} - 1)p_i$ and we get
$$\mathbf{E}[e^{\lambda X}] \;=\; (1 + y_1)(1 + y_2)\cdots(1 + y_n) \;\le\; e^{y_1} e^{y_2} \cdots e^{y_n} \;=\; e^{y_1 + \cdots + y_n} \;=\; e^{(e^{\lambda}-1)(p_1 + \cdots + p_n)} \;=\; e^{(e^{\lambda}-1)\mu}\,.$$
Let's now put this back together with Equation (1). That gives us
$$\Pr[\,X \ge \beta\mu\,] \;\le\; \frac{e^{(e^{\lambda}-1)\mu}}{e^{\lambda\beta\mu}} \;=\; e^{((e^{\lambda}-1) - \lambda\beta)\mu} \;=\; e^{-f(\lambda)\mu}$$
where
$$f(\lambda) \;=\; \lambda\beta - (e^{\lambda} - 1)\,.$$
Now we want to analyze the function $f(\lambda)$ and choose the value of $\lambda > 0$ that makes $f$ as large as possible. Since all the above is true for any value of $\lambda > 0$, we can plug in this special value and that will be our bound. To analyze $f(\lambda)$ we use high-school level calculus. We compute the derivative: $f'(\lambda) = \beta - e^{\lambda}$. The function $f'$ is positive for $\lambda < \ln\beta$, zero at $\lambda = \ln\beta$, and then negative for $\lambda > \ln\beta$. This tells us that $f$ is increasing for $0 \le \lambda < \ln\beta$ and decreasing for $\lambda > \ln\beta$. So the maximum is at $\lambda = \ln\beta$. Now we note that
$$f(\ln\beta) \;=\; \beta\ln\beta - \beta + 1\,.$$
This is exactly what we called $g(\beta)$, so the proof is complete.
The technique of this proof is the one used in all proofs of Chernoff-type bounds, with minor variations. It is useful to know it so that you can derive your own bounds if necessary.
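As an illustration of the technique, the sketch below evaluates the closed-form bound $e^{-g(\beta)\mu}$ of Theorem 5, and also recovers it numerically by searching over $\lambda$, mirroring the proof: every $\lambda > 0$ yields a valid bound, and the best one occurs at $\lambda = \ln\beta$. The function names and the search range are our own choices.

```python
from math import exp, log

def chernoff_closed_form(mu, beta):
    """Theorem 5: e^{-g(beta)*mu} with g(beta) = beta*ln(beta) + 1 - beta."""
    return exp(-(beta * log(beta) + 1 - beta) * mu)

def chernoff_grid(mu, beta, steps=1000):
    """Each lam > 0 gives the valid bound e^{((e^lam - 1) - lam*beta)*mu};
    take the best over a grid in (0, 1], chosen here to contain ln(beta)."""
    return min(exp(((exp(lam) - 1) - lam * beta) * mu)
               for lam in (k / steps for k in range(1, steps + 1)))

mu, beta = 50, 1.5
print(chernoff_closed_form(mu, beta))   # ~0.00448
print(chernoff_grid(mu, beta))          # essentially the same value
```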

4 Tail inequalities for pairwise independent random variables
Pairwise independence means that we cannot infer anything extra about $X_i$ given $X_j$ for $j \ne i$, even though we might be able to infer something, or even everything, about $X_i$ if we were given both $X_j$ and $X_k$ where $i, j, k$ are distinct.


Definition 7 We say that $X_1, \ldots, X_n$ are pairwise independent random variables if for every $1 \le i < j \le n$ and every $a, b \in \mathbf{R}$ we have
$$\Pr[\,X_i = a \text{ and } X_j = b\,] \;=\; \Pr[X_i = a] \cdot \Pr[X_j = b]\,.$$
The tail inequality for such random variables makes use of the fact that the variance of a sum of
pairwise independent random variables behaves exactly like the variance of a sum of independent
random variables: it is the sum of the individual variances.
Lemma 8 Let $X_1, \ldots, X_n$ be pairwise independent random variables. Then
$$\mathbf{Var}[X_1 + \cdots + X_n] \;=\; \mathbf{Var}[X_1] + \cdots + \mathbf{Var}[X_n]\,.$$

Proof of Lemma 8: Use the formula for the variance and the linearity of expectation to get
$$\begin{aligned}
\mathbf{Var}[X_1 + \cdots + X_n]
&= \mathbf{E}\!\left[(X_1 + \cdots + X_n)^2\right] - \left(\mathbf{E}[X_1 + \cdots + X_n]\right)^2 \\
&= \mathbf{E}\!\left[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)\right] - \left(\mathbf{E}[X_1] + \cdots + \mathbf{E}[X_n]\right)^2 \\
&= \mathbf{E}\Big[\sum_{i,j} X_i X_j\Big] - \sum_{i,j} \mathbf{E}[X_i]\cdot\mathbf{E}[X_j] \\
&= \sum_{i,j} \mathbf{E}[X_i X_j] - \sum_{i,j} \mathbf{E}[X_i]\cdot\mathbf{E}[X_j] \\
&= \sum_{i} \mathbf{E}[X_i^2] + \sum_{i \ne j} \mathbf{E}[X_i X_j] - \sum_{i} \mathbf{E}[X_i]^2 - \sum_{i \ne j} \mathbf{E}[X_i]\cdot\mathbf{E}[X_j] \\
&= \sum_{i} \left( \mathbf{E}[X_i^2] - \mathbf{E}[X_i]^2 \right) + \sum_{i \ne j} \left( \mathbf{E}[X_i X_j] - \mathbf{E}[X_i]\cdot\mathbf{E}[X_j] \right) \\
&= \sum_{i} \mathbf{Var}[X_i] + \sum_{i \ne j} \left( \mathbf{E}[X_i X_j] - \mathbf{E}[X_i]\cdot\mathbf{E}[X_j] \right).
\end{aligned}$$
Pairwise independence means that $\mathbf{E}[X_i X_j] = \mathbf{E}[X_i]\cdot\mathbf{E}[X_j]$ whenever $i \ne j$. Thus the second sum above is zero, and we are done.
We can now obtain the tail inequality by applying Chebyshev's inequality.
Lemma 9 Let $X_1, \ldots, X_n$ be pairwise independent random variables, let $X = X_1 + \cdots + X_n$, let $A > 0$ be a real number, and let $\mu = \mathbf{E}[X]$. Then
$$\Pr[\,|X - \mu| \ge A\,] \ \le\ \frac{\mathbf{Var}[X_1] + \cdots + \mathbf{Var}[X_n]}{A^2}\,.$$
Proof of Lemma 9: Proposition 2 tells us that
$$\Pr[\,|X - \mu| \ge A\,] \ \le\ \frac{\mathbf{Var}[X]}{A^2}\,.$$
Now apply Lemma 8.
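To see Lemmas 8 and 9 at work, one needs a supply of pairwise independent variables. A classical construction (not from these notes): from $k$ independent fair seed bits, the XORs over the $2^k - 1$ nonempty subsets of the seeds are pairwise independent fair bits. The sketch below checks that the variance of their sum matches the sum of the individual variances, as Lemma 8 predicts.

```python
import random
from itertools import combinations

def pairwise_independent_bits(k):
    """XORs over nonempty subsets of k independent fair seed bits:
    2^k - 1 pairwise independent (but not fully independent) fair bits."""
    seeds = [random.randrange(2) for _ in range(k)]
    bits = []
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            v = 0
            for i in subset:
                v ^= seeds[i]
            bits.append(v)
    return bits

# n = 2^4 - 1 = 15 bits, each with variance 1/4, so Lemma 8 predicts
# Var[X] = 15/4 = 3.75 even though the bits are far from independent.
trials = 100_000
sums = [sum(pairwise_independent_bits(4)) for _ in range(trials)]
m = sum(sums) / trials
var = sum((s - m) ** 2 for s in sums) / trials
print(m, var)    # roughly 7.5 and 3.75
```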

References

[1] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.