SlideShare une entreprise Scribd logo
1  sur  13
Télécharger pour lire hors ligne
Lecture 2
Barna Saha
AT&T-Labs Research
September 12, 2013
Outline
Concentration Inequalities Revisited
Universal Family of Hash Functions
Counting Distinct Items
Analysis of Algorithm from Lecture 0
AMS Algorithm for Counting Distinct Element
First and Second Moment Bounds
Markov Inequality For any positive random variable X and
t > 0
Pr X > t ≤
E X
t
Chebyshev Inequality For any random variable X and t > 0
Pr |X − E X | > t ≤
Var X
t2
The Chernoff Bound
Let X1, X2...Xn be n independent Bernoulli random variables
with Pr(Xi = 1) = pi . Let X = Xi . Hence,
E[X] = E Xi = E [Xi ] = Pr(Xi = 1) = pi = µ(say).
Then the Chernoff Bound says for any > 0
Pr(X > (1 + )µ) ≤
e
(1 + )
µ
and
Pr(X < (1 − )µ) ≤
e−
(1 − )1−
µ
When 0 < < 1 the above expression can be further
simplified to
Pr(X > (1 + )µ) ≤ e
−µ 2
3 and
Pr(X < (1 − )µ) ≤ e
−µ 2
2
Hence
Pr(|X − µ| > µ) ≤ 2e
−µ 2
3
Universal Hash Family
A family of hash functions H = {h |h : [N] − − > [M]} is called a
pairwise independent family of hash functions if for all i = j ∈ [N]
and any k, l ∈ [M]
Prh←H h(i) = k ∧ h(j) = l =
1
M2
strongly universal hash family (1)
Hash functions are uniform over [M],
Prh←H h(i) = k =
1
M
(2)
Prh←H h(i) = h(j) =
1
M
weakly universal hash family (3)
Construction Let p be a prime. For any
a, b ∈ Zp = {0, 1, 2, .., p − 1}, define ha,b : Zp → Zp by
ha,b(x) = ax + b modp. Then the collection of functions
H = {ha,b|a, b ∈ Zp} is a pairwise independent hash family.
Counting Distinct Items
Algorithm 1 [a, , δ]
= /2
for t = 1, (1 + ) , (1 + )2 , ... (1 + )log1+ n
do
δ = δ
log n {Run in parallel}
bt = ESTIMATE(a, t, , δ ) {bt is a boolean variable YES/
NO}
end for
return the smallest value of t such that bt−1 =YES and bt =NO,
if no such t exists, return n
Counting Distinct Items
Algorithm 2 [ESTIMATE(a, t, , δ )]
count ← 0
for i = 1 to c
2 log 1
δ do
Select a hash function hi uniformly and randomly from a fully-
independent hash family H {run in parallel}
bi
t ← NO
repeat
Consider the current element in the stream a, say al = (j, ν)
if hi (j) == 1 then
bi
t ← YES, BREAK
end if
until a is exhausted
if bi
t == NO then
count = count + 1
end if
end for
Counting Distinct Items
Algorithm 3 [ESTIMATE(a, t, , δ )]continued
if count ≥ 1
e
c
2 log 1
δ then
return NO
else
return YES
end if
Space Complexity: O( 1
3 log n(log 1
δ + log log n + log 1
))
Time Complexity: O( 1
3 log n(log 1
δ + log log n + log 1
))
Counting Distinct Items
Lemma
Consider the ith round of ESTIMATE(a, t, , δ ) for any
i ∈ [C
2 log 1
δ ]
If DE > (1 + )t then Pr bi
t == NO ≤ 1
e − 2e .
If DE < (1 − )t then Pr bi
t == NO ≥ 1
e + 2e .
Lemma
If DE > (1 + )t then Pr bt == NO ≤ δ
2 .
If DE < (1 − )t then Pr bt == YES ≤ δ
2 .
Lemma
If |DE − t| > t then Pr ERROR ≤ δ .
Counting Distinct Items
Lemma
For all t such that |DE − t| > t Pr ERROR ≤ δ.
Theorem
Algorithm 1 returns an estimate of DE within (1 ± ) with
probability ≥ (1 − δ).
AMS Sketch for Counting Distinct Element
Uses pair-wise independent hash function
Improved space and time complexity
Worse approximation
Algorithm 4 AMS Counting Distinct Items
Initialize
z ← 0
End Initialize
Process(al = (j, ν))
if zeros(h(j)) > z then
z ← zeros(h(j))
end if
End Process
Estimate
return 2z+1
2
End Estimate
AMS Sketch for Counting Distinct Element
Define Xr
j = 1 if zeros(h(j)) ≥ r and 0 otherwise. Define
Y r = j Xr
j .
Lemma
E Xr
j = 1
2r
E Y r
= DE
2r
Var Y r
≤ 1
2r
Lemma
Consider the largest level a such that 2a+ 1
2 < DE
3 .
Pr z ≤ a <
√
2
3 .
Consider the smallest level b such that 2b+ 1
2 > 3DE.
Pr z ≥ b <
√
2
3 .
Pr DE
3 < 2z+ 1
2 < 3DE ≥ 1 − 2
√
2
3 .
AMS Sketch for Counting Distinct Element
Boosting the confidence Median Trick.
Keep C log 1
δ copies and return the median estimate
Theorem
There exists a randomized algorithm that returns an estimate of
DE satisfying Pr DE
3 < 2z+ 1
2 < 3DE ≥ 1 − δ using space
O(log 1
δ log n)

Contenu connexe

Tendances

Integration of Trigonometric Functions
Integration of Trigonometric FunctionsIntegration of Trigonometric Functions
Integration of Trigonometric FunctionsRaymundo Raymund
 
Lagrange's method
Lagrange's methodLagrange's method
Lagrange's methodKarnav Rana
 
Application of analytic function
Application of analytic functionApplication of analytic function
Application of analytic functionDr. Nirav Vyas
 
Advanced matlab codigos matematicos
Advanced matlab codigos matematicosAdvanced matlab codigos matematicos
Advanced matlab codigos matematicosKmilo Bolaños
 
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...arj_online
 
Inner outer and spectral factorizations
Inner outer and spectral factorizationsInner outer and spectral factorizations
Inner outer and spectral factorizationsSolo Hermelin
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers Istiak Ahmed
 
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-I
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-IEngineering Mathematics-IV_B.Tech_Semester-IV_Unit-I
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-IRai University
 

Tendances (20)

Chemistry Assignment Help
Chemistry Assignment Help Chemistry Assignment Help
Chemistry Assignment Help
 
勾配法
勾配法勾配法
勾配法
 
Integration of Trigonometric Functions
Integration of Trigonometric FunctionsIntegration of Trigonometric Functions
Integration of Trigonometric Functions
 
Lagrange's method
Lagrange's methodLagrange's method
Lagrange's method
 
Math
MathMath
Math
 
Application of analytic function
Application of analytic functionApplication of analytic function
Application of analytic function
 
Prime numbers
Prime numbersPrime numbers
Prime numbers
 
Lecture5
Lecture5Lecture5
Lecture5
 
QMC: Operator Splitting Workshop, Boundedness of the Sequence if Iterates Gen...
QMC: Operator Splitting Workshop, Boundedness of the Sequence if Iterates Gen...QMC: Operator Splitting Workshop, Boundedness of the Sequence if Iterates Gen...
QMC: Operator Splitting Workshop, Boundedness of the Sequence if Iterates Gen...
 
Divide and Conquer
Divide and ConquerDivide and Conquer
Divide and Conquer
 
Advanced matlab codigos matematicos
Advanced matlab codigos matematicosAdvanced matlab codigos matematicos
Advanced matlab codigos matematicos
 
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...
Chebyshev Polynomial Based Numerical Inverse Laplace Transform Solutions of L...
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Daa chapter5
Daa chapter5Daa chapter5
Daa chapter5
 
Divide and conquer
Divide and conquerDivide and conquer
Divide and conquer
 
Matrix calculus
Matrix calculusMatrix calculus
Matrix calculus
 
Inner outer and spectral factorizations
Inner outer and spectral factorizationsInner outer and spectral factorizations
Inner outer and spectral factorizations
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers
 
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-I
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-IEngineering Mathematics-IV_B.Tech_Semester-IV_Unit-I
Engineering Mathematics-IV_B.Tech_Semester-IV_Unit-I
 
Lec23
Lec23Lec23
Lec23
 

En vedette

Biermann AstroPhysic
Biermann AstroPhysicBiermann AstroPhysic
Biermann AstroPhysicAtner Yegorov
 
Prezentazia ryazanskiy
Prezentazia ryazanskiyPrezentazia ryazanskiy
Prezentazia ryazanskiyAtner Yegorov
 
программа жрс презентация 16 02 14
программа жрс   презентация 16 02 14программа жрс   презентация 16 02 14
программа жрс презентация 16 02 14Atner Yegorov
 
!Web index report 201404 all rus
!Web index report 201404 all rus!Web index report 201404 all rus
!Web index report 201404 all rusAtner Yegorov
 
Указ Президента Чувашии от 10 октября 2007 г. №87
Указ Президента Чувашии от 10 октября 2007 г. №87Указ Президента Чувашии от 10 октября 2007 г. №87
Указ Президента Чувашии от 10 октября 2007 г. №87Atner Yegorov
 
Prezentazia nikola lenivez
Prezentazia nikola lenivezPrezentazia nikola lenivez
Prezentazia nikola lenivezAtner Yegorov
 
презентация 2013 для пудовой и.н.
презентация 2013 для пудовой и.н.презентация 2013 для пудовой и.н.
презентация 2013 для пудовой и.н.Atner Yegorov
 
гзн васильева т.в
гзн васильева т.вгзн васильева т.в
гзн васильева т.вAtner Yegorov
 
Prezentazia zadonchina
Prezentazia zadonchinaPrezentazia zadonchina
Prezentazia zadonchinaAtner Yegorov
 
лизин (презентация для тпп) январь 2012
лизин (презентация для тпп) январь 2012лизин (презентация для тпп) январь 2012
лизин (презентация для тпп) январь 2012Atner Yegorov
 
О выполнении работ на автодорогах Чувашии
О выполнении работ на автодорогах ЧувашииО выполнении работ на автодорогах Чувашии
О выполнении работ на автодорогах ЧувашииAtner Yegorov
 

En vedette (18)

Lec 5-nn-slides
Lec 5-nn-slidesLec 5-nn-slides
Lec 5-nn-slides
 
Biermann AstroPhysic
Biermann AstroPhysicBiermann AstroPhysic
Biermann AstroPhysic
 
Biermann feedback
Biermann feedbackBiermann feedback
Biermann feedback
 
Prezentazia ryazanskiy
Prezentazia ryazanskiyPrezentazia ryazanskiy
Prezentazia ryazanskiy
 
программа жрс презентация 16 02 14
программа жрс   презентация 16 02 14программа жрс   презентация 16 02 14
программа жрс презентация 16 02 14
 
!Web index report 201404 all rus
!Web index report 201404 all rus!Web index report 201404 all rus
!Web index report 201404 all rus
 
Biermann clusters
Biermann clustersBiermann clusters
Biermann clusters
 
Указ Президента Чувашии от 10 октября 2007 г. №87
Указ Президента Чувашии от 10 октября 2007 г. №87Указ Президента Чувашии от 10 октября 2007 г. №87
Указ Президента Чувашии от 10 октября 2007 г. №87
 
Prezentazia nikola lenivez
Prezentazia nikola lenivezPrezentazia nikola lenivez
Prezentazia nikola lenivez
 
Itogi2012
Itogi2012Itogi2012
Itogi2012
 
Itogi pr
Itogi prItogi pr
Itogi pr
 
презентация 2013 для пудовой и.н.
презентация 2013 для пудовой и.н.презентация 2013 для пудовой и.н.
презентация 2013 для пудовой и.н.
 
гзн васильева т.в
гзн васильева т.вгзн васильева т.в
гзн васильева т.в
 
химпром
химпромхимпром
химпром
 
Prezentazia zadonchina
Prezentazia zadonchinaPrezentazia zadonchina
Prezentazia zadonchina
 
лизин (презентация для тпп) январь 2012
лизин (презентация для тпп) январь 2012лизин (презентация для тпп) январь 2012
лизин (презентация для тпп) январь 2012
 
О выполнении работ на автодорогах Чувашии
О выполнении работ на автодорогах ЧувашииО выполнении работ на автодорогах Чувашии
О выполнении работ на автодорогах Чувашии
 
технотрон
технотронтехнотрон
технотрон
 

Similaire à Lec2 slides

Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsPK Lehre
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsPer Kristian Lehre
 
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESNONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESTahia ZERIZER
 
Roots equations
Roots equationsRoots equations
Roots equationsoscar
 
Roots equations
Roots equationsRoots equations
Roots equationsoscar
 
Conjugate Gradient Methods
Conjugate Gradient MethodsConjugate Gradient Methods
Conjugate Gradient MethodsMTiti1
 
Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Nikita V. Artamonov
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeVjekoslavKovac1
 
Griffiths2,12 2,14 2,26 2.38
Griffiths2,12 2,14 2,26 2.38Griffiths2,12 2,14 2,26 2.38
Griffiths2,12 2,14 2,26 2.38rose156
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning Sean Meyn
 
Partial Fraction ppt presentation of engg math
Partial Fraction ppt presentation of engg mathPartial Fraction ppt presentation of engg math
Partial Fraction ppt presentation of engg mathjaidevpaulmca
 
Comparison Theorems for SDEs
Comparison Theorems for SDEs Comparison Theorems for SDEs
Comparison Theorems for SDEs Ilya Gikhman
 

Similaire à Lec2 slides (20)

Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
 
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESNONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
 
Roots equations
Roots equationsRoots equations
Roots equations
 
Roots equations
Roots equationsRoots equations
Roots equations
 
Conjugate Gradient Methods
Conjugate Gradient MethodsConjugate Gradient Methods
Conjugate Gradient Methods
 
Ch05 2
Ch05 2Ch05 2
Ch05 2
 
Ch07 7
Ch07 7Ch07 7
Ch07 7
 
Statistical Method In Economics
Statistical Method In EconomicsStatistical Method In Economics
Statistical Method In Economics
 
Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cube
 
Griffiths2,12 2,14 2,26 2.38
Griffiths2,12 2,14 2,26 2.38Griffiths2,12 2,14 2,26 2.38
Griffiths2,12 2,14 2,26 2.38
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
 
Improper integral
Improper integralImproper integral
Improper integral
 
Partial Fraction ppt presentation of engg math
Partial Fraction ppt presentation of engg mathPartial Fraction ppt presentation of engg math
Partial Fraction ppt presentation of engg math
 
Comparison Theorems for SDEs
Comparison Theorems for SDEs Comparison Theorems for SDEs
Comparison Theorems for SDEs
 

Plus de Atner Yegorov

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Atner Yegorov
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Atner Yegorov
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Atner Yegorov
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Atner Yegorov
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015Atner Yegorov
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг ЧувашииAtner Yegorov
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыAtner Yegorov
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Atner Yegorov
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанAtner Yegorov
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, ТатарстанAtner Yegorov
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистикиAtner Yegorov
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИAtner Yegorov
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Atner Yegorov
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Atner Yegorov
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициативаAtner Yegorov
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ Atner Yegorov
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизацииAtner Yegorov
 
Город Innopolis
Город InnopolisГород Innopolis
Город InnopolisAtner Yegorov
 
презентация1
презентация1презентация1
презентация1Atner Yegorov
 

Plus de Atner Yegorov (20)

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг Чувашии
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум Чебоксары
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, Татарстан
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, Татарстан
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистики
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИ
 
Relap
RelapRelap
Relap
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициатива
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизации
 
Город Innopolis
Город InnopolisГород Innopolis
Город Innopolis
 
презентация1
презентация1презентация1
презентация1
 

Dernier

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Dernier (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Lec2 slides

  • 1. Lecture 2 Barna Saha AT&T-Labs Research September 12, 2013
  • 2. Outline Concentration Inequalities Revisited Universal Family of Hash Functions Counting Distinct Items Analysis of Algorithm from Lecture 0 AMS Algorithm for Counting Distinct Element
  • 3. First and Second Moment Bounds Markov Inequality For any positive random variable X and t > 0 Pr X > t ≤ E X t Chebyshev Inequality For any random variable X and t > 0 Pr |X − E X | > t ≤ Var X t2
  • 4. The Chernoff Bound Let X1, X2...Xn be n independent Bernoulli random variables with Pr(Xi = 1) = pi . Let X = Xi . Hence, E[X] = E Xi = E [Xi ] = Pr(Xi = 1) = pi = µ(say). Then the Chernoff Bound says for any > 0 Pr(X > (1 + )µ) ≤ e (1 + ) µ and Pr(X < (1 − )µ) ≤ e− (1 − )1− µ When 0 < < 1 the above expression can be further simplified to Pr(X > (1 + )µ) ≤ e −µ 2 3 and Pr(X < (1 − )µ) ≤ e −µ 2 2 Hence Pr(|X − µ| > µ) ≤ 2e −µ 2 3
  • 5. Universal Hash Family A family of hash functions H = {h |h : [N] − − > [M]} is called a pairwise independent family of hash functions if for all i = j ∈ [N] and any k, l ∈ [M] Prh←H h(i) = k ∧ h(j) = l = 1 M2 strongly universal hash family (1) Hash functions are uniform over [M], Prh←H h(i) = k = 1 M (2) Prh←H h(i) = h(j) = 1 M weakly universal hash family (3) Construction Let p be a prime. For any a, b ∈ Zp = {0, 1, 2, .., p − 1}, define ha,b : Zp → Zp by ha,b(x) = ax + b modp. Then the collection of functions H = {ha,b|a, b ∈ Zp} is a pairwise independent hash family.
  • 6. Counting Distinct Items Algorithm 1 [a, , δ] = /2 for t = 1, (1 + ) , (1 + )2 , ... (1 + )log1+ n do δ = δ log n {Run in parallel} bt = ESTIMATE(a, t, , δ ) {bt is a boolean variable YES/ NO} end for return the smallest value of t such that bt−1 =YES and bt =NO, if no such t exists, return n
  • 7. Counting Distinct Items Algorithm 2 [ESTIMATE(a, t, , δ )] count ← 0 for i = 1 to c 2 log 1 δ do Select a hash function hi uniformly and randomly from a fully- independent hash family H {run in parallel} bi t ← NO repeat Consider the current element in the stream a, say al = (j, ν) if hi (j) == 1 then bi t ← YES, BREAK end if until a is exhausted if bi t == NO then count = count + 1 end if end for
  • 8. Counting Distinct Items Algorithm 3 [ESTIMATE(a, t, , δ )]continued if count ≥ 1 e c 2 log 1 δ then return NO else return YES end if Space Complexity: O( 1 3 log n(log 1 δ + log log n + log 1 )) Time Complexity: O( 1 3 log n(log 1 δ + log log n + log 1 ))
  • 9. Counting Distinct Items Lemma Consider the ith round of ESTIMATE(a, t, , δ ) for any i ∈ [C 2 log 1 δ ] If DE > (1 + )t then Pr bi t == NO ≤ 1 e − 2e . If DE < (1 − )t then Pr bi t == NO ≥ 1 e + 2e . Lemma If DE > (1 + )t then Pr bt == NO ≤ δ 2 . If DE < (1 − )t then Pr bt == YES ≤ δ 2 . Lemma If |DE − t| > t then Pr ERROR ≤ δ .
  • 10. Counting Distinct Items Lemma For all t such that |DE − t| > t Pr ERROR ≤ δ. Theorem Algorithm 1 returns an estimate of DE within (1 ± ) with probability ≥ (1 − δ).
  • 11. AMS Sketch for Counting Distinct Element Uses pair-wise independent hash function Improved space and time complexity Worse approximation Algorithm 4 AMS Counting Distinct Items Initialize z ← 0 End Initialize Process(al = (j, ν)) if zeros(h(j)) > z then z ← zeros(h(j)) end if End Process Estimate return 2z+1 2 End Estimate
  • 12. AMS Sketch for Counting Distinct Element Define Xr j = 1 if zeros(h(j)) ≥ r and 0 otherwise. Define Y r = j Xr j . Lemma E Xr j = 1 2r E Y r = DE 2r Var Y r ≤ 1 2r Lemma Consider the largest level a such that 2a+ 1 2 < DE 3 . Pr z ≤ a < √ 2 3 . Consider the smallest level b such that 2b+ 1 2 > 3DE. Pr z ≥ b < √ 2 3 . Pr DE 3 < 2z+ 1 2 < 3DE ≥ 1 − 2 √ 2 3 .
  • 13. AMS Sketch for Counting Distinct Element Boosting the confidence Median Trick. Keep C log 1 δ copies and return the median estimate Theorem There exists a randomized algorithm that returns an estimate of DE satisfying Pr DE 3 < 2z+ 1 2 < 3DE ≥ 1 − δ using space O(log 1 δ log n)