Evaluation of caching strategies based on access statistics

Evaluation of Caching Strategies
based on Access Statistics on Past Requests
Gerhard Haßlinger, Konstantinos Ntougias
gerhard.hasslinger@telekom.de; kostas_ntougias@yahoo.gr

Commercial in Confidence



Least Recently Used (LRU): Simple Standard Method
- Analysis, Simulation: Deficits of LRU Cache Hit Rate



Statistics-based Caching Strategies
- Window: over the last K Requests
- Geometrical Aging: Geometrically Decreasing Weight per Request
- Criteria: Hit Rate and Effort for Alternative Strategies



Summary of hit rates and effort of web caching strategies

© 2013 The SmartenIT Consortium

Cache Efficiency for YouTube Video Traces
[Figure: Cache hit rate (0% to 60%) vs. cache size as the fraction of videos in the cache (0.0078% to 2%). Curves: optimal cache strategy (most popular data in cache); Zipf law approximation 0.004·R^(−5/8); LRU cache strategy (least recently used).]

Evaluation of 3.7 billion accesses to 1.65 million YouTube files
Sources:
- M. Cha et al., I tube, you tube, everybody tubes: Analyzing the world's largest user generated content video system, Internet Measurement Conference (IMC), San Diego, USA (2007)
- G. Haßlinger, O. Hohlfeld, Efficiency of caching for IP-based Content Delivery, ITC 2010
- Results confirmed by N. Megiddo and D. S. Modha, Outperforming LRU with an adaptive replacement cache algorithm, IEEE Computer (Apr. 2004) 4-11
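As a rough illustration of the figure, the sketch below sums the Zipf law approximation over the M most popular videos to get the hit rate of the optimal strategy. It assumes that 0.004·R^(−5/8) is the request probability of the R-th most popular video and uses N = 1.65 million files from the trace; the approximation is only meant for the plotted range of small cache fractions (it is not normalized over all files). The function name is hypothetical.

```python
def zipf_optimal_hit_rate(cache_fraction: float, n_objects: int = 1_650_000,
                          a: float = 0.004, beta: float = 5 / 8) -> float:
    """Hit rate of the optimal strategy (the M most popular objects are cached),
    assuming the R-th most popular object is requested with probability a * R**(-beta)."""
    m = round(cache_fraction * n_objects)          # cache size M in objects
    return sum(a * r ** (-beta) for r in range(1, m + 1))


# Cache sizes from the figure's x-axis, as fractions of all videos
for fraction in (0.000078, 0.00031, 0.00124, 0.005, 0.02):
    print(f"cache = {fraction:.4%} of videos -> optimal hit rate {zipf_optimal_hit_rate(fraction):.1%}")
```

For a cache holding 2% of the videos this gives roughly 52%, in the range shown by the figure.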

Cache Strategies incl. Statistics on Past Requests
Sliding Window: Cache holds the objects with the highest request
frequency over a sliding window of the last K requests

Geometric Fading: Cache holds the objects that have the highest
sum of weights for past requests, where the k-th request in the
past has a geometrically decreasing weight ρ^k (0 < ρ < 1).





Statistics over window of the last K requests





Converges to caching of the most popular objects for large K
Reacts to dynamic changes in the object population, after a delay until
requests to a new item become relevant in the statistics

Implementation:
- The request sequence in the window has to be stored;
for each new request, one request falls out of the window
and has to be removed from the statistics
- Two objects change their statistics score per new request:
cache updates still have constant effort per request,
although more than for LRU
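A minimal sketch of the window statistics described above, assuming hashable object IDs; the class and method names are hypothetical. `record` stores the request sequence in a deque and updates the scores of the two affected objects (the new request and the one falling out of the window). For readability, `cached_objects` recomputes the M objects with the highest counts via `Counter.most_common`, which is not constant effort per request; an incremental ranking structure would be needed for that (not shown). Ties between equal counts are broken arbitrarily.

```python
from collections import Counter, deque


class SlidingWindowStats:
    """Request counts over the last K requests (hypothetical helper class)."""

    def __init__(self, window_size: int):
        self.window_size = window_size   # K
        self.window = deque()            # request sequence inside the window
        self.counts = Counter()          # object -> number of requests in the window

    def record(self, obj):
        """Add a request; return the object whose request fell out of the window, or None."""
        self.window.append(obj)
        self.counts[obj] += 1
        dropped = None
        if len(self.window) > self.window_size:
            dropped = self.window.popleft()
            self.counts[dropped] -= 1
            if self.counts[dropped] == 0:
                del self.counts[dropped]
        return dropped

    def cached_objects(self, m: int) -> set:
        """The m objects with the highest request counts in the window."""
        return {obj for obj, _ in self.counts.most_common(m)}


stats = SlidingWindowStats(window_size=3)
for request in ["a", "b", "a", "c", "a"]:
    stats.record(request)
print(dict(stats.counts))        # window holds a, c, a -> {'a': 2, 'c': 1}
print(stats.cached_objects(1))   # {'a'}
```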


Statistics with geometrical aging





The k-th request in the past is weighted by ρ^k (0 < ρ < 1)
The weight of an object is the sum of the weights of its requests
Objects are ordered according to their weights

Implementation:
- In principle, all weights would have to be multiplied by ρ for each
request; instead, the weight of each new request can be multiplied by 1/ρ (> 1),
i.e. the k-th request (counted from the start) gets weight (1/ρ)^k
- One object changes rank per request;
the effort to update its rank in a sorted list is O(ln(M))
- Faster approximations: the requested object steps up only one rank,
or ranks are updated only periodically, e.g. per hour or per day
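The trick of growing the new weights by 1/ρ instead of shrinking all old weights can be sketched as follows. The class name and the rescaling threshold of 1e100 are hypothetical choices: since (1/ρ)^t grows without bound, all weights are occasionally multiplied by a common factor, which leaves the ranking unchanged. `aged_weight` recovers the classical definition, counting the most recent request as k = 0 (a different offset only rescales all weights by the same factor). For readability, `cached_objects` ranks all objects with `heapq.nlargest` rather than maintaining the O(ln(M)) sorted list.

```python
import heapq


class GeometricAging:
    """Per-object request weights with geometric aging (hypothetical helper class).

    Instead of multiplying all weights by rho on every request, the t-th request
    adds (1/rho)**t to the weight of the requested object.
    """

    def __init__(self, rho: float):
        if not 0 < rho < 1:
            raise ValueError("rho must satisfy 0 < rho < 1")
        self.rho = rho
        self.t = 0          # requests since the last rescaling
        self.weights = {}   # object -> sum of (1/rho)**t over its requests

    def record(self, obj) -> None:
        self.t += 1
        increment = self.rho ** (-self.t)
        self.weights[obj] = self.weights.get(obj, 0.0) + increment
        # Rescale all weights by the same factor to avoid floating-point overflow;
        # afterwards each weight equals its aged weight and t restarts at 0.
        if increment > 1e100:
            scale = self.rho ** self.t
            self.weights = {o: w * scale for o, w in self.weights.items()}
            self.t = 0

    def aged_weight(self, obj) -> float:
        """Sum of rho**k over the object's requests, where the k-th most recent
        request overall (k = 0 for the latest one) contributes rho**k."""
        return self.weights.get(obj, 0.0) * self.rho ** self.t

    def cached_objects(self, m: int) -> set:
        """The m objects with the highest weights."""
        return set(heapq.nlargest(m, self.weights, key=self.weights.get))


aging = GeometricAging(rho=0.5)
for request in ["a", "b", "a", "c"]:
    aging.record(request)
print(aging.aged_weight("a"))     # 0.5**3 + 0.5**1 = 0.625
print(aging.cached_objects(2))    # the objects "a" and "c"
```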

Basic Assumptions on Cache Modeling & Evaluation
We assume
- a set of N objects and a cache for M (< N) objects of fixed size
(objects of different size are handled as k unit-size chunks;
bin-packing problems are almost irrelevant in large caches)




- Random independent requests with static popularity
pk: request probability of object k, with objects numbered in order of decreasing popularity

⇒ The optimum strategy holds the most popular objects in the cache

- Static popularity is favourable for the cache hit rate, since
unforeseen changes in popularity detract from cache
efficiency




- Measurement traces of requests to YouTube show only slowly
varying popularity: only a few percent of the top 100 items are
replaced by new ones per day/week


Results on LRU Caching Strategy


An LRU cache is implemented as a stack of depth M;
a new request puts the object on top (when a new object enters a full stack, the object at the bottom is evicted)
LRU is simple and frequently used (Squid, DropBox etc.)
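The stack view of LRU can be sketched with Python's `collections.OrderedDict`, whose last entry plays the role of the top of the stack. This is a minimal illustration, not the implementation used in Squid or DropBox; the class name is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """LRU cache of depth M modelled as a stack: the last entry is the top."""

    def __init__(self, depth: int):
        self.depth = depth
        self.stack = OrderedDict()

    def request(self, obj) -> bool:
        """Process one request; return True on a cache hit."""
        hit = obj in self.stack
        if hit:
            self.stack.move_to_end(obj)          # move the requested object to the top
        else:
            self.stack[obj] = None               # put the new object on top
            if len(self.stack) > self.depth:
                self.stack.popitem(last=False)   # evict the bottom (least recently used)
        return hit


cache = LRUCache(depth=2)
print([cache.request(x) for x in ["a", "b", "a", "c", "b"]])   # [False, False, True, False, False]
```

Both the hit check and the update take constant time per request, which is the low effort that makes LRU attractive.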



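Analysis of the hit rate for a static request distribution is possible:

$$h_{LRU}(M) = \sum_{k_1=1}^{N} p_{k_1} \sum_{\substack{k_2=1\\ k_2 \neq k_1}}^{N} \frac{p_{k_2}}{1-p_{k_1}} \sum_{\substack{k_3=1\\ k_3 \neq k_1,k_2}}^{N} \frac{p_{k_3}}{1-p_{k_1}-p_{k_2}} \cdots \sum_{\substack{k_M=1\\ k_M \neq k_1,\dots,k_{M-1}}}^{N} \frac{p_{k_M}}{1-\sum_{j=1}^{M-1} p_{k_j}} \, \sum_{j=1}^{M} p_{k_j}$$
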
but its evaluation is complex and feasible only for small cache sizes M < 15.
Approximations by Towsley et al. (1999), Haßlinger & Hohlfeld (2010) and
Fricker, Robert, Roberts (2011) seem to be accurate for arbitrary
static request distributions, but have been verified only by simulation
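The exact formula can be evaluated by brute force for tiny examples: each ordered tuple (k1, …, kM) of distinct objects is a possible LRU stack content (most recent first), its probability is the product of the fractions in the formula, and the hit probability given that content is the sum of the cached pk. The number of terms grows as N!/(N − M)!, which illustrates why the exact evaluation is limited to small sizes. The function name is hypothetical and the probabilities are assumed to be positive and to sum to 1.

```python
from itertools import permutations


def lru_hit_rate_exact(p: list[float], m: int) -> float:
    """Exact LRU hit rate for independent requests with static probabilities p
    and cache size m, summing over all ordered stack contents."""
    hit_rate = 0.0
    for stack in permutations(range(len(p)), m):   # stack[0] = most recently used
        prob = 1.0
        cached_mass = 0.0
        for k in stack:
            # next distinct object in the stack: k among the objects not yet in it
            prob *= p[k] / (1.0 - cached_mass)
            cached_mass += p[k]
        hit_rate += prob * cached_mass             # next request hits a cached object
    return hit_rate


p = [0.5, 0.3, 0.2]
print(lru_hit_rate_exact(p, 1))   # about 0.38 = sum of pk**2
print(lru_hit_rate_exact(p, 2))   # about 0.719, below the optimum 0.5 + 0.3 = 0.8
```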


Worst Case Analysis of LRU Caching Strategy


Cache size M = 1 with only one popular object:
p1 >> ε > p2, p3, … Keeping the most popular item permanently in the cache is optimal
⇒ optimum hit rate: p1; the LRU hit rate is smaller: p1² (for ε → 0).
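A short worked step for this cache of size 1 in the limit ε → 0: the absolute deficit of LRU against the optimum is largest when the popular object receives half of all requests.

```latex
\Delta(p_1) = p_1 - p_1^2, \qquad
\frac{d\Delta}{dp_1} = 1 - 2p_1 = 0 \;\Rightarrow\; p_1 = \tfrac{1}{2}, \qquad
\Delta_{\max} = \tfrac{1}{2} - \tfrac{1}{4} = 25\%
```

This 25% refers to M = 1 only.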



Arbitrary cache size M with a set IPop of M popular objects, each requested with probability p (total Mp):
p1 = p2 = … = pM = p >> ε > pM+1, pM+2, …





pLRU(j, k): probability that j popular items from the set IPop are found
in an LRU cache of size k. pLRU(j, k) can be analysed iteratively over k:
$$p_{LRU}(j,k) = p_{LRU}(j,k-1)\,\frac{1 - Mp - (k-1-j)\,\varepsilon}{1 - jp - (k-1-j)\,\varepsilon} + p_{LRU}(j-1,k-1)\,\frac{(M-j+1)\,p}{1 - (j-1)\,p - (k-j)\,\varepsilon}$$

⇒ LRU hit rate: $$h_{LRU} = \sum_{j} p_{LRU}(j,M)\,\big[\,j\,p + (M-j)\,\varepsilon\,\big]$$
[Figure: An LRU cache of size k consists of the cache of size k−1 plus XTop, the object of the last request to an object X not in the cache of size k−1. Starting from state pLRU(j, k−1), XTop ∈ IPop leads to pLRU(j+1, k) and XTop ∉ IPop leads to pLRU(j, k).]

⇒ Exact analysis of LRU worst case hit rate is feasible
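The iteration for pLRU(j, k) can be sketched directly. The sketch assumes that each of the M popular objects has request probability p (total Mp < 1), treats unpopular objects with weight ε exactly as the formula does, and starts from the empty cache, pLRU(0, 0) = 1. The function name is hypothetical.

```python
def lru_worst_case_hit_rate(m: int, p: float, eps: float = 0.0) -> float:
    """LRU hit rate in the worst-case scenario: m popular objects with request
    probability p each (m * p < 1), unpopular objects weighted by eps."""
    # prob[j] = pLRU(j, k) for the current cache size k, starting with k = 0
    prob = [1.0] + [0.0] * m
    for k in range(1, m + 1):
        new = [0.0] * (m + 1)
        for j in range(0, k + 1):
            # XTop not in IPop: state (j, k-1) -> (j, k)
            if j <= k - 1:
                new[j] += prob[j] * (1 - m * p - (k - 1 - j) * eps) / (1 - j * p - (k - 1 - j) * eps)
            # XTop in IPop: state (j-1, k-1) -> (j, k)
            if j >= 1:
                new[j] += prob[j - 1] * (m - j + 1) * p / (1 - (j - 1) * p - (k - j) * eps)
        prob = new
    return sum(prob[j] * (j * p + (m - j) * eps) for j in range(m + 1))


print(lru_worst_case_hit_rate(2, 0.25))   # about 0.229; the optimum is Mp = 0.5
# Absolute deficit (optimum Mp minus LRU hit rate) for a larger cache, eps -> 0
m = 50
for q in (0.2, 0.4, 0.6, 0.8):
    print(q, q - lru_worst_case_hit_rate(m, q / m))
```

For M = 1 the iteration reduces to the hit rate p² of the single-object case.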

Worst Case Analysis of LRU Caching
[Figure: Cache hit rate (0% to 100%) vs. probability of a request to the set of popular objects in the worst-case LRU scenario (0 to 1). Curves: most popular items in cache; LRU worst case for cache of size 1. Annotation: 28.9% maximum absolute deficit; severe relative deficits for small cache hit rates.]


Simulation Results for Caching Strategies
[Figure: Hit rate of the caching strategies (N = 1000 objects; K = 1000) for Zipf-distributed requests A(R) = α·R^(−β) (β = 0.6; α = 2.7%). Cache hit rate (0% to 40%) vs. cache size M = 5, 10, 20, 50, 100. Curves: most popular objects in the cache; geometrical fading; sliding window; LRU approximation; LRU simulation.]


[Figure: Hit rate of the caching strategies (N = 1000; K = 1000). Cache hit rate (0% to 70%) vs. cache size M = 5, 10, 20, 50, 100. Curves: optimum; geometrical fading; sliding window; LRU approximation; LRU simulation.]


[Figure: Hit rate of the caching strategies (N = 1000). Cache hit rate (35% to 60%) vs. K = 1, 4, 16, 64, 128, 256, 512, 1024, 2048. Curves: optimum for i.i.d. requests; geometrical fading; sliding window; LRU.]

Sliding Window and Geometrical Fading:
Hit rate depending on the window size K and on the fading factor ρ = K/(K + 1)
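One possible reading of the choice ρ = K/(K + 1), offered as an interpretation rather than the authors' stated rationale: with the k-th past request weighted by ρ^k, the total weight of an unbounded request history equals K, the same total as the request counts in a sliding window of size K.

```latex
\sum_{k=1}^{\infty} \rho^{k} = \frac{\rho}{1-\rho}
  = \frac{K/(K+1)}{1/(K+1)} = K
\qquad \text{for } \rho = \frac{K}{K+1}
```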

Conclusions on Cache Replacement Strategies






LRU seems to be the most frequently used strategy in web caches (Squid, DropBox)
For static popularity, LRU is below the maximal hit rate by
- 28.9% in the worst case
- 10-20% for large content sites (YouTube; Zipf-like requests)
LRU performance is especially poor for small caches
Statistics over a fixed-size window and geometric aging
can converge to the optimum hit rate of the static popularity case



Implementation:
- Statistics over a window need some storage and
have a constant update effort per request, but more than LRU
- Geometric aging has an effort of O(ln(M)) per request



Zipf law popularity makes (small) caches efficient

