Keeping track of uniques from streaming data with practically no memory! PyCon 2015 lightning talk
1. Keeping track of uniques
from streaming data with
practically no memory!
or… magic?
2. The simple way... lots of memory
import sys
uniques = set()
for i in range(100000):
    uniques.add(i)
len(uniques)
# 100000
sys.getsizeof(uniques)
# 4194536 (bytes, for the set's hash table alone; the int objects it references are extra)
3. The magical way - with redis
import redis
client = redis.Redis()
for i in range(100000):
    client.pfadd("uniques", i)
client.pfcount("uniques")
# 99556
client.debug_object("uniques")["serializedlength"]
# 10560 (bytes)
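To put the two slides side by side, here is a small back-of-the-envelope calculation. It only uses the numbers shown above; the variable names are made up for illustration.

```python
# Numbers copied from the two examples above.
exact_count = 100000        # len(uniques) from the set example
estimated_count = 99556     # client.pfcount("uniques") from the Redis example
set_bytes = 4194536         # sys.getsizeof(uniques)
hll_bytes = 10560           # serializedlength from client.debug_object(...)

# How far off the estimate is, and how much smaller the Redis key is.
relative_error = abs(exact_count - estimated_count) / exact_count
memory_ratio = set_bytes / hll_bytes

print(f"relative error: {relative_error:.2%}")   # relative error: 0.44%
print(f"memory ratio: {memory_ratio:.0f}x")      # memory ratio: 397x
```

So the estimate is off by under half a percent, for roughly 1/400th of the memory.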
4. Eh?
“The basis of the HyperLogLog algorithm is the observation
that the cardinality of a multiset of uniformly-distributed
random numbers can be estimated by calculating the
maximum number of leading zeros in the binary
representation of each number in the set. If the maximum
number of leading zeros observed is n, an estimate for the
number of distinct elements in the set is 2^n.”
- Wikipedia
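The idea in the quote can be sketched in pure Python. This is a rough illustration, not what Redis actually runs: the function names, the SHA-1 based 64-bit hash, and the choice of p = 12 (4096 registers) are all assumptions made for the example. `crude_estimate` is the single-counter idea from the quote; `hll_estimate` follows the usual HyperLogLog recipe of splitting items across many registers and combining them with a harmonic mean.

```python
import hashlib
import math


def hash64(item) -> int:
    """Map any item to a uniformly distributed 64-bit integer (assumed hash choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def crude_estimate(items) -> int:
    """The quoted idea with one counter: 2 ** (max leading zeros seen)."""
    max_zeros = 0
    for item in items:
        zeros = 64 - hash64(item).bit_length()  # leading zeros in 64 bits
        max_zeros = max(max_zeros, zeros)
    return 2 ** max_zeros


def hll_estimate(items, p: int = 12) -> float:
    """HyperLogLog-style estimate using 2 ** p small registers."""
    m = 1 << p
    registers = [0] * m
    rest_bits = 64 - p
    for item in items:
        h = hash64(item)
        index = h >> rest_bits                       # first p bits pick a register
        rest = h & ((1 << rest_bits) - 1)            # remaining bits
        rank = rest_bits - rest.bit_length() + 1     # leading zeros + 1
        registers[index] = max(registers[index], rank)

    # Harmonic mean of the per-register estimates, with the standard bias constant.
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-cardinality correction (linear counting) while registers are still empty.
    zero_registers = registers.count(0)
    if raw <= 2.5 * m and zero_registers:
        return m * math.log(m / zero_registers)
    return raw


if __name__ == "__main__":
    print(crude_estimate(range(100000)))   # always a power of two, very noisy
    print(hll_estimate(range(100000)))     # should land close to 100000
```

The single counter is cheap but can only ever answer with a power of two. Spreading items over many registers is what brings the error down to a few percent while memory stays fixed, no matter how many items stream past.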