Redis for duplicate detection on real time stream
whoami(1) 
15 years of experience, proud to be a programmer 
Writes software for information extraction, NLP, opinion mining (@scale), and a lot of other buzzwords
Implements scalable architectures 
Member of the JUG-Torino coordination team 
ro.franchini@gmail.com github.com/robfrank 
twitter.com/robfrankie linkedin.com/in/robfrank 
http://www.celi.it http://www.blogmeter.it
Agenda 
What is it? 
Main features 
Caching 
Counters 
Scripting 
How we use it
From the site 
Redis is an open source, BSD licensed, advanced 
key-value cache and store. It is often referred to 
as a data structure server since keys can contain 
strings, hashes, lists, sets, sorted sets, bitmaps 
and hyperloglogs.
Who uses it
Twitter 
Github 
Youporn 
Pinterest 
Groupon 
...
Ecosystem 
Clients in every known language 
Articles, books, presentations 
On High Scalability every other day
Architecture 
Single-threaded server 
Yes: single threaded server 
Remember that when you need to scale 
A single Linux server can handle 500k req/s
Main features 
In memory K/V store 
But with durable persistence 
Asynchronous master-slave replication
Transactions
Pub/Sub
Server-side Lua scripting
Main features 
Keys with TTL 
LRU eviction 
Keys can contain strings, hashes, lists, sets, sorted sets, 
bitmaps and hyperloglogs 
Redis Cluster on the way (3.0.0-rc1)
K/V store 
Key-value (KV) stores use the associative array (also 
known as a map or dictionary) as their fundamental data 
model. In this model, data is represented as a collection of 
key-value pairs, such that each possible key appears at 
most once in the collection. (wikipedia)
K/V store
Key → value, where the value can be one of:
String/blobs/bitmaps (e.g. “plain text”)
HashTable: Objects (e.g. name rob, surname frank)
Linked lists
Sets
[Diagram: sample elements A to F illustrate the list and set types]
Persistence 
Configurable, two flavors 
RDB: perfect for backup 
AOF: append only log, replayed at startup 
Use AOF + RDB for rock solid persistence 
Automatic cache warm-up at startup!! 
Only RAM: switch off persistence
Common use cases 
Cache 
Queue 
Session replication 
In memory indexes 
Centralized ID generation
Basics 
SET user:1 frank 
GET user:1 → frank 
EXISTS user:1 → 1
EXPIRE user:1 3600 
INCR count:1 
GET count:1 → 1
Basics 
MSET user:1 frank user:2 coder
MGET user:1 user:2 → frank, coder
KEYS user:* → user:1, user:2
HMSET userdetail:3 name rob surname frank
HGETALL userdetail:3 → name: rob, surname: frank
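Transactions
MULTI
INCR counter:1
INCR counter:2
EXEC → 1, 1
Optimistic locking with WATCH (pseudocode):
WATCH counter:3
val = GET counter:3
val = val + 1
MULTI
SET counter:3 $val
EXEC → nil (transaction aborted) if counter:3 changed after WATCH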
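The WATCH pattern on this slide is a check-and-set loop: read, compute, and commit only if nobody touched the key in between. The sketch below models that idea in plain Python with an in-memory stand-in. `TinyStore`, `exec_set` and `increment_with_retry` are hypothetical names for illustration, not a Redis client API, and a per-key version number is only one possible way to detect the conflict.

```python
class TinyStore:
    """In-memory stand-in for a key-value server; NOT a Redis client."""

    def __init__(self):
        self._data = {}
        self._versions = {}  # key -> number of writes seen so far

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._versions[key] = self._versions.get(key, 0) + 1

    def watch(self, key):
        # Remember the version of the key at WATCH time.
        return (key, self._versions.get(key, 0))

    def exec_set(self, token, value):
        # Apply the queued SET only if the watched key is unchanged.
        key, seen_version = token
        if self._versions.get(key, 0) != seen_version:
            return None  # aborted, comparable to a nil EXEC reply
        self.set(key, value)
        return "OK"


def increment_with_retry(store, key, max_attempts=5):
    """Read-modify-write that retries when the watched key changed."""
    for _ in range(max_attempts):
        token = store.watch(key)
        current = int(store.get(key) or 0)
        if store.exec_set(token, current + 1) is not None:
            return current + 1
    raise RuntimeError("too much contention on " + key)


store = TinyStore()
first = increment_with_retry(store, "counter:3")

# Simulate another client writing between WATCH and EXEC.
token = store.watch("counter:3")
store.set("counter:3", 10)
aborted = store.exec_set(token, 2)
```

For a plain increment, INCR is the simpler choice because it is already atomic; the retry loop only pays off when the new value depends on logic the server cannot run in a single command.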
Atomic counters 
Operators for key increment 
INCR counter:1 
GET counter:1 → 1 
INCRBY counter:1 9 
GET counter:1 → 10
Lua scripting
Server-side Lua scripting
A “sort of” stored procedure
Scripts are sandboxed
Atomic execution ← bear in mind: while a script runs, no other command is served
Lua scripting
SCRIPT LOAD "return {KEYS[1],KEYS[2]}" 
"3905aac1828a8f75707b48e446988eaaeb173f13" 
EVALSHA 3905aac1828a8f75707b48e446988eaaeb173f13 2 user:1 user:2
1) "user:1" 
2) "user:2"
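The string returned by SCRIPT LOAD is the SHA-1 hex digest of the script source, which is why a client can compute the handle locally before calling EVALSHA. A minimal sketch, assuming the script is sent as UTF-8 bytes exactly as written (`script_sha1` is a hypothetical helper name):

```python
import hashlib


def script_sha1(script: str) -> str:
    """Return the 40-character hex SHA-1 of a script body."""
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


digest = script_sha1("return {KEYS[1],KEYS[2]}")
print(digest)
```

Any difference in whitespace or quoting changes the digest, so the printed value only matches the one on the slide if the script text is byte-for-byte identical.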
Caching: server level 
Configure Redis as a cache
maxmemory 1024mb
maxmemory-policy allkeys-lru
With allkeys-lru, once maxmemory is reached any key can be evicted,
chosen by an approximated LRU algorithm
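To make the eviction policy concrete, here is a small sketch of an exact LRU cache in Python. It is an illustration only: it caps the number of entries rather than bytes of memory, and it tracks recency exactly, whereas Redis samples keys to approximate LRU. `LruCache` and `max_entries` are hypothetical names.

```python
from collections import OrderedDict


class LruCache:
    """Exact LRU by entry count; a simplified model, not Redis itself."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()  # oldest access first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used


cache = LruCache(max_entries=2)
cache.set("doc:1", "a")
cache.set("doc:2", "b")
cache.get("doc:1")       # doc:1 is now the most recently used
cache.set("doc:3", "c")  # evicts doc:2
```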
Caching: TTL on key 
Set a timeout on a key 
SET doc:1 "mydoc.txt"
EXPIRE doc:1 10
Or
SETEX doc:1 10 "mydoc.txt"
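The effect of a key timeout can be modelled in a few lines. The sketch below stores an expiry time next to each value and drops the entry lazily on the first read after the deadline. It is a simplified in-memory model, not a Redis client; `TtlStore` and the injectable `clock` parameter are assumptions made so the behaviour can be checked without waiting.

```python
import time


class TtlStore:
    """Key-value store with per-key timeouts, expired lazily on read."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def setex(self, key, seconds, value):
        # Same argument order as SETEX: key, timeout, value.
        self._items[key] = (value, self._clock() + seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # the key has timed out
            return None
        return value


now = [0.0]  # fake clock, advanced by hand
store = TtlStore(clock=lambda: now[0])
store.setex("doc:1", 10, "mydoc.txt")
before = store.get("doc:1")
now[0] = 10.0
after = store.get("doc:1")
```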
Demo
Caching + atomic counters + atomic Lua scripting
Duplicate detection 
Real time stream of documents from 
the Internet 
20% to 50% of documents are duplicated 
DUPLICATES ARE EVIL 
And customers don’t pay for that :(
Basic Scenario
[Diagram: three producers → duplicates detector → NLP → storage; flow labels: 5M, 3M, 3M]
Avoid duplicated documents
Acting on the producers was
TOO HARD
Filter them out before the heavy document analysis (NLP)
Documents 
“Documents” are from: 
twitter 
facebook 
gplus 
instagram 
forums 
blogs
Documents 
Each kind of document has its own natural id 
twitter: status id 
facebook: post id 
forum: URL 
blog: URL 
We don’t want these IDs inside our system
Duplicate and id generation
[Diagram: three producers → duplicate detector / ID generation → analysis → storage, with the detector / ID generation box drawn twice; flow labels in order of appearance: 2M, 3M, 3M, 1M, 1M, 5M]
Map external keys to internal UID 
Generate an ID for each document 
IDs are generated using daily named counters: 
INCR day:20141028 → 12576 
INCR day:20141010 → 23412576 
Cache generated ID 
tw_1234578688 → day:20141028;12576
Map external keys to internal UID 
Documents are internally stored on different storage 
systems with their generated id 
globalId→ 20141028:3456789
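The daily named counters can be sketched as follows. This is a plain-Python model of the scheme, assuming the global id is simply the day joined to the counter value with a colon, as in the example above. `DailyIdGenerator` and `next_id` are hypothetical names, and the dictionary stands in for the `day:YYYYMMDD` keys that INCR would update on the server.

```python
from collections import defaultdict


class DailyIdGenerator:
    """One counter per day; ids look like '20141028:12576'."""

    def __init__(self):
        self._counters = defaultdict(int)  # "day:YYYYMMDD" -> last value

    def next_id(self, day):
        counter_key = "day:" + day
        self._counters[counter_key] += 1  # the INCR step
        return day + ":" + str(self._counters[counter_key])


gen = DailyIdGenerator()
a = gen.next_id("20141028")
b = gen.next_id("20141028")
c = gen.next_id("20141029")
```

Because each day has its own counter, ids stay short and the day prefix tells where a document belongs without a lookup.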
Operations 
Natural Keys are cached with TTL 
Documents out of time are parked in a staging area 
Duplicated documents are usually dropped
LRU cache, counters and Lua
Lua scripts are executed atomically
We wrote a simple script to:
return the previously mapped id
or generate an id and store key and id in the cache
EVALSHA "sha" 2 20141028 tw_1234566 → 20141028:123
GET tw_1234566 → 20141028:123
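The logic of such a script can be sketched outside Redis. The Python below is an in-memory model of the get-or-create step, not the script used in the talk: `DedupIdMapper` and `get_or_create` are hypothetical names, the TTL on cached keys is left out, and the atomicity that Lua execution provides on the server is replaced here by a single-threaded function call.

```python
class DedupIdMapper:
    """In-memory model of 'return mapped id, or generate and cache it'."""

    def __init__(self):
        self._cache = {}     # natural key -> internal id
        self._counters = {}  # day -> last counter value

    def get_or_create(self, day, natural_key):
        """Return (internal_id, is_duplicate)."""
        existing = self._cache.get(natural_key)
        if existing is not None:
            return existing, True  # seen before: a duplicate
        value = self._counters.get(day, 0) + 1  # the INCR step
        self._counters[day] = value
        internal_id = day + ":" + str(value)
        self._cache[natural_key] = internal_id  # the SET step
        return internal_id, False


mapper = DedupIdMapper()
stream = ["tw_1234566", "fb_42", "tw_1234566"]
results = [mapper.get_or_create("20141028", key) for key in stream]
fresh = [doc_id for doc_id, duplicate in results if not duplicate]
```

Only the ids in `fresh` would move on to the heavy analysis stage; the third document is recognised as a duplicate and gets the id already assigned to the first.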
Demo
Deployment 
Pre-production phase 
Single server 
70M keys in 10GB of RAM 
In production with a simple master/slave (M/S) setup
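The figures above give a rough memory budget per cached key. A back-of-the-envelope calculation, assuming "10GB" means 10 GiB and that all of it is spent on the 70M keys:

```python
keys = 70_000_000
ram_bytes = 10 * 1024 ** 3  # assumption: 10 GB read as 10 GiB

bytes_per_key = ram_bytes / keys
print(round(bytes_per_key))  # roughly 153 bytes per key, overhead included
```

That budget covers the natural key, the mapped id and the per-entry overhead of the server, which is a useful starting point when sizing maxmemory for a larger key set.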
Alternatives 
PostgreSQL 
sequence(s) 
table OR hstore 
Hazelcast (we are Java-based)
in memory 
write your own persistence
Q/A
References 
http://redis.io/ 
http://redis.io/commands 
http://stackoverflow.com/questions/tagged/redis 
http://try.redis.io/
