SlideShare une entreprise Scribd logo
Couchbase usage and perfs
Criteo - Couchbase Live 2016 - Paris
About me
Pierre Mavro - Lead DevOps - NoSQL Team
Working at Criteo as Site Reliability Engineer
@deimosfr
Criteo
31 Offices
2000+ employees
Criteo technical insights
● 700 engineers
● 17K servers
● 27K displays per second
● 2.4M requests per second
Criteo SRE: biggest challenges
● Scaling
● Low latency
● High throughput
● Resiliency
● Automation
Couchbase figures at Criteo (Worldwide)
● 1300+ physical servers
● 100+ clusters (up to 50 servers each)
● 90TB of data in memory
● 25M QPS
● < 8ms constant latency
Couchbase usage at Criteo
● Storing UUIDs < 30b
● Storing blobs (ex. binary images)
● Storing keys size > value data size (sometimes)
● Serving between 100Kqps to 2.5Mqps per cluster
● Low latency at 99perc < 2ms
● Data size per cluster between 500Gb to ~12Tb (with replica)
● All data fits in memory
● Inter datacenter replication (custom client driver)
What we wanted to solve
Legacy infrastructure
● Couchbase v1.8 legacy (80%) and v3.0.1
community (20%)
● Slow rebalance (up to 48h for 1 server)
● Rebalance failures on high loaded
clusters
● Max connection reached on v1.8 (9k)
Legacy infrastructure
● Same cluster shares on persisted and non
persisted buckets
● No dedicated latency monitoring tool
● No auto restart/upgrade orchestrator
● Server benchmarks update required
● Lack of Couchbase best practices
How we achieved the change
1. Benchmarks
Benchmarks
● Couchbase Enterprise 3.1.3
● 3x HP GEN9 DL360 (256GB RAM, 6x400GB SSD RAID10, 1Gb Network
interface) (2x injectors + 1 server)
● Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
● Value size: uniform range between 750 B and 1250 B (avg 1 kB)
● Number of items: 50M/node (with replica) or 100M/node (without replica)
● Resident active items (= items fully in RAM): ~50%
● Value-only ejection mode (only data value can be removed from RAM, keeping
metadata + key in RAM).
Benchmarks
Heavy Writes/Little Reads (10Kqps) without replica
Write rate
per node
Status Disk Write
Queue
Latency 50
perc
Latency 95
perc
Latency
99perc
Latency
99,9 perc
40 Kset/s OK 10M items 0.4 ms 0.7 ms 2 ms 8 ms
60 Kset/s OK 30M items 0.4 ms 0.7 ms 2 ms 20 ms
80 Kset/s OK 50M items 0.4 ms 2 ms 7 ms 30 ms
100 Kset/s OK 70M items 1.5 ms 5 ms 10 ms 40 ms
Benchmarks
Heavy Writes/Little Reads (10Kqps) with one replica
Write rate
per node
Status Disk Write
Queue
Latency 50
perc
Latency 95
perc
Latency
99perc
Latency
99,9 perc
20 Kset/s OK 12M items 0.4 ms 1 ms 2 ms 10 ms
30 Kset/s OK 33M items 0.5 ms 2 ms 4 ms 20 ms
40 Kset/s OK 60M items 0.6 ms 2 ms 5 ms 25 ms
50 Kset/s NOK (OOM) >70M items 0.7 ms 5 ms 50 ms 75 ms
Benchmarks
Heavy Reads/Little Writes (10Kqps) with one replica
Read rate
per node
Status Disk Write
Queue
Latency 50
perc
Latency 95
perc
Latency
99perc
Latency
99,9 perc
25 Kset/s OK 130k items 0.4 ms 0.7 ms 4 ms 8 ms
50 Kset/s OK 130k items 0.4 ms 1 ms 5 ms 10 ms
75 Kset/s OK 130k items 0.4 ms 5 ms 15 ms 25 ms
100 Kset/s NOK 50k to 500k
items
16 ms 25 ms 45 ms 100 ms
Benchmarks
Conclusion for a single node:
● Network 1Gb is the bottleneck
● Replicas introduce latency
● Reads are fast
● Max write with replica per node: 40 Kqps
● Max read with replica per node: 90Kqps
● Max read/write without replica per node: 90 Kqps
2. SLI, SLO & SLA
Metrics
Metrics are greats !
● QPS total (read + write)
● Total RAM usage
● Availability
● Number of items
● …
But it’s not relevant enough to know the global service status !
SLI: add the major missing metric
Adding latency monitoring as SLI, to be part of our Couchbase SLO and SLA
3. Couchbase support
Support contract
● Get latest Couchbase bug fixes
● Suggest Couchbase enhancements
● Speed up resolution of incidents with the help of support
● Get better Couchbase tuning recommendations for performance
4. Refactoring infrastructure
Split usages
● High-load (QPS) buckets are on dedicated
clusters
● Low-load (QPS) buckets are shared on
separate “shared” clusters
● Persisted and Non persisted clusters are not
on the same servers anymore
5. Administration Automation
Automation: why?
● Need to upgrade from the community to the enterprise version
● Need to apply new configuration options that require a restart of all the
nodes in a cluster
● Need to apply fixes that require a reboot of all the nodes in a cluster
● Need to reinstall servers from scratch
Automation: how?
● Criteo is using Chef to bootstrap servers, deploy applications and
configuration
● We did not want to add another new tool in the loop
● Nothing with the required features already exists
● We developed a FOSS Chef cookbook for this and other use-cases:
Choregraphie
https://github.com/criteo-cookbooks/choregraphie
Automation: Choregraphie
With Choregraphie we can perform:
● Rolling restart with rebalance
● Rolling upgrade with rebalance
● Use an optional, additional server to speed up rebalance
● Rolling reboot with rebalance
● Rolling reinstall with rebalance
Choregraphie is open source! Feel free to contribute
6. Couchbase and system tuning
Couchbase best practices / system tuning
● Minimize swap usage:
○ vm.swappiness = 0 (set to 1 for kernel >3.5)
● Disable transparent Hugepages:
○ chkconfig disable-thp on
● Set SSD IO-Scheduler to deadline:
○ echo “deadline” > /sys/block/sdX/queue/scheduler
● Change CPUFreq governor:
○ modprobe cpufreq_performance
● Leverage maximum connection:
○ max_conns_on_port_XXXX: 30000
Couchbase tuning
● Upgrade Nonio parameter to 8 to avoid rebalance failures on high-load
clusters:
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>",
[{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
● Disable access log if you don’t need them to reduce disk usage (native in
Couchbase 4.5):
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>",
[{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
Tuning...what’s next?
● Network teaming 802.3ad (bonding) with 2x1Gb cards
● 10Gb network cards
● Upgrade to Couchbase 4.5
● Upgrade kernel to a newer LTS vanilla to enable specific SSD enhancement
(multi queues SSD)
● Switch to Mesos to reduce administration time
Questions ?
Criteo - Couchbase Live 2016 - Paris
Pierre Mavro / @deimosfr

Contenu connexe

Tendances

Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Labs
 
Tales from Taming the Long Tail
Tales from Taming the Long TailTales from Taming the Long Tail
Tales from Taming the Long Tail
HBaseCon
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
ScyllaDB
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
ScyllaDB
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
ScyllaDB
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
DevOpsDays Tel Aviv
 
DynamoDB at HasOffers
DynamoDB at HasOffers DynamoDB at HasOffers
DynamoDB at HasOffers
Amazon Web Services
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
ScyllaDB
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
confluent
 
Awsugnz re invent recap infra
Awsugnz   re invent recap infraAwsugnz   re invent recap infra
Awsugnz re invent recap infra
Terence White
 
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
ScyllaDB
 
Global deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon OhGlobal deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon Oh
Ceph Community
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
ScyllaDB
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
ScyllaDB
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
Tzach Livyatan
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
ScyllaDB
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
Tao Feng
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
ScyllaDB
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On Azure
Erik Krogen
 

Tendances (20)

Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis Labs
 
Tales from Taming the Long Tail
Tales from Taming the Long TailTales from Taming the Long Tail
Tales from Taming the Long Tail
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
 
DynamoDB at HasOffers
DynamoDB at HasOffers DynamoDB at HasOffers
DynamoDB at HasOffers
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 
Awsugnz re invent recap infra
Awsugnz   re invent recap infraAwsugnz   re invent recap infra
Awsugnz re invent recap infra
 
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
 
Global deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon OhGlobal deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon Oh
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On Azure
 

En vedette

Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
Criteolabs
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertising
Olivier Koch
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
Olivier Koch
 
EN - Criteo - BD Deck -July 2014 - rebrand
EN - Criteo - BD Deck -July 2014 - rebrandEN - Criteo - BD Deck -July 2014 - rebrand
EN - Criteo - BD Deck -July 2014 - rebrand
Djilali Zitouni
 
criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
Carolyn Bednarz
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
Scott Turecek
 
Ansible meetup-0915
Ansible meetup-0915Ansible meetup-0915
Ansible meetup-0915
Pierre Mavro
 
MongoDB Wins
MongoDB WinsMongoDB Wins
MongoDB Wins
Noreen Seebacher
 
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDBBusiness Track: How Criteo Scaled and Supported Massive Growth with MongoDB
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB
MongoDB
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
Nicolasgmail.com Helleringer
 
Criteo. Reach people, not devices!
Criteo. Reach people, not devices!Criteo. Reach people, not devices!
Criteo. Reach people, not devices!
HybridRussia
 
T3UNI12 : SOLR workshop
T3UNI12 : SOLR workshopT3UNI12 : SOLR workshop
T3UNI12 : SOLR workshop
Paul Blondiaux
 
Drbd
DrbdDrbd
How companies use NoSQL and Couchbase
How companies use NoSQL and CouchbaseHow companies use NoSQL and Couchbase
How companies use NoSQL and Couchbase
Dipti Borkar
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
Erik Hatcher
 
PostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLPostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQL
CockroachDB
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo
 
Big Data Analytics Proposal #1
Big Data Analytics Proposal #1Big Data Analytics Proposal #1
Big Data Analytics Proposal #1
Ziyad Saleh
 
RecsysFR: Criteo presentation
RecsysFR: Criteo presentationRecsysFR: Criteo presentation
RecsysFR: Criteo presentation
recsysfr
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
boorad
 

En vedette (20)

Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
 
New challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertisingNew challenges for scalable machine learning in online advertising
New challenges for scalable machine learning in online advertising
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
 
EN - Criteo - BD Deck -July 2014 - rebrand
EN - Criteo - BD Deck -July 2014 - rebrandEN - Criteo - BD Deck -July 2014 - rebrand
EN - Criteo - BD Deck -July 2014 - rebrand
 
criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
 
Ansible meetup-0915
Ansible meetup-0915Ansible meetup-0915
Ansible meetup-0915
 
MongoDB Wins
MongoDB WinsMongoDB Wins
MongoDB Wins
 
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDBBusiness Track: How Criteo Scaled and Supported Massive Growth with MongoDB
Business Track: How Criteo Scaled and Supported Massive Growth with MongoDB
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
 
Criteo. Reach people, not devices!
Criteo. Reach people, not devices!Criteo. Reach people, not devices!
Criteo. Reach people, not devices!
 
T3UNI12 : SOLR workshop
T3UNI12 : SOLR workshopT3UNI12 : SOLR workshop
T3UNI12 : SOLR workshop
 
Drbd
DrbdDrbd
Drbd
 
How companies use NoSQL and Couchbase
How companies use NoSQL and CouchbaseHow companies use NoSQL and Couchbase
How companies use NoSQL and Couchbase
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
PostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLPostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQL
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
 
Big Data Analytics Proposal #1
Big Data Analytics Proposal #1Big Data Analytics Proposal #1
Big Data Analytics Proposal #1
 
RecsysFR: Criteo presentation
RecsysFR: Criteo presentationRecsysFR: Criteo presentation
RecsysFR: Criteo presentation
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 

Similaire à Couchbase live 2016

Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
PostgreSQL Experts, Inc.
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
Nicolas Poggi
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
OVHcloud
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
Scott Mansfield
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
Mirantis
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
Ranga Swami Reddy Muthumula
 
Ceph barcelona-v-1.2
Ceph barcelona-v-1.2Ceph barcelona-v-1.2
Ceph barcelona-v-1.2
Ranga Swami Reddy Muthumula
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
Karan Singh
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
Ying Zheng
 
Corralling Big Data at TACC
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
inside-BigData.com
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
AMD Developer Central
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
Nicolas Poggi
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
Dave Holland
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
Bigstep
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
Ceph Community
 
Erasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William ByrneErasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William Byrne
Ceph Community
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
javier ramirez
 

Similaire à Couchbase live 2016 (20)

Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
Ceph barcelona-v-1.2
Ceph barcelona-v-1.2Ceph barcelona-v-1.2
Ceph barcelona-v-1.2
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
 
Corralling Big Data at TACC
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
Erasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William ByrneErasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William Byrne
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 

Plus de Pierre Mavro

Security model for a remote company
Security model for a remote companySecurity model for a remote company
Security model for a remote company
Pierre Mavro
 
Compass digital ocean’s customer advisory group 2021_10
Compass digital ocean’s customer advisory group 2021_10Compass digital ocean’s customer advisory group 2021_10
Compass digital ocean’s customer advisory group 2021_10
Pierre Mavro
 
Criteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkCriteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech Talk
Pierre Mavro
 
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Pierre Mavro
 
Mariadb mysql avancé
Mariadb mysql avancéMariadb mysql avancé
Mariadb mysql avancé
Pierre Mavro
 
Puppet slides
Puppet slidesPuppet slides
Puppet slides
Pierre Mavro
 

Plus de Pierre Mavro (6)

Security model for a remote company
Security model for a remote companySecurity model for a remote company
Security model for a remote company
 
Compass digital ocean’s customer advisory group 2021_10
Compass digital ocean’s customer advisory group 2021_10Compass digital ocean’s customer advisory group 2021_10
Compass digital ocean’s customer advisory group 2021_10
 
Criteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkCriteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech Talk
 
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
Traefik on Kubernetes at MySocialApp (CNCF Paris Meetup)
 
Mariadb mysql avancé
Mariadb mysql avancéMariadb mysql avancé
Mariadb mysql avancé
 
Puppet slides
Puppet slidesPuppet slides
Puppet slides
 

Dernier

Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
Yara Milbes
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
lorraineandreiamcidl
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 

Dernier (20)

Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 

Couchbase live 2016

  • 1. Couchbase usage and perfs Criteo - Couchbase Live 2016 - Paris
  • 2. About me Pierre Mavro - Lead DevOps - NoSQL Team Working at Criteo as Site Reliability Engineer @deimosfr
  • 4. Criteo technical insights ● 700 engineers ● 17K servers ● 27K displays per second ● 2.4M requests per second
  • 5. Criteo SRE: biggest challenges ● Scaling ● Low latency ● High throughput ● Resiliency ● Automation
  • 6. Couchbase figures at Criteo (Worldwide) ● 1300+ physical servers ● 100+ clusters (up to 50 servers each) ● 90TB of data in memory ● 25M QPS ● < 8ms constant latency
  • 7. Couchbase usage at Criteo ● Storing UUIDs < 30b ● Storing blobs (ex. binary images) ● Storing keys size > value data size (sometimes) ● Serving between 100Kqps to 2.5Mqps per cluster ● Low latency at 99perc < 2ms ● Data size per cluster between 500Gb to ~12Tb (with replica) ● All data fits in memory ● Inter datacenter replication (custom client driver)
  • 8. What we wanted to solve
  • 9. Legacy infrastructure ● Couchbase v1.8 legacy (80%) and v3.0.1 community (20%) ● Slow rebalance (up to 48h for 1 server) ● Rebalance failures on high loaded clusters ● Max connection reached on v1.8 (9k)
  • 10. Legacy infrastructure ● Same cluster shares on persisted and non persisted buckets ● No dedicated latency monitoring tool ● No auto restart/upgrade orchestrator ● Server benchmarks update required ● Lack of Couchbase best practices
  • 11. How we achieved the change
  • 13. Benchmarks ● Couchbase Enterprise 3.1.3 ● 3x HP GEN9 DL360 (256GB RAM, 6x400GB SSD RAID10, 1Gb Network interface) (2x injectors + 1 server) ● Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes) ● Value size: uniform range between 750 B and 1250 B (avg 1 kB) ● Number of items: 50M/node (with replica) or 100M/node (without replica) ● Resident active items (= items fully in RAM): ~50% ● Value-only ejection mode (only data value can be removed from RAM, keeping metadata + key in RAM).
  • 14. Benchmarks Heavy Writes/Little Reads (10Kqps) without replica Write rate per node Status Disk Write Queue Latency 50 perc Latency 95 perc Latency 99perc Latency 99,9 perc 40 Kset/s OK 10M items 0.4 ms 0.7 ms 2 ms 8 ms 60 Kset/s OK 30M items 0.4 ms 0.7 ms 2 ms 20 ms 80 Kset/s OK 50M items 0.4 ms 2 ms 7 ms 30 ms 100 Kset/s OK 70M items 1.5 ms 5 ms 10 ms 40 ms
  • 15. Benchmarks Heavy Writes/Little Reads (10Kqps) with one replica Write rate per node Status Disk Write Queue Latency 50 perc Latency 95 perc Latency 99perc Latency 99,9 perc 20 Kset/s OK 12M items 0.4 ms 1 ms 2 ms 10 ms 30 Kset/s OK 33M items 0.5 ms 2 ms 4 ms 20 ms 40 Kset/s OK 60M items 0.6 ms 2 ms 5 ms 25 ms 50 Kset/s NOK (OOM) >70M items 0.7 ms 5 ms 50 ms 75 ms
  • 16. Benchmarks Heavy Reads/Little Writes (10Kqps) with one replica Read rate per node Status Disk Write Queue Latency 50 perc Latency 95 perc Latency 99perc Latency 99,9 perc 25 Kset/s OK 130k items 0.4 ms 0.7 ms 4 ms 8 ms 50 Kset/s OK 130k items 0.4 ms 1 ms 5 ms 10 ms 75 Kset/s OK 130k items 0.4 ms 5 ms 15 ms 25 ms 100 Kset/s NOK 50k to 500k items 16 ms 25 ms 45 ms 100 ms
  • 17. Benchmarks Conclusion for a single node: ● Network 1Gb is the bottleneck ● Replicas introduce latency ● Reads are fast ● Max write with replica per node: 40 Kqps ● Max read with replica per node: 90Kqps ● Max read/write without replica per node: 90 Kqps
  • 18. 2. SLI, SLO & SLA
  • 19. Metrics Metrics are greats ! ● QPS total (read + write) ● Total RAM usage ● Availability ● Number of items ● … But it’s not relevant enough to know the global service status !
  • 20. SLI: add the major missing metric Adding latency monitoring as SLI, to be part of our Couchbase SLO and SLA
  • 22. Support contract ● Get latest Couchbase bug fixes ● Suggest Couchbase enhancements ● Speed up resolution of incidents with the help of support ● Get better Couchbase tuning recommendations for performance
  • 24. Split usages ● High-load (QPS) buckets are on dedicated clusters ● Low-load (QPS) buckets are shared on separate “shared” clusters ● Persisted and Non persisted clusters are not on the same servers anymore
  • 26. Automation: why? ● Need to upgrade from the community to the enterprise version ● Need to apply new configuration options that require a restart of all the nodes in a cluster ● Need to apply fixes that require a reboot of all the nodes in a cluster ● Need to reinstall servers from scratch
  • 27. Automation: how? ● Criteo is using Chef to bootstrap servers, deploy applications and configuration ● We did not want to add another new tool in the loop ● Nothing with the required features already exists ● We developed a FOSS Chef cookbook for this and other use-cases: Choregraphie https://github.com/criteo-cookbooks/choregraphie
  • 28. Automation: Choregraphie With Choregraphie we can perform: ● Rolling restart with rebalance ● Rolling upgrade with rebalance ● Use an optional, additional server to speed up rebalance ● Rolling reboot with rebalance ● Rolling reinstall with rebalance Choregraphie is open source! Feel free to contribute
  • 29. 6. Couchbase and system tuning
  • 30. Couchbase best practices / system tuning ● Minimize swap usage: ○ vm.swappiness = 0 (set to 1 for kernel >3.5) ● Disable transparent Hugepages: ○ chkconfig disable-thp on ● Set SSD IO-Scheduler to deadline: ○ echo “deadline” > /sys/block/sdX/queue/scheduler ● Change CPUFreq governor: ○ modprobe cpufreq_performance ● Leverage maximum connection: ○ max_conns_on_port_XXXX: 30000
  • 31. Couchbase tuning ● Upgrade Nonio parameter to 8 to avoid rebalance failures on high-load clusters: ○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval ● Disable access log if you don’t need them to reduce disk usage (native in Couchbase 4.5): ○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
  • 32. Tuning...what’s next? ● Network teaming 802.3ad (bonding) with 2x1Gb cards ● 10Gb network cards ● Upgrade to Couchbase 4.5 ● Upgrade kernel to a newer LTS vanilla to enable specific SSD enhancement (multi queues SSD) ● Switch to Mesos to reduce administration time
  • 33. Questions ? Criteo - Couchbase Live 2016 - Paris Pierre Mavro / @deimosfr