Nathaniel Braun
Thursday, April 28th, 2016
OpenTSDB for monitoring @ Criteo
2 | Copyright © 2016 Criteo
• Overview of Hadoop @ Criteo
• Our experimental cluster
• Rationale for OpenTSDB
• Stabilizing & scaling OpenTSDB
• OpenTSDB to the rescue in practice
Hitchhiker’s guide to this presentation
Overview of Hadoop @ Criteo
Tokyo TY5 – PROD AS
Sunnyvale SV6 – PROD NA
Hong Kong HK5 – PROD CN
Paris PA4 – PROD / PREPROD
Paris PA3 – PREPROD / EXP
Amsterdam AM5 – PROD
Criteo’s 8 Hadoop clusters – running CDH Community Edition
AM5: main production cluster
• In use since 2011
• Running CDH3 initially, CDH4 currently
• 1118 DataNodes
• 13 400+ compute cores
• 39 PB of raw disk storage
• 105 TB of RAM capacity
• 40 TB of data imported every day, mostly through HTTPFS
• 100 000+ jobs run daily
Overview of Hadoop @ Criteo – Production AM5
PA4: comparable to AM5, with fewer machines
• Migration done in Q4 2015 – H1 2016
• Running CDH5
• 650+ DataNodes
• 15 600+ compute cores
• 54 PB of raw disk storage
• 143 TB of RAM capacity
• Huawei servers (AM5 is HP-based)
Overview of Hadoop @ Criteo – Production PA4
Criteo has 3 local production Hadoop clusters
• Sunnyvale (SV6): 20 nodes
• Tokyo (TY5): 35 nodes
• Hong Kong (HK5): 20 nodes
Overview of Hadoop @ Criteo – Production local clusters
Criteo has 3 preproduction Hadoop clusters
• Preprod PA3: 54 nodes, running CDH4
• Preprod PA4: 42 nodes, running CDH5
• Experimental: 53 nodes, running CDH5
Overview of Hadoop @ Criteo – Preproduction clusters
Overview of Hadoop @ Criteo – Usage
Types of jobs running on our clusters
• Cascading jobs, mostly for joins between different types of logs (e.g. displays & clicks)
• Pure Map/Reduce jobs for recommendation, Hadoop streaming jobs for learning
• Scalding jobs for analytics
• Hive queries for Business Intelligence
• Spark jobs on CDH5
Overview of Hadoop @ Criteo – Special consideration
• Kerberos for security
• High-availability on NameNodes and ResourceManager (CDH5 only)
• Infrastructure installed & maintained with Chef
Overview of Hadoop @ Criteo
How can we monitor this complex infrastructure and the services running on top of it?
Our experimental cluster
• Useful for testing infrastructure changes without impacting users (no SLA)
• Test environment for new technologies
  • HBase
    o Natural joins
    o OpenTSDB for metrology & monitoring
    o hRaven for detailed job data (not used anymore)
  • Spark, now in production @ PA4
Our experimental cluster – Purpose
• Based on Google’s Bigtable paper
• Integrated with the Hadoop stack
• Stores data in rows sorted by row key
• Uses regions as an ordered set of rows
• Regions sharded by row key bounds
• Regions managed by Region servers, collocated with DataNodes (data is stored on HDFS)
• Oversize regions split into two regions
• Values stored in columns, with no fixed schema (unlike an RDBMS)
• Columns grouped in column families
Our experimental cluster – HBase features
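The row-key ordering and region sharding listed above can be pictured with a small Ruby sketch. It is a toy in-memory model and not HBase code: the `Region` struct, the region and server names, and the key bounds are all assumptions chosen for illustration.

```ruby
# Toy model of HBase region sharding (illustration only, not the HBase API).
# Each region covers the half-open row-key range [start_key, end_key).
Region = Struct.new(:name, :start_key, :end_key, :server)

# Hypothetical layout: three regions spread over two region servers.
# An empty start key means "from the first row"; a nil end key means "to the last row".
REGIONS = [
  Region.new('R0', '',    'DDD', 'RS1'),
  Region.new('R1', 'DDD', 'XXX', 'RS1'),
  Region.new('R5', 'XXX', nil,   'RS2')
].freeze

# Find the region whose key range contains row_key.
# Regions are kept sorted by start key, so a binary search is enough.
def locate_region(regions, row_key)
  index = regions.bsearch_index { |r| r.end_key.nil? || row_key < r.end_key }
  regions[index]
end

%w[AAA DDD EEE ZZZ].each do |key|
  region = locate_region(REGIONS, key)
  puts "#{key} -> #{region.name} on #{region.server}"
end
```

Because rows are sorted, a lookup only needs the region bounds, and a scan over a key range touches a contiguous run of regions.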
Our experimental cluster – HBase architecture
Row key
(user UID)
CF0: user CF1: event
C0: IP C2: browser C3: e-mail C0: time C1: type C2: web site
AAA value Firefox NULL Click Client #0
BBB value Chrome NULL Click Client #0
CCC value Chrome ccc@mail.com Display Client #1
DDD value IE NULL Sales Client #2
EEE value IE NULL Display Client #0
FFF value IE NULL Display Client #3
∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙
XXX value Firefox NULL Sales Client #4
YYY value Chrome NULL Bid Client #5
ZZZ value Opera zzz@mail.com Click Client #5
[Figure annotation: rows grouped into regions R0, R1, R5]
[Figure annotation: regions served by region servers RS1 and RS2]
HBase on the experimental cluster
• 50 region servers
• 44 000+ regions
• ~90 000 requests / second from OpenTSDB
Our experimental cluster – HBase @ Criteo
Rationale for OpenTSDB
Metrics to monitor:
• CPU load
• Processes & threads
• RAM available/reserved
• Free/used disk space
• Network statistics
• Sockets open/closed
• Open connections with their statuses
• Network traffic
Rationale for using OpenTSDB – Infrastructure monitoring
Rationale for using OpenTSDB – Service monitoring
YARN: NodeManagers, ResourceManagers
HDFS: DataNodes, NameNodes, JournalNodes
ZooKeeper, Kerberos
HBase
Kafka, Storm
Huge diversity of services!
• Diversity
  • Many types of nodes & services
  • Must be easy to extend with new metrics
• Scale
  • > 2 500 servers
  • ~ 90 000 requests / second
• Storage
  • Keep fine-grained resolution (down to the minute, at least)
  • Long-term storage for analysis & investigation
Rationale for using OpenTSDB – Scale
• Suits the problem well: “Hadoop for monitoring Hadoop”
• Designed for time series: HBase schema optimized for time series queries
• Scalable and resilient, thanks to HBase
• Easily extensible: writing a data collector is simple
• Simple to query
Rationale for using OpenTSDB – Solution
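To make the "writing a data collector is simple" point concrete, here is a minimal sketch in the tcollector style, where a collector is a program that prints one `metric timestamp value tag=value` line per data point on standard output. The helper names (`format_point`, `parse_loadavg`, `collect`) and the canned `/proc/loadavg` content are assumptions for illustration, not code from the talk.

```ruby
# Minimal collector sketch: emit "metric timestamp value tag=value" lines.
# Helper names and sample input are illustrative assumptions.

# Format one data point as a text line.
def format_point(metric, timestamp, value, tags = {})
  tag_part = tags.sort.map { |k, v| "#{k}=#{v}" }.join(' ')
  [metric, timestamp.to_i, value, tag_part].reject { |f| f.to_s.empty? }.join(' ')
end

# Parse the content of /proc/loadavg (e.g. "0.42 0.36 0.30 1/123 4567")
# into the 1, 5 and 15 minute load averages.
def parse_loadavg(text)
  one, five, fifteen = text.split.first(3).map(&:to_f)
  { '1min' => one, '5min' => five, '15min' => fifteen }
end

# Build the output lines for one collection pass.
def collect(loadavg_text, now, host)
  parse_loadavg(loadavg_text).map do |window, load|
    format_point("proc.loadavg.#{window}", now, load, 'host' => host)
  end
end

# Example pass with a canned /proc/loadavg content.
puts collect('0.42 0.36 0.30 1/123 4567', 1461781436, '0.namenode.hpc.criteo.prod')
```

A real collector would read `/proc/loadavg` in a loop and sleep between passes; the parsing and formatting are kept as pure functions so they are easy to check in isolation.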
Rationale for using OpenTSDB – Easy to query
require 'net/http'
require 'uri'
require 'json'

uri = URI.parse("http://0.rtsd.hpc.criteo.preprod:4242/api/query")
http = Net::HTTP.start(uri.hostname, uri.port)
http.read_timeout = 300
params = {
  'start' => '2016/04/21-10:00:00',
  'end' => '2016/04/21-12:00:00',
  # /api/query expects a list of sub-queries
  'queries' => [{
    'aggregator' => 'min',
    'downsample' => '5m-min',
    'metric' => 'hadoop.resourcemanager.queuemetrics.root.AllocatedMB',
    'tags' => {
      'cluster' => 'ams',
      'host' => 'rm.hpc.criteo.prod'
    }
  }]
}
request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
request.body = params.to_json
response = http.request(request)
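Once the query returns, the body can be turned into data points. The sketch below assumes the usual shape of an OpenTSDB `/api/query` result (an array of series, each with a `dps` map from UNIX timestamp to value); the `SAMPLE_BODY` content is made up so that the example runs without a TSD, and `extract_points` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical response body, shaped like an OpenTSDB /api/query result:
# an array of series, each with a "dps" map of UNIX timestamp => value.
SAMPLE_BODY = <<~JSON
  [{"metric": "hadoop.resourcemanager.queuemetrics.root.AllocatedMB",
    "tags": {"cluster": "ams", "host": "rm.hpc.criteo.prod"},
    "aggregateTags": [],
    "dps": {"1461232800": 1024, "1461233100": 2048, "1461233400": 1536}}]
JSON

# Turn each series into [Time, value] pairs sorted by time.
def extract_points(body)
  JSON.parse(body).map do |series|
    points = series['dps'].map { |ts, value| [Time.at(ts.to_i).utc, value] }
    { metric: series['metric'], tags: series['tags'], points: points.sort_by(&:first) }
  end
end

extract_points(SAMPLE_BODY).each do |series|
  values = series[:points].map(&:last)
  puts "#{series[:metric]} #{series[:tags]}: #{values.size} points, " \
       "min=#{values.min}, max=#{values.max}"
end
```

With a live TSD, `response.body` from the request above would take the place of `SAMPLE_BODY`.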
Rationale for using OpenTSDB – Practical UI
[Screenshot: OpenTSDB web UI, with callouts for time range, metric, tag keys/values and aggregator]
• OpenTSDB consists of Time Series Daemons (TSDs) and tcollectors
• Some TSDs are used for writing and others for reading, while tcollectors collect metrics
• TSDs are stateless
• TSDs use asyncHBase to scale
• Quiz: what are the advantages?
  1. Clients never interact with HBase directly
  2. Simple protocol → easy to use & extend
  3. No state, no synchronization → great scalability
Rationale for using OpenTSDB – Design
• Metrics consist of:
  • metric name
  • UNIX timestamp
  • value (64-bit integer or single-precision floating-point number)
  • tags (key-value pairs) specific to that metric instance
• Tags are useful for aggregations on time series
proc.loadavg.15min 1461781436 15 host=0.namenode.hpc.criteo.prod
• Charts: 15-minute load average with the count aggregator (a proxy for machine count)
• Quiz: what is the chart below?
Rationale for using OpenTSDB – Metrics
[Chart: proc.loadavg.15min]
[Chart: proc.loadavg.15min with cluster=*]
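The role of tags in aggregation can be sketched as follows: parse data point lines in the format shown above, then count the distinct series per value of one tag, which is roughly why a count aggregation grouped by `cluster` acts as a proxy for machine count. This is a simplified illustration, not how a TSD computes it; the sample lines, the short host names and the `cluster` values are made up.

```ruby
# Parse "metric timestamp value tag=value ..." lines and count distinct
# series per value of one tag. Sample data is made up for illustration.
DataPoint = Struct.new(:metric, :timestamp, :value, :tags)

def parse_line(line)
  metric, timestamp, value, *tag_fields = line.split
  tags = tag_fields.map { |field| field.split('=', 2) }.to_h
  DataPoint.new(metric, Integer(timestamp), Float(value), tags)
end

# Count how many distinct tag sets (one per host here) reported the metric,
# grouped by the value of the group_by tag.
def count_by_tag(points, metric, group_by)
  points.select { |point| point.metric == metric }
        .group_by { |point| point.tags[group_by] }
        .transform_values { |group| group.map(&:tags).uniq.size }
end

LINES = [
  'proc.loadavg.15min 1461781436 15 host=node1 cluster=ams',
  'proc.loadavg.15min 1461781436 3.5 host=node2 cluster=ams',
  'proc.loadavg.15min 1461781436 7 host=node3 cluster=par',
  'proc.loadavg.15min 1461781496 14 host=node1 cluster=ams'
].freeze

parsed = LINES.map { |line| parse_line(line) }
p count_by_tag(parsed, 'proc.loadavg.15min', 'cluster')
```

The metric value itself is ignored by the count; only the presence of a series matters, so any metric reported by every machine would do.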
• A single data table (split into regions), named tsdb
• Row key: <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
  • timestamp is rounded down to the hour
  • This schema keeps data from the same metric & time bucket close together (HBase sorts rows by row key)
  • Assumption: queries select first on metric, then on time range, then on tags, in that order of preference
  • Tag keys are sorted lexicographically
  • Tags should be limited, because they are part of the row key: usually fewer than 5 tags
• Values are stored in columns
  • Column name: 2 or 4 bytes. For 2 bytes:
    • Encode an offset of up to 3 600 seconds → 2^12 = 4 096 → 12 bits
    • 4 bits left for format/type
• Other tables hold metadata and name ↔ ID mappings
Rationale for using OpenTSDB – HBase schema
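The row key and 2-byte column name layout can be sketched with Ruby's `pack`. Several details here are assumptions rather than facts from the slides: UIDs are taken to be 3 bytes wide, the base timestamp 4 bytes, the UID values are made up, and tag pairs are ordered by their key UID as a stand-in for the tag key ordering mentioned above.

```ruby
# Sketch of the row key and 2-byte column qualifier layout.
# Assumptions: 3-byte UIDs, 4-byte base timestamp, made-up UID values.

# Encode an integer UID on 3 bytes, big-endian.
def uid_bytes(uid)
  [uid].pack('N')[1, 3]
end

# Row key: <metric_uid><base_timestamp><tagk1><tagv1>...
# tag_uids is a list of [tagk_uid, tagv_uid] pairs.
def row_key(metric_uid, timestamp, tag_uids)
  base = timestamp - (timestamp % 3600) # round down to the hour
  key = uid_bytes(metric_uid) + [base].pack('N')
  tag_uids.sort.each { |tagk, tagv| key << uid_bytes(tagk) << uid_bytes(tagv) }
  key
end

# 2-byte qualifier: 12 bits of offset within the hour, 4 bits of flags.
def qualifier(timestamp, flags)
  offset = timestamp % 3600
  [(offset << 4) | (flags & 0x0F)].pack('n')
end

example_ts = 1461781436
puts row_key(1, example_ts, [[2, 7], [1, 5]]).unpack1('H*')
puts qualifier(example_ts, 0).unpack1('H*')
```

All points of one series within the same hour share a row key and differ only by qualifier, which is what keeps a metric's data for a time bucket physically close together.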
Hexadecimal representation of a row key, with two tags
Sorted row keys for the same metric: 000001
Note: row key size varies across rows, because of tags
Rationale for using OpenTSDB – Statistics
Quiz: what should we look for?
• 367 513 metrics
• 30 tag keys (!)
• 86 194 tag values
Stabilizing & scaling OpenTSDB
OpenTSDB was hard to scale at first. What problem can you see?
• Answer: we’re missing data points
Scaling OpenTSDB
• Analyze all the layers of the system
• Logs are your friends
• Change parameters one by one, not all at once
• Measure, change, deploy, measure. Rinse, repeat
Scaling OpenTSDB – Lessons learned
Varnish & OpenResty save the day
Scaling OpenTSDB – Nifty trick
[Diagram: three instances each of OpenResty (POST -> GET), Varnish (cache + LB) and RTSD (read OpenTSDB)]
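The POST -> GET step can be illustrated with a short sketch: an HTTP cache such as Varnish does not cache POST requests by default, whereas a GET URL is a natural cache key. The slides do not show the actual OpenResty rules (which would normally be written in Lua), so the Ruby below only illustrates the transformation; `metric_spec` and `post_body_to_get_path` are hypothetical helper names.

```ruby
require 'json'
require 'uri'

# Rewrite a JSON query body into an equivalent GET path.
# Illustration only: not the rules used in the deployment described here.

# Build one "m" parameter: aggregator:[downsample:]metric{tagk=tagv,...}
def metric_spec(query)
  spec = [query['aggregator'], query['downsample'], query['metric']].compact.join(':')
  tags = (query['tags'] || {}).sort.map { |k, v| "#{k}=#{v}" }.join(',')
  tags.empty? ? spec : "#{spec}{#{tags}}"
end

def post_body_to_get_path(body, path = '/api/query')
  params = JSON.parse(body)
  pairs = [['start', params['start']]]
  pairs << ['end', params['end']] if params['end']
  params['queries'].each { |query| pairs << ['m', metric_spec(query)] }
  "#{path}?#{URI.encode_www_form(pairs)}"
end

QUERY_BODY = {
  'start' => '2016/04/21-10:00:00',
  'end' => '2016/04/21-12:00:00',
  'queries' => [{
    'aggregator' => 'min',
    'downsample' => '5m-min',
    'metric' => 'hadoop.resourcemanager.queuemetrics.root.AllocatedMB',
    'tags' => { 'host' => 'rm.hpc.criteo.prod', 'cluster' => 'ams' }
  }]
}.to_json

puts post_body_to_get_path(QUERY_BODY)
```

Sorting the tags before building the URL means that two bodies differing only in key order map to the same URL, and therefore to the same cache entry.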
OpenTSDB to the rescue in practice
OpenTSDB to the rescue in practice – Easier to use than logs
Two NameNode failovers in one night!
• Hard to spot: in the morning, nothing appears to have changed
• Would be impossible to see with daily aggregation
• Trivia: we fixed the tcollector to get that metric
hadoop.namenode.fsnamesystem.tag.HAState
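A small sketch shows why fine-grained points matter here. It assumes the HA state is stored as a number (1 for active, 0 for standby), which is a guess at the encoding, and the sample series is made up: one NameNode is active, becomes standby for ten minutes, then becomes active again.

```ruby
# Count state changes in a per-minute series and compare with coarse views.
# Assumption: HA state encoded as 1 (active) / 0 (standby); values are made up.

# Number of times two consecutive samples differ.
def count_transitions(values)
  values.each_cons(2).count { |previous, current| previous != current }
end

# One sample per minute for a hypothetical NameNode over six hours.
NIGHT = ([1] * 120 + [0] * 10 + [1] * 230).freeze

puts "state changes seen on this NameNode: #{count_transitions(NIGHT)}"
puts "same state at start and end: #{NIGHT.first == NIGHT.last}"
puts "average over the whole period: #{(NIGHT.sum.to_f / NIGHT.size).round(3)}"
```

Comparing only the first and last samples, or looking at a single average, hides both changes; the per-minute series makes them countable.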
OpenTSDB to the rescue in practice – Investigation
hadoop.nodemanager.direct.TotalCapacity
• Huge memory capacity spike
• Node not reporting points
• Another huge spike
• No data
OpenTSDB to the rescue in practice – Superimpose charts
hadoop.nodemanager.direct.TotalCapacity
hadoop.nodemanager.jvmmetrics.GcTimeMillis
• Service restart – configuration change
• Service restart – OOM
Log extract: NodeManager configured with 192 GB physical memory allocated to containers, which is more than 80% of the total physical memory available (89 GB)
OpenTSDB to the rescue in practice – Hiccups
hadoop.nodemanager.direct.TotalCapacity
hadoop.nodemanager.jvmmetrics.GcTimeMillis
• OpenTSDB problem – not node-specific
• Node probably dead
OpenTSDB to the rescue in practice – NameNode rescue
File deletion
File deletion
File creation
hadoop.namenode.fsnamesystem.BlocksTotal
OpenTSDB to the rescue in practice – NameNode rescue
Slope
hadoop.namenode.fsnamesystem.BlocksTotal
hadoop.namenode.fsnamesystem.FilesTotal
Be careful about the scale!
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is this pattern?
• Answer: NameNode checkpoint
• Note: done at regular intervals
• Trivia: never do a failover during a checkpoint!
OpenTSDB to the rescue in practice – NameNode rescue
hadoop.namenode.fsnamesystemstate.NumLiveDataNodes
Quiz: what is the problem?
• Answer: no NameNode checkpoint → no FS image!
• Follow-up: the standby NameNode could not start up after a failover, because its FS image was too old
Criteo ♥ BigData
- Very accessible: only 50 euros, which will be given to charity
- Speakers from leading organizations: Google, Spotify, Mesosphere, Criteo …
https://www.eventbrite.co.uk/e/nabdc-not-another-big-data-conference-registration-24415556587
Criteo is hiring!
http://labs.criteo.com/
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 

Dernier

%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 

Dernier (20)

%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 

OpenTSDB for monitoring @ Criteo

  • 1. OpenTSDB for monitoring @ Criteo. Nathaniel Braun, Thursday, April 28th, 2016
  • 2. 2 | Copyright © 2016 Criteo. Hitchhiker’s guide to this presentation: • Overview of Hadoop @ Criteo • Our experimental cluster • Rationale for OpenTSDB • Stabilizing & scaling OpenTSDB • OpenTSDB to the rescue in practice
  • 4. Overview of Hadoop @ Criteo: Criteo’s 8 Hadoop clusters, running CDH Community Edition • Tokyo TY5 – PROD AS • Sunnyvale SV6 – PROD NA • Hong Kong HK5 – PROD CN • Paris PA4 – PROD / PREPROD • Paris PA3 – PREPROD / EXP • Amsterdam AM5 – PROD
  • 5. Overview of Hadoop @ Criteo – Production AM5: main production cluster • In use since 2011 • Running CDH3 initially, CDH4 currently • 1118 DataNodes • 13 400+ compute cores • 39 PB of raw disk storage • 105 TB of RAM capacity • 40 TB of data imported every day, mostly through HTTPFS • 100 000+ jobs run daily
  • 6. Overview of Hadoop @ Criteo – Production PA4: comparable to AM5, with fewer machines • Migration done in Q4 2015 – H1 2016 • Running CDH5 • 650+ DataNodes • 15 600+ compute cores • 54 PB of raw disk storage • 143 TB of RAM capacity • Huawei servers (AM5 is HP-based)
  • 7. Overview of Hadoop @ Criteo – Production local clusters: Criteo has 3 local production Hadoop clusters • Sunnyvale (SV6): 20 nodes • Tokyo (TY5): 35 nodes • Hong Kong (HK5): 20 nodes
  • 8. Overview of Hadoop @ Criteo – Preproduction clusters: Criteo has 3 preproduction Hadoop clusters • Preprod PA3: 54 nodes, running CDH4 • Preprod PA4: 42 nodes, running CDH5 • Experimental: 53 nodes, running CDH5
  • 9. Overview of Hadoop @ Criteo – Usage: types of jobs running on our clusters • Cascading jobs, mostly for joins between different types of logs (e.g. displays & clicks) • Pure Map/Reduce jobs for recommendation, Hadoop streaming jobs for learning • Scalding jobs for analytics • Hive queries for Business Intelligence • Spark jobs on CDH5
  • 10. Overview of Hadoop @ Criteo – Special considerations • Kerberos for security • High availability on NameNodes and ResourceManager (CDH5 only) • Infrastructure installed & maintained with Chef
  • 11. Overview of Hadoop @ Criteo: how can we monitor this complex infrastructure and the services running on top of it?
  • 13. Our experimental cluster – Purpose • Useful for testing infrastructure changes without impacting users (no SLA) • Test environment for new technologies • HBase: natural joins; OpenTSDB for metrology & monitoring; hRaven for detailed job data (not used anymore) • Spark, now in production @ PA4
  • 14. Our experimental cluster – HBase features • Based on Google’s Bigtable paper • Integrated with the Hadoop stack • Stores data in rows sorted by row key • Uses regions as ordered sets of rows • Regions sharded by row key bounds • Regions managed by region servers, collocated with DataNodes (data is stored on HDFS) • Oversized regions are split into two regions • Values stored in columns, with no fixed schema, unlike an RDBMS • Columns grouped in column families
  • 15. Our experimental cluster – HBase architecture. Example table keyed by row key (user UID), with column families CF0: user (C0: IP, C2: browser, C3: e-mail) and CF1: event (C0: time, C1: type, C2: web site). Rows: AAA, value, Firefox, NULL, Click, Client #0 • BBB, value, Chrome, NULL, Click, Client #0 • CCC, value, Chrome, ccc@mail.com, Display, Client #1 • DDD, value, IE, NULL, Sales, Client #2 • EEE, value, IE, NULL, Display, Client #0 • FFF, value, IE, NULL, Display, Client #3 • ∙∙∙ • XXX, value, Firefox, NULL, Sales, Client #4 • YYY, value, Chrome, NULL, Bid, Client #5 • ZZZ, value, Opera, zzz@mail.com, Click, Client #5
  • 16. Our experimental cluster – HBase architecture. Same table as slide 15, with the labels R0, R1, R5 added
  • 17. Our experimental cluster – HBase architecture. Same table as slide 15, with the labels R0, R1, R5 and RS1, RS2 added
  • 18. Our experimental cluster – HBase @ Criteo: HBase on the experimental cluster • 50 region servers • 44 000+ regions • ~90 000 requests / second from OpenTSDB
  • 20. Rationale for using OpenTSDB – Infrastructure monitoring: metrics to monitor • CPU load • Processes & threads • RAM available/reserved • Free/used disk space • Network statistics • Sockets open/closed • Open connections with their statuses • Network traffic
  • 21. Rationale for using OpenTSDB – Service monitoring • YARN: NodeManagers, ResourceManagers • HDFS: DataNodes, NameNodes, JournalNodes • ZooKeeper • Kerberos • HBase • Kafka • Storm
  • 22. Rationale for using OpenTSDB – Service monitoring • YARN: NodeManagers, ResourceManagers • HDFS: DataNodes, NameNodes, JournalNodes • ZooKeeper • Kerberos • HBase • Kafka • Storm • Huge diversity of services!
  • 23. Rationale for using OpenTSDB – Scale • Diversity: many types of nodes & services; must be simple to extend with new metrics • Scale: > 2 500 servers; ~90 000 requests / second • Storage: keep fine-grained resolution (down to the minute, at least); long-term storage for analysis & investigation
  • 24. Rationale for using OpenTSDB – Solution • Suits the problem well: “Hadoop for monitoring Hadoop” • Designed for time series: HBase schema optimized for time series queries • Scalable and resilient, thanks to HBase • Easily extensible: writing a data collector is easy • Simple to query
  • 25. Rationale for using OpenTSDB – Easy to query:
      require 'net/http'
      require 'json'
      require 'uri'

      uri = URI.parse("http://0.rtsd.hpc.criteo.preprod:4242/api/query")
      http = Net::HTTP.start(uri.hostname, uri.port)
      http.read_timeout = 300 # seconds
      params = {
        'start' => '2016/04/21-10:00:00',
        'end' => '2016/04/21-12:00:00',
        # /api/query takes a list of sub-queries
        'queries' => [{
          'aggregator' => 'min',
          'downsample' => '5m-min',
          'metric' => 'hadoop.resourcemanager.queuemetrics.root.AllocatedMB',
          'tags' => { 'cluster' => 'ams', 'host' => 'rm.hpc.criteo.prod' }
        }]
      }
      request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
      request.body = params.to_json
      response = http.request(request)
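The query on slide 25 stops at the HTTP response. As a rough sketch of the next step, the snippet below summarizes a response body shaped like what `/api/query` returns (a JSON array with one object per time series, each carrying a `dps` map of timestamp to value). The sample body, its values, and the `summarize_series` helper are illustrative assumptions and do not come from the slides.

```ruby
require 'json'

# Made-up body shaped like an OpenTSDB /api/query response.
SAMPLE_RESPONSE = <<~JSON
  [{"metric": "hadoop.resourcemanager.queuemetrics.root.AllocatedMB",
    "tags": {"cluster": "ams", "host": "rm.hpc.criteo.prod"},
    "aggregateTags": [],
    "dps": {"1461232800": 1200.0, "1461233100": 900.0, "1461233400": 1500.0}}]
JSON

# Hypothetical helper: returns one summary hash per time series in the body.
def summarize_series(body)
  JSON.parse(body).map do |series|
    # "dps" keys are timestamps encoded as strings; sort them numerically.
    points = series['dps'].map { |ts, value| [Integer(ts), value] }.sort
    values = points.map(&:last)
    {
      metric: series['metric'],
      first_timestamp: points.first[0],
      last_timestamp: points.last[0],
      min: values.min,
      max: values.max
    }
  end
end

summary = summarize_series(SAMPLE_RESPONSE).first
puts "#{summary[:metric]}: min=#{summary[:min]} max=#{summary[:max]}"
```

With a live TSD, `response.body` from the request on slide 25 would take the place of `SAMPLE_RESPONSE`.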
  • 26. Rationale for using OpenTSDB – Practical UI
  • 27. Rationale for using OpenTSDB – Practical UI (callout: metric)
  • 28. Rationale for using OpenTSDB – Practical UI (callouts: time range, metric)
  • 29. Rationale for using OpenTSDB – Practical UI (callouts: time range, metric, tag keys/values)
  • 30. Rationale for using OpenTSDB – Practical UI (callouts: time range, metric, tag keys/values, aggregator)
  • 31. Rationale for using OpenTSDB – Design • OpenTSDB consists of Time Series Daemons (TSDs) and tcollectors • Some TSDs are used for writing, others for reading, while tcollectors collect metrics • TSDs are stateless • TSDs use asyncHBase to scale • Quiz: what are the advantages?
  • 32. Rationale for using OpenTSDB – Design • OpenTSDB consists of Time Series Daemons (TSDs) and tcollectors • Some TSDs are used for writing, others for reading, while tcollectors collect metrics • TSDs are stateless • TSDs use asyncHBase to scale • Quiz: what are the advantages? • Answers: 1. Clients never interact with HBase directly 2. Simple protocol → easy to use & extend 3. No state, no synchronization → great scalability
  • 33. Rationale for using OpenTSDB – Metrics • A metric consists of: a metric name; a UNIX timestamp; a value (64-bit integer or single-precision floating-point value); tags (key-value pairs) specific to that metric instance • Tags are useful for aggregations on time series • Example: proc.loadavg.15min 1461781436 15 host=0.namenode.hpc.criteo.prod • Charts: 15-minute load average with the count aggregator (a proxy for machine count) • Quiz: what is the chart below? (chart: proc.loadavg.15min)
  • 34. Rationale for using OpenTSDB – Metrics • A metric consists of: a metric name; a UNIX timestamp; a value (64-bit integer or single-precision floating-point value); tags (key-value pairs) specific to that metric instance • Tags are useful for aggregations on time series • Example: proc.loadavg.15min 1461781436 15 host=0.namenode.hpc.criteo.prod • Charts: 15-minute load average with the count aggregator (a proxy for machine count) • Quiz: what is the chart below? (charts: proc.loadavg.15min; proc.loadavg.15min cluster=*)
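The metric example on slides 33 and 34 matches OpenTSDB's line-based `put` command (`put <metric> <timestamp> <value> <tagk=tagv> ...`). The sketch below formats a data point that way; the `put_line` helper name is an assumption made for illustration, and a real collector would write the resulting line to a TSD socket rather than print it.

```ruby
# Hypothetical helper: formats one data point for OpenTSDB's text protocol.
def put_line(metric, timestamp, value, tags)
  # OpenTSDB requires at least one tag on every data point.
  raise ArgumentError, 'at least one tag is required' if tags.empty?

  # Sort tags by key so the same point always yields the same line.
  tag_part = tags.sort.map { |key, val| "#{key}=#{val}" }.join(' ')
  "put #{metric} #{timestamp} #{value} #{tag_part}"
end

line = put_line('proc.loadavg.15min', 1461781436, 15,
                { 'host' => '0.namenode.hpc.criteo.prod' })
puts line
```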
  • 35. Rationale for using OpenTSDB – HBase schema • A single data table (split into regions), named tsdb • Row key: <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>] • The timestamp is rounded down to the hour • This schema groups data from the same metric & time bucket close together (HBase sorts rows by row key) • Assumption: query first on time range, then metric, then tags, in that order of preference • Tag keys are sorted lexicographically • Tags should be limited, because they are part of the row key: usually fewer than 5 tags • Values are stored in columns • Column name: 2 or 4 bytes. For 2 bytes: the offset within the hour (up to 3 600 seconds) needs 12 bits, since 2^12 = 4096; the 4 bits left encode format/type • Other tables hold metadata and name ↔ ID mappings
  • 36. Rationale for using OpenTSDB – HBase schema • Hexadecimal representation of a row key, with two tags • Sorted row keys for the same metric: 000001 • Note: row key size varies across rows, because of tags
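The row key and 2-byte column layout described on slides 35 and 36 can be sketched as follows. Several details are assumptions made for illustration: the UID tables are hypothetical stand-ins for the name ↔ ID mapping tables, UIDs are taken to be 3 bytes wide (consistent with the `000001` metric prefix on slide 36), and the hourly base timestamp is taken to be 4 bytes.

```ruby
# Hypothetical UID tables; OpenTSDB keeps these mappings in a separate table.
METRIC_UIDS = { 'proc.loadavg.15min' => 1 }.freeze
TAGK_UIDS   = { 'cluster' => 1, 'host' => 2 }.freeze
TAGV_UIDS   = { 'ams' => 1, '0.namenode.hpc.criteo.prod' => 2 }.freeze

# Encodes an integer as a 3-byte big-endian string (assumed UID width).
def uid_bytes(uid)
  [uid].pack('N')[1, 3]
end

# <metric_uid><base timestamp><tagk1><tagv1>...<tagkN><tagvN>
def row_key(metric, timestamp, tags)
  base_time = timestamp - (timestamp % 3600) # round down to the hour
  key = uid_bytes(METRIC_UIDS.fetch(metric)) + [base_time].pack('N')
  tags.sort.each do |tagk, tagv| # tag keys in lexicographic order, as on slide 35
    key += uid_bytes(TAGK_UIDS.fetch(tagk)) + uid_bytes(TAGV_UIDS.fetch(tagv))
  end
  key
end

# 2-byte column name: 12 bits of offset in seconds, 4 bits of format/type flags.
def qualifier(timestamp, flags)
  offset = timestamp % 3600
  [(offset << 4) | (flags & 0xF)].pack('n')
end

key = row_key('proc.loadavg.15min', 1461781436,
              { 'host' => '0.namenode.hpc.criteo.prod', 'cluster' => 'ams' })
puts key.unpack1('H*')                           # hexadecimal row key
puts qualifier(1461781436, 0).unpack1('n') >> 4  # offset within the hour
```

Because the key starts with the metric UID and the hourly timestamp, all points of one metric for one hour sort next to each other, and each extra tag adds 6 bytes to the key, which is why row key size varies across rows.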
  • 37. Rationale for using OpenTSDB – Statistics • Quiz: what should we look for?
  • 38. Rationale for using OpenTSDB – Statistics • Quiz: what should we look for?
  • 39. Rationale for using OpenTSDB – Statistics • Quiz: what should we look for? • 367 513 metrics • 30 tag keys (!) • 86 194 tag values
  • 41. Scaling OpenTSDB • OpenTSDB was hard to scale at first. What problem can you see?
  • 42. Scaling OpenTSDB • OpenTSDB was hard to scale at first. What problem can you see? • We’re missing data points
  • 43. Scaling OpenTSDB – Lessons learned • Analyze all the layers of the system • Logs are your friends • Change parameters one by one, not all at once • Measure, change, deploy, measure. Rinse, repeat
  • 44. Scaling OpenTSDB – Nifty trick: Varnish & OpenResty save the day • 3 × OpenResty (POST -> GET) • 3 × Varnish (cache + LB) • 3 × RTSD (read OpenTSDB)
  • 45. Scaling OpenTSDB – Nifty trick: Varnish & OpenResty save the day • 3 × OpenResty (POST -> GET) • 3 × Varnish (cache + LB) • 3 × RTSD (read OpenTSDB)
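The slides only state that OpenResty turns POST queries into GET queries so that Varnish can cache and load-balance them; they do not show how. As a sketch of the idea, and assuming the rewrite targets OpenTSDB's GET form `m=<aggregator>:[<downsample>:]<metric>{tagk=tagv,...}`, the hypothetical helper below maps a JSON query body to a deterministic GET path. It is written in Ruby for consistency with slide 25, whereas an OpenResty deployment would do this in Lua.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: maps a POST /api/query body to an equivalent GET path.
def post_body_to_get_path(body)
  query = JSON.parse(body)
  params = [['start', query.fetch('start')]]
  params << ['end', query['end']] if query['end']
  query.fetch('queries').each do |q|
    # aggregator:[downsample:]metric
    m = [q.fetch('aggregator'), q['downsample'], q.fetch('metric')].compact.join(':')
    # Sort tags so equivalent queries produce the same URL, hence the same cache key.
    tags = (q['tags'] || {}).sort.map { |key, val| "#{key}=#{val}" }.join(',')
    m += "{#{tags}}" unless tags.empty?
    params << ['m', m]
  end
  "/api/query?#{URI.encode_www_form(params)}"
end

body = {
  'start' => '2016/04/21-10:00:00',
  'end' => '2016/04/21-12:00:00',
  'queries' => [{
    'aggregator' => 'min',
    'downsample' => '5m-min',
    'metric' => 'hadoop.resourcemanager.queuemetrics.root.AllocatedMB',
    'tags' => { 'cluster' => 'ams', 'host' => 'rm.hpc.criteo.prod' }
  }]
}.to_json
puts post_body_to_get_path(body)
```

The point of the design is that a GET URL is a natural cache key: two dashboards asking for the same series and time range hit the cache instead of the read TSDs.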
  • 46. OpenTSDB to the rescue in practice
  • 47. OpenTSDB to the rescue in practice – Easier to use than logs (chart: hadoop.namenode.fsnamesystem.tag.HAState)
  • 48. OpenTSDB to the rescue in practice – Easier to use than logs (chart: hadoop.namenode.fsnamesystem.tag.HAState) • Two NameNode failovers in one night!
  • 49. OpenTSDB to the rescue in practice – Easier to use than logs (chart: hadoop.namenode.fsnamesystem.tag.HAState) • Two NameNode failovers in one night! • Hard to spot: in the morning, nothing has changed
  • 50. OpenTSDB to the rescue in practice – Easier to use than logs (chart: hadoop.namenode.fsnamesystem.tag.HAState) • Two NameNode failovers in one night! • Hard to spot: in the morning, nothing has changed • Would be impossible to see with daily aggregation
  • 51. OpenTSDB to the rescue in practice – Easier to use than logs (chart: hadoop.namenode.fsnamesystem.tag.HAState) • Two NameNode failovers in one night! • Hard to spot: in the morning, nothing has changed • Would be impossible to see with daily aggregation • Trivia: we fixed the tcollector to get that metric
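The failover example rests on keeping fine-grained points: two state changes cancel out, so the first and last samples of the night look the same. The sketch below counts changes in a series of samples; the sample values, the 0/1 encoding of the HA state, and the `count_transitions` helper are assumptions for illustration only.

```ruby
# Hypothetical helper: counts how often a series changes value between
# consecutive points. `dps` maps UNIX timestamps to values.
def count_transitions(dps)
  values = dps.sort.map { |_timestamp, value| value }
  values.each_cons(2).count { |previous, current| previous != current }
end

# Made-up samples, one per minute; 0 and 1 stand for two HA states.
night = {
  1461790800 => 1, 1461790860 => 1, 1461790920 => 0,
  1461790980 => 0, 1461791040 => 1, 1461791100 => 1
}

puts count_transitions(night)                 # changes visible at minute resolution
puts night.values.first == night.values.last  # endpoints alone show no difference
```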
  • 52. OpenTSDB to the rescue in practice – Investigation (chart: hadoop.nodemanager.direct.TotalCapacity)
  • 53. OpenTSDB to the rescue in practice – Investigation (chart: hadoop.nodemanager.direct.TotalCapacity) • Huge memory capacity spike
  • 54. OpenTSDB to the rescue in practice – Investigation (chart: hadoop.nodemanager.direct.TotalCapacity) • Huge memory capacity spike • Node not reporting points
  • 55. OpenTSDB to the rescue in practice – Investigation (chart: hadoop.nodemanager.direct.TotalCapacity) • Huge memory capacity spike • Node not reporting points • Another huge spike
  • 56. OpenTSDB to the rescue in practice – Investigation (chart: hadoop.nodemanager.direct.TotalCapacity) • Huge memory capacity spike • Node not reporting points • Another huge spike • No data
  • 57. OpenTSDB to the rescue in practice – Superimpose charts (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis)
  • 58. OpenTSDB to the rescue in practice – Superimpose charts (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis) • Service restart – configuration change
  • 59. OpenTSDB to the rescue in practice – Superimpose charts (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis) • Service restart – configuration change • Service restart – OOM
  • 60. OpenTSDB to the rescue in practice – Superimpose charts (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis) • Service restart – configuration change • Service restart – OOM • Log extract: NodeManager configured with 192 GB physical memory allocated to containers, which is more than 80% of the total physical memory available (89 GB)
  • 61. OpenTSDB to the rescue in practice – Hiccups (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis)
  • 62. OpenTSDB to the rescue in practice – Hiccups (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis) • OpenTSDB problem – not node-specific
  • 63. OpenTSDB to the rescue in practice – Hiccups (charts: hadoop.nodemanager.direct.TotalCapacity, hadoop.nodemanager.jvmmetrics.GcTimeMillis) • OpenTSDB problem – not node-specific • Node probably dead
  • 64. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystem.BlocksTotal)
  • 65. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystem.BlocksTotal) • File deletion • File deletion
  • 66. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystem.BlocksTotal) • File deletion • File deletion • File creation
  • 67. OpenTSDB to the rescue in practice – NameNode rescue (charts: hadoop.namenode.fsnamesystem.BlocksTotal, hadoop.namenode.fsnamesystem.FilesTotal)
  • 68. OpenTSDB to the rescue in practice – NameNode rescue (charts: hadoop.namenode.fsnamesystem.BlocksTotal, hadoop.namenode.fsnamesystem.FilesTotal) • Slope
  • 69. OpenTSDB to the rescue in practice – NameNode rescue (charts: hadoop.namenode.fsnamesystem.BlocksTotal, hadoop.namenode.fsnamesystem.FilesTotal) • Slope • Be careful about the scale!
  • 70. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes)
  • 71. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is this pattern?
  • 72. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is this pattern? • Answer: NameNode checkpoint
  • 73. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is this pattern? • Answer: NameNode checkpoint • Note: done at regular intervals
  • 74. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is this pattern? • Answer: NameNode checkpoint • Note: done at regular intervals • Trivia: never do a failover during a checkpoint!
  • 75. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes)
  • 76. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes)
  • 77. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is the problem?
  • 78. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is the problem? • Answer: no NameNode checkpoint → no FS image!
  • 79. OpenTSDB to the rescue in practice – NameNode rescue (chart: hadoop.namenode.fsnamesystemstate.NumLiveDataNodes) • Quiz: what is the problem? • Answer: no NameNode checkpoint → no FS image! • Follow-up: the standby NameNode could not start up after a failover, because its FS image was too old
  • 80. Criteo ♥ BigData • Very accessible: only 50 euros, which will be given to charity • Speakers from leading organizations: Google, Spotify, Mesosphere, Criteo … • https://www.eventbrite.co.uk/e/nabdc-not-another-big-data-conference-registration-24415556587
  • 81. Criteo is hiring! http://labs.criteo.com/