SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Lessons Learned
 from OpenTSDB

Or why OpenTSDB is the way it is
 and how it changed iteratively to
correct some of the mistakes made
                             Benoît “tsuna” Sigoure
                           tsuna@stumbleupon.com
Key concepts

•   Data Points
    (time, value)

•   Metrics
    proc.loadavg.1m

•   Tags
    host=web42     pool=static

•   Metric + Tags = Time Series

•   Order of magnitude: >106 time series, >1012 data points

    put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static
OpenTSDB @ StumbleUpon
• Main production monitoring system for ~2 years
• Storing hundreds of billions of data points
• Adding over 1 billion data points per day
• 13000 data points/s → 130 QPS on HBase
• If you had a 5 node cluster, this load
  would hardly make it sweat
Do’s
• Wider rows to seek faster
  before: ~4KB/row, after: ~20KB
• Make writes idempotent and independent
  before: start rows at arbitrary points in time
  after: align rows on 10m (then 1h) boundaries
• Store more data per KeyValue
  Remember you pay for the key along each value
  in a row, so large keys are really expensive
Don’ts

• Use HTable / HTablePool in app servers
  asynchbase + Netty or Finagle = performance++
• Put variable-length fields in composite keys
  They’re hard to scan
• Exceed a few hundred regions per RegionServer
  “Oversharding” introduces overhead and makes
  recovering from failures more expensive
Use asynchbase
                                         HTable           asynchbase
                scan                          sequential read                     sequential write
50s                             500s                                   200s



38s                             375s                                   150s



25s                             250s                                   100s



13s                             125s                                    50s



 0s                                 0s                                   0s
      4   8      16       24   32        4    8      16       24   32         4   8      16       24   32
              # Threads                           # Threads                           # Threads
How OpenTSDB
         came to be the
            way it is
Questions:
• How to store time series data efficiently in HBase?
• How to enable concurrent writes without
  synchronization between the writers?
• How to save space/memory when storing
  hundreds of billions of data items in HBase?
Ta
     Time Series Data in HBase                    ke
                                                       1

                   Col                 don’t care
                         umn
             Key
             1234567890        1
                                      values
             1234567892        2
timestamps
             1234567894        3


Simplest design: only 1 time series, 1 row with a
single KeyValue per data point.
Supports time-range scans.
Ta
     Time Series Data in HBase                  ke
                                                     2

                     Colu
                         mn
           Key
            foo 1234567890     1

            foo 1234567892     3
  metric
  name     fool 1234567890     2


Metric name first in row key for data locality.
Problem: can’t store the metric as text in row key
due to space concerns
Ta
     Time Series Data in HBase                      ke
                                                         3

                       Colu               Separate
                           mn
           Key                          Lookup Table:
                                          Key    Value
             0x1 1234567890       1
                                          0x1    foo
             0x1 1234567892       3       0x2    fool
  metric
                                           foo   0x1
   ID        0x2 1234567890       2
                                          fool   0x2

Use a separate table to assign unique IDs to
metric names (and tags, not shown here). IDs give us a
predictable length and achieve desired data locality.
Ta
    Time Series Data in HBase                ke
                                                  4

                    Colu
                        mn   +0    +2
          Key
           0x1 1234567890     1    3

           0x1 1234567892     3

           0x2 1234567890     2


Reduce the number of rows by storing multiple
consecutive data points in the same row.
Fewer rows = faster to seek to a specific row.
Ta
       Time Series Data in HBase                             ke
                                                                   4

                          Colu
                              mn   +0       +2
                 Key
                  0x1 1234567890    1       3
  Misleading
     table        0x1 1234567892    3
representation
                  0x2 1234567890    2

 Gotcha #1: wider rows don’t save any space*
                     Key      Colum Value
            le 0x1 1234567890   n
                               +0     1
         ab
      l t d 0x1 1234567890
    ua re                      +2     3          * Until magic prefix
  ct to                        +0     2
                                                 compression happens in
 A s           0x2 1234567890                    upcoming HBase 0.94
Ta
     Time Series Data in HBase                  ke
                                                     4

                     Colu
                         mn    +0    +2
          Key
            0x1 1234567890      1    3

            0x1 1234567892      3

            0x2 1234567890      2


Devil is in the details: when to start new rows?
Naive answer: start on first data point, after some
time start a new row.
Ta
 Time Series Data in HBase                          ke
                                                         4

                       Colu
                           mn   +0
         Key

           0x1 1000000000        1




                    0000 00 1   TSD1
             1000                      First data point:
         foo                           Start a new row
Client                          TSD2
Ta
 Time Series Data in HBase                             ke
                                                            4

                       Colu
                           mn   +0     +10     ...
         Key

           0x1 1000000000        1      2      ...




                    0000 10 2   TSD1
             1000                       Keep adding
         foo
                                        points until...
Client                          TSD2
Ta
 Time Series Data in HBase                            ke
                                                           4

                      Colu
                          mn    +0     +10    ... +599
          Key

            0x1 1000000000       1      2     ...   42




                           42
                 0000 0599      TSD1
                                       ... some arbitrary
         fo o 10
                                         limit, say 10min
Client                          TSD2
Ta
 Time Series Data in HBase                           ke
                                                          4

                      Colu
                          mn    +0     +10   ... +599
          Key
            0x1 1000000000       1      2    ...   42

            0x1 1000000600              51




                           51
                 0000 0610      TSD1
                                       Then start a new
         fo o 10
                                             row
Client                          TSD2
Ta
 Time Series Data in HBase                        ke
                                                       4

                       Colu
                           mn   +0
         Key
           0x1 1234567890        1




     But this scheme fails with multiple TSDs


                    5678 90 1   TSD1   Create new row
         foo 1234
Client                          TSD2
Ta
 Time Series Data in HBase                       ke
                                                      4

                       Colu
                           mn   +0     +2
         Key
           0x1 1234567890        1     3




                    5678 92 3   TSD1    Add to row
         foo 1234
Client                          TSD2
Ta
 Time Series Data in HBase                          ke
                                                         4

                    Colu
                        mn     +0     +2
         Key
          0x1 1234567890       1      3
                                            Oops!
          0x1 1234567892       3

 Maybe a connection failure occurred, client is
     retransmitting data to another TSD

                              TSD1     Add to row
         foo 12345678
                      92 3
Client                        TSD2   Create new row
Ta
      Time Series Data in HBase               ke
                                                   5

                       Colu
                           mn   +90   +92
              Key
    Base
timestamp      0x1 1234567800    1     3
  always a
multiple of    0x2 1234567800    2
    600


In order to scale easily and keep TSD stateless,
make writes independent & idempotent.
New rule: rows are aligned on 10 min. boundaries
Ta
      Time Series Data in HBase               ke
                                                   6

                       Colu
                           mn +1890 +1892
              Key
    Base
timestamp      0x1 1234566000   1     3
  always a
multiple of    0x2 1234566000   2
    3600


1 data point every ~10s => 60 data points / row
Not much. Go to wider rows to further increase
seek speed. One hour rows = 6x fewer rows
Ta
    Time Series Data in HBase                          ke
                                                            6

                        Colu
                              mn +1890 +1892
           Key
             0x1 1234566000          1      3

             0x2 1234566000          2

Remember: wider rows don’t save any space!
                   Key         Colum Value Key is easily 4x
          le 0x1 1234566000      n
                               +1890   1     bigger than
      tab                                  column + value
    al ed 0x1 1234566000
  tu or
                               +1892   3
Ac st        0x2 1234566000    +1890   2    and repeated
Ta
      Time Series Data in HBase                                ke
                                                                    7

                            Colu
                                 mn   +1890 +1890 +1892 +1892
              Key
                   0x1 1234566000      1      1    3     3

                   0x2 1234566000      2

Solution: “compact” columns by concatenation
                     Key      Column Value Space savings
          le 0x1   1234566000 +1890        1   on disk and in
      tab
    al ed 0x1
  tu or            1234566000 +1890,+1892 1, 3 memory are
Ac st        0x1   1234566000 +1892        3    huge: data is
           0x2     1234566000 +1890        2 4x-8x smaller!
¿ Questions ?
            ub
            tH
        Gi
       on

                   opentsdb.net
    e
   m
  kr
Fo




                                Summary

 • Use asynchbase             • Use Netty or Finagle
 • Wider table > Taller table • Short family names
 • Make writes idempotent • Make writes independent
 • Compact your data          • Have predictable key sizes
                                    ool?
                   Thin k this is c          Benoît “tsuna” Sigoure
                 W e’re hiring             tsuna@stumbleupon.com

Contenu connexe

Tendances

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...Databricks
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path HBaseCon
 

Tendances (20)

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...
Migrating Apache Hive Workload to Apache Spark: Bridge the Gap with Zhan Zhan...
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 

En vedette

Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBGeoffrey Anderson
 
opentsdb in a real enviroment
opentsdb in a real enviromentopentsdb in a real enviroment
opentsdb in a real enviromentChen Robert
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitGuido Schmutz
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Making Big Data Analytics Interactive and Real-­Time
 Making Big Data Analytics Interactive and Real-­Time Making Big Data Analytics Interactive and Real-­Time
Making Big Data Analytics Interactive and Real-­TimeSeven Nguyen
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 
How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseDataWorks Summit
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013Nathan Bijnens
 
HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxCloudera, Inc.
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 

En vedette (20)

Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDB
 
opentsdb in a real enviroment
opentsdb in a real enviromentopentsdb in a real enviroment
opentsdb in a real enviroment
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in Echtzeit
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Making Big Data Analytics Interactive and Real-­Time
 Making Big Data Analytics Interactive and Real-­Time Making Big Data Analytics Interactive and Real-­Time
Making Big Data Analytics Interactive and Real-­Time
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBase
 
Intro to Pig UDF
Intro to Pig UDFIntro to Pig UDF
Intro to Pig UDF
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
 
HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at Box
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Spark and shark
Spark and sharkSpark and shark
Spark and shark
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 

Similaire à HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon

Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012Daum DNA
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Cloudera, Inc.
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraTyler Hobbs
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performanceDaum DNA
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraHanborq Inc.
 
Apache Cassandra Opinion and Fact
Apache Cassandra Opinion and FactApache Cassandra Opinion and Fact
Apache Cassandra Opinion and Factmediumdata
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shaintrihug
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_uploadRajini Ramesh
 
Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsTyler Hobbs
 
Indexing and Mining a Billion Time series using iSAX 2.0
Indexing and Mining a Billion Time series using iSAX 2.0Indexing and Mining a Billion Time series using iSAX 2.0
Indexing and Mining a Billion Time series using iSAX 2.0Vasu Jain
 

Similaire à HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon (20)

Taming Cassandra
Taming CassandraTaming Cassandra
Taming Cassandra
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Bayesian Counters
Bayesian CountersBayesian Counters
Bayesian Counters
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Cryptography-101
Cryptography-101Cryptography-101
Cryptography-101
 
Cryptography - 101
Cryptography - 101Cryptography - 101
Cryptography - 101
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
 
Data structures
Data structuresData structures
Data structures
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Apache Cassandra Opinion and Fact
Apache Cassandra Opinion and FactApache Cassandra Opinion and Fact
Apache Cassandra Opinion and Fact
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shain
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_upload
 
Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails Devs
 
Assembler
AssemblerAssembler
Assembler
 
Indexing and Mining a Billion Time series using iSAX 2.0
Indexing and Mining a Billion Time series using iSAX 2.0Indexing and Mining a Billion Time series using iSAX 2.0
Indexing and Mining a Billion Time series using iSAX 2.0
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon

  • 1. Lessons Learned from OpenTSDB Or why OpenTSDB is the way it is and how it changed iteratively to correct some of the mistakes made Benoît “tsuna” Sigoure tsuna@stumbleupon.com
  • 2. Key concepts • Data Points (time, value) • Metrics proc.loadavg.1m • Tags host=web42 pool=static • Metric + Tags = Time Series • Order of magnitude: >106 time series, >1012 data points put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static
  • 3. OpenTSDB @ StumbleUpon • Main production monitoring system for ~2 years • Storing hundreds of billions of data points • Adding over 1 billion data points per day • 13000 data points/s → 130 QPS on HBase • If you had a 5 node cluster, this load would hardly make it sweat
  • 4. Do’s • Wider rows to seek faster before: ~4KB/row, after: ~20KB • Make writes idempotent and independent before: start rows at arbitrary points in time after: align rows on 10m (then 1h) boundaries • Store more data per KeyValue Remember you pay for the key along each value in a row, so large keys are really expensive
  • 5. Don’ts • Use HTable / HTablePool in app servers asynchbase + Netty or Finagle = performance++ • Put variable-length fields in composite keys They’re hard to scan • Exceed a few hundred regions per RegionServer “Oversharding” introduces overhead and makes recovering from failures more expensive
  • 6. Use asynchbase HTable asynchbase scan sequential read sequential write 50s 500s 200s 38s 375s 150s 25s 250s 100s 13s 125s 50s 0s 0s 0s 4 8 16 24 32 4 8 16 24 32 4 8 16 24 32 # Threads # Threads # Threads
  • 7. How OpenTSDB came to be the way it is Questions: • How to store time series data efficiently in HBase? • How to enable concurrent writes without synchronization between the writers? • How to save space/memory when storing hundreds of billions of data items in HBase?
  • 8. Ta Time Series Data in HBase ke 1 Col don’t care umn Key 1234567890 1 values 1234567892 2 timestamps 1234567894 3 Simplest design: only 1 time series, 1 row with a single KeyValue per data point. Supports time-range scans.
  • 9. Ta Time Series Data in HBase ke 2 Colu mn Key foo 1234567890 1 foo 1234567892 3 metric name fool 1234567890 2 Metric name first in row key for data locality. Problem: can’t store the metric as text in row key due to space concerns
  • 10. Ta Time Series Data in HBase ke 3 Colu Separate mn Key Lookup Table: Key Value 0x1 1234567890 1 0x1 foo 0x1 1234567892 3 0x2 fool metric foo 0x1 ID 0x2 1234567890 2 fool 0x2 Use a separate table to assign unique IDs to metric names (and tags, not shown here). IDs give us a predictable length and achieve desired data locality.
  • 11. Ta Time Series Data in HBase ke 4 Colu mn +0 +2 Key 0x1 1234567890 1 3 0x1 1234567892 3 0x2 1234567890 2 Reduce the number of rows by storing multiple consecutive data points in the same row. Fewer rows = faster to seek to a specific row.
  • 12. Ta Time Series Data in HBase ke 4 Colu mn +0 +2 Key 0x1 1234567890 1 3 Misleading table 0x1 1234567892 3 representation 0x2 1234567890 2 Gotcha #1: wider rows don’t save any space* Key Colum Value le 0x1 1234567890 n +0 1 ab l t d 0x1 1234567890 ua re +2 3 * Until magic prefix ct to +0 2 compression happens in A s 0x2 1234567890 upcoming HBase 0.94
  • 13. Ta Time Series Data in HBase ke 4 Colu mn +0 +2 Key 0x1 1234567890 1 3 0x1 1234567892 3 0x2 1234567890 2 Devil is in the details: when to start new rows? Naive answer: start on first data point, after some time start a new row.
  • 14. Ta Time Series Data in HBase ke 4 Colu mn +0 Key 0x1 1000000000 1 0000 00 1 TSD1 1000 First data point: foo Start a new row Client TSD2
  • 15. Ta Time Series Data in HBase ke 4 Colu mn +0 +10 ... Key 0x1 1000000000 1 2 ... 0000 10 2 TSD1 1000 Keep adding foo points until... Client TSD2
  • 16. Ta Time Series Data in HBase ke 4 Colu mn +0 +10 ... +599 Key 0x1 1000000000 1 2 ... 42 42 0000 0599 TSD1 ... some arbitrary fo o 10 limit, say 10min Client TSD2
  • 17. Ta Time Series Data in HBase ke 4 Colu mn +0 +10 ... +599 Key 0x1 1000000000 1 2 ... 42 0x1 1000000600 51 51 0000 0610 TSD1 Then start a new fo o 10 row Client TSD2
  • 18. Ta Time Series Data in HBase ke 4 Colu mn +0 Key 0x1 1234567890 1 But this scheme fails with multiple TSDs 5678 90 1 TSD1 Create new row foo 1234 Client TSD2
  • 19. Ta Time Series Data in HBase ke 4 Colu mn +0 +2 Key 0x1 1234567890 1 3 5678 92 3 TSD1 Add to row foo 1234 Client TSD2
  • 20. Ta Time Series Data in HBase ke 4 Colu mn +0 +2 Key 0x1 1234567890 1 3 Oops! 0x1 1234567892 3 Maybe a connection failure occurred, client is retransmitting data to another TSD TSD1 Add to row foo 12345678 92 3 Client TSD2 Create new row
  • 21. Ta Time Series Data in HBase ke 5 Colu mn +90 +92 Key Base timestamp 0x1 1234567800 1 3 always a multiple of 0x2 1234567800 2 600 In order to scale easily and keep TSD stateless, make writes independent & idempotent. New rule: rows are aligned on 10 min. boundaries
  • 22. Ta Time Series Data in HBase ke 6 Colu mn +1890 +1892 Key Base timestamp 0x1 1234566000 1 3 always a multiple of 0x2 1234566000 2 3600 1 data point every ~10s => 60 data points / row Not much. Go to wider rows to further increase seek speed. One hour rows = 6x fewer rows
  • 23. Ta Time Series Data in HBase ke 6 Colu mn +1890 +1892 Key 0x1 1234566000 1 3 0x2 1234566000 2 Remember: wider rows don’t save any space! Key Colum Value Key is easily 4x le 0x1 1234566000 n +1890 1 bigger than tab column + value al ed 0x1 1234566000 tu or +1892 3 Ac st 0x2 1234566000 +1890 2 and repeated
  • 24. Ta Time Series Data in HBase ke 7 Colu mn +1890 +1890 +1892 +1892 Key 0x1 1234566000 1 1 3 3 0x2 1234566000 2 Solution: “compact” columns by concatenation Key Column Value Space savings le 0x1 1234566000 +1890 1 on disk and in tab al ed 0x1 tu or 1234566000 +1890,+1892 1, 3 memory are Ac st 0x1 1234566000 +1892 3 huge: data is 0x2 1234566000 +1890 2 4x-8x smaller!
  • 25. ¿ Questions ? ub tH Gi on opentsdb.net e m kr Fo Summary • Use asynchbase • Use Netty or Finagle • Wider table > Taller table • Short family names • Make writes idempotent • Make writes independent • Compact your data • Have predictable key sizes ool? Thin k this is c Benoît “tsuna” Sigoure W e’re hiring tsuna@stumbleupon.com