Need for Time Series Database
Pramit Choudhary, ML Engineer @eHarmony
Motivation
Speed Matters
We want to know what’s happening NOW
Users access data through different mobile platforms and have no patience
Data is scattered around
MongoDB, Voldemort, Netezza, Hive, Whisper, maybe more
For cross-platform analytical work, data is still moved around (a cause for worry)
Need to simplify the database tech stack
Complexity increases as we start tracking more metrics for mobile devices
Data-analytics use cases:
Most of the time we study data patterns over a period of time
e.g. 1. What are the probable times for a user to get matches? => need to start tracking
the amount of time the user spends during the day
2. Feature exploration and extraction: what other features could we possibly use?
=> probably more t/f/z/p statistical tests
Re-CAP
Consistency: Data remains consistent after the execution
of an operation, e.g. after an update all clients see the same
state of the data.
Availability: Always on (no downtime)
Partition Tolerance: The system continues to function even
when nodes cannot communicate with one another
Different Combinations
CA: Single-site cluster, all nodes are always in contact, e.g.
SQL-type RDBMS
CP: Some data may not be accessible, but the rest is
consistent and accurate, e.g. MongoDB, HBase, Redis
AP: Available under partitioning, but no guarantee on
consistency, e.g. Cassandra, Riak, DynamoDB
NoSQL World
• Key-Value Store (Redis, Riak)
• Document Store (MongoDB, Couchbase)
• Column Store (Cassandra, HBase, OpenTSDB)
• Graph Store (Neo4j)
Introducing a new DB
OpenTSDB
Author: Benoit Sigoure @ StumbleUpon
What is OpenTSDB?
Open Source Time Series Database
Store trillions of data points
Sucks up all data and keeps going
Never loses precision
Scales using HBase
Note: OpenTSDB is used here as an example; KairosDB or InfluxDB may give better results.
They work on similar principles.
Author: Benoit Sigoure and Chris Larsen
Use-Cases
MongoDB and Couchbase: user profiles, product catalogs,
geospatial, financial products, social media, digital
content, gaming, metadata, events, bills and invoices
HBase and Cassandra: structured, semi-structured, and
unstructured data, full table scans, read-intensive
operations, time series interval data, geospatial data
Other Options
Author: Oliver Hankeln
What are Time Series?
Time Series: Data points for an identity over time
Typical Identity:
Dotted string: web01.sys.cpu.user.0 ( no concept of filters )
OpenTSDB Identity:
Metric: sys.cpu.user
Tags (name/value pairs): act as filters
host=web01 cpu=0
Author: Benoit Sigoure and Chris Larsen
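To illustrate how tags act as filters, here is a minimal Python sketch. It is not OpenTSDB's API: the `Series` type, the `select` helper, and the sample hosts other than web01 are assumptions made up for illustration. The point is that a metric plus name/value tags lets you slice by any tag, which a single dotted string does not.

```python
from typing import Dict, List, NamedTuple


class Series(NamedTuple):
    """Hypothetical in-memory stand-in for a time series identity."""
    metric: str
    tags: Dict[str, str]


# Sample identities; web02 and cpu=1 are invented for the example.
SERIES = [
    Series("sys.cpu.user", {"host": "web01", "cpu": "0"}),
    Series("sys.cpu.user", {"host": "web01", "cpu": "1"}),
    Series("sys.cpu.user", {"host": "web02", "cpu": "0"}),
]


def select(series_list: List[Series], metric: str, **tag_filter: str) -> List[Series]:
    """Return the series with the given metric whose tags match every filter."""
    return [
        s for s in series_list
        if s.metric == metric
        and all(s.tags.get(name) == value for name, value in tag_filter.items())
    ]


# All CPUs on web01, then CPU 0 across all hosts.
on_web01 = select(SERIES, "sys.cpu.user", host="web01")
cpu0_everywhere = select(SERIES, "sys.cpu.user", cpu="0")
```

With the dotted form (web01.sys.cpu.user.0) the same questions would need string matching on positions inside the name.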
What are Time Series?
Data Point:
Metric + Tags
+ Value: 42
+ Timestamp: 1234567890
"sys.cpu.user 1234567890 42 host=web01 cpu=0"
Author: Benoit Sigoure and Chris Larsen
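A data point line like the one on this slide can be split into its four parts. The sketch below assumes the field order shown in the example (metric, timestamp, value, then tags); the `DataPoint` type and the `parse_point` helper are hypothetical names, not part of OpenTSDB.

```python
from typing import Dict, NamedTuple


class DataPoint(NamedTuple):
    metric: str
    timestamp: int          # seconds since the Unix epoch
    value: float
    tags: Dict[str, str]


def parse_point(line: str) -> DataPoint:
    """Parse 'metric timestamp value tag=value ...' into a DataPoint."""
    parts = line.split()
    # Assumption: a point carries at least one tag, as in the slide example.
    if len(parts) < 4:
        raise ValueError("expected: metric timestamp value tag=value [...]")
    metric, timestamp, value = parts[0], int(parts[1]), float(parts[2])
    # dict() raises ValueError if a tag token has no '=' separator.
    tags = dict(token.split("=", 1) for token in parts[3:])
    return DataPoint(metric, timestamp, value, tags)


point = parse_point("sys.cpu.user 1234567890 42 host=web01 cpu=0")
```

The value is stored as a float here for simplicity; that choice is part of the sketch, not a statement about how OpenTSDB stores values.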
Architecture
Author: Benoit Sigoure and Chris Larsen
Another View
Author: slideshare
About TSDs
Write throughput
TSDs are CPU-bound
Worst case: can handle 2000 points/sec on an old 2006 dual-core CPU
Read throughput
Depends on the cardinality of a metric
and on the timespan and number of data points retrieved
Reliability
No single point of failure: there is no master daemon
Dependency: needs HBase with ZooKeeper
Has a single point of failure if running over HDFS, but none with
respect to the database.
More info in the FAQ: http://opentsdb.net/faq.html
Simplistic View of the Table
Without OpenTSDB: HBase Table Representation
Author: Oliver Hankeln
OpenTSDB Magic
“Compact columns by concatenation”
Author: Oliver Hankeln
• Tags are put at the end of the row key
• Timestamp is normalized on 1hr boundaries
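The two bullets above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenTSDB's code: the 3-byte UIDs, the 4-byte big-endian base timestamp, the metric-first layout, and the helper names `split_timestamp` and `row_key` are all chosen for the example. What it shows is that every point in the same hour for the same metric and tags lands in the same row, with only the offset inside the hour left to distinguish columns.

```python
import struct
from typing import Iterable, Tuple

HOUR = 3600  # seconds; rows are aligned on 1-hour boundaries


def split_timestamp(ts: int) -> Tuple[int, int]:
    """Return (base, offset): the hour boundary and the seconds past it."""
    base = ts - ts % HOUR
    return base, ts - base


def row_key(metric_uid: bytes, ts: int, tag_uids: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Build metric UID + hour-aligned timestamp + tag UID pairs (tags last)."""
    base, _ = split_timestamp(ts)
    key = metric_uid + struct.pack(">I", base)
    # Sorting the pairs keeps the key stable regardless of tag order.
    for tag_name_uid, tag_value_uid in sorted(tag_uids):
        key += tag_name_uid + tag_value_uid
    return key


# Hypothetical UIDs standing in for sys.cpu.user, host=web01, cpu=0.
METRIC = b"\x00\x00\x01"
TAGS = [(b"\x00\x00\x01", b"\x00\x00\x02"), (b"\x00\x00\x02", b"\x00\x00\x03")]

base, offset = split_timestamp(1234567890)   # base=1234566000, offset=1890
key = row_key(METRIC, 1234567890, TAGS)
```

Under these assumptions the key is 3 + 4 + 2 × 6 = 19 bytes, and a point taken one minute later (1234567950) produces the same key with a different offset.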
Row Key Size
Author: Oliver Hankeln
Benchmarks
Load Phase
Heavy Read
Heavy Read
Heavy Range Scan
Heavy Inserts
Is it being extensively used?
OVH: #3 largest cloud/hosting provider. Monitors
everything, including network performance, resource
utilization, application performance, and customer-facing
metrics
35 servers, 100k writes/s, 25 TB raw data
5-day moving window of HBase snapshots
Redis cache on top for customer-facing data
Yahoo: Monitoring application performance and
statistics (15 servers, 280k writes/s)
Arista Networks: High-performance network
monitoring
5k writes/s; uses Varnish for caching
MapR
“OpenTSDB is a widely used database intended to store
and analyze time-series data. Originally designed for
only data center monitoring, poor ingest performance
had limited the expansion of its use. This benchmark
demonstrates a viable option for new applications, such
as IoT and other real-time data-analysis applications,
using OpenTSDB running on MapR.” Ted Dunning, Chief
Application Architect
Others
Some References
Book: Time Series Databases, by Ted Dunning and Ellen Friedman:
https://www.dropbox.com/s/c1zj0l0q0qmfvo8/Time_Series_Databases.pdf?dl=0
Benchmarks:
https://www.dropbox.com/s/g67yoxwabwb5s0g/PerformanceBenchMark.pdf?dl=0
Lessons learned:
http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
Some Comparisons:
http://prometheus.io/docs/introduction/comparison/
Demo
Questions?


Editor's notes

  1. HBase has a clear advantage in writes: with pre-created regions it showed us up to 40K ops/sec. Cassandra also performs well during the loading phase, at around 15K ops/sec. MySQL Cluster can show much higher numbers in “just in-memory” mode.
  2. Deferred log flush does the right job for HBase during mutation ops: edits are first committed to the memstore, and the aggregated edits are then flushed to the HLog asynchronously. Cassandra has great write throughput because writes are first appended to the commit log, which is a fast operation. MongoDB’s latency suffers from its global write lock. Riak behaves more stably than MongoDB.