SlideShare a Scribd company logo
1 of 38
Time Series Data in MongoDB
Senior Solutions Architect, MongoDB Inc.
Massimo Brignoli
#mongodb
Agenda
• What is time series data?
• Schema design considerations
• Broader use case: operational intelligence
• MMS Monitoring schema design
• Thinking ahead
• Questions
What is time series data?
Time Series Data is Everywhere
• Financial markets pricing (stock ticks)
• Sensors (temperature, pressure, proximity)
• Industrial fleets (location, velocity, operational)
• Social networks (status updates)
• Mobile devices (calls, texts)
• Systems (server logs, application logs)
Time Series Data at a Higher Level
• Widely applicable data model
• Applies to several different “data use cases”
• Various schema and modeling options
• Application requirements drive schema design
Time Series Data Considerations
• Resolution of raw events
• Resolution needed to support
– Applications
– Analysis
– Reporting
• Data retention policies
– Data ages out
– Retention
Schema Design
Considerations
Designing For Writing and Reading
• Document per event
• Document per minute (average)
• Document per minute (second)
• Document per hour
Document Per Event
{
server: “server1”,
load: 92,
ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload
• Aggregations computed at application-level
Document Per Minute (Average)
{
server: “server1”,
load_num: 92,
load_sum: 4500,
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
Document Per Minute (By Second)
{
server: “server1”,
load: { 0: 15, 1: 20, …, 58: 45, 59: 40 }
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
Document Per Hour (By Second)
{
server: “server1”,
load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 }
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
Document Per Hour (By Second)
{
server: “server1”,
load: {
0: {0: 15, …, 59: 45},
….
59: {0: 25, …, 59: 75}
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59+59 steps
Characterzing Write Differences
• Example: data generated every second
• Capturing data per minute requires:
– Document per event: 60 writes
– Document per minute: 1 write, 59 updates
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
– Document per event: 3600 reads
– Document per minute: 60 reads
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
MMS Monitoring Schema
Design
MMS Monitoring
• MongoDB Management System Monitoring
• Available in two flavors
– Free cloud-hosted monitoring
– On-premise with MongoDB Enterprise
• Monitor single node, replica set, or sharded cluster
deployments
• Metric dashboards and custom alert triggers
MMS Monitoring
MMS Monitoring
MMS Application Requirements
Resolution defines granularity of
stored data
Range controls the retention
policy, e.g. after 24 hours only 5-
minute resolution
Display dictates the stored pre-
aggregations, e.g. total and count
Monitoring Schema Design
• Per-minute documentmodel
• Documentsstore individual metrics and counts
• Supports“total” and “avg/sec”display
{
timestamp_minute: ISODate(“2013-10-10T23:06:00.000Z”),
num_samples: 58,
total_samples: 108000000,
type: “memory_used”,
values: {
0: 999999,
…
59: 1800000
}
}
Monitoring Data Updates
• Single update required to add new data and
increment associated counts
db.metrics.update(
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
type: “memory_used”
},
{
{$set: {“values.59”: 2000000 }},
{$inc: {num_samples: 1, total_samples: 2000000 }}
}
)
Monitoring Data Management
• Data stored at different granularity levels for read
performance
• Collections are organized into specific intervals
• Retention is managed by simply dropping
collections as they age out
• Document structure is pre-created to maximize write
performance
Use Case: Operational
Intelligence
What is Operational Intelligence
• Storing log data
– Capturing application and/or server generated events
• Hierarchical aggregation
– Rolling approach to generate rollups
– e.g. hourly > daily > weekly > monthly
• Pre-aggregated reports
– Processing data to generate reporting from raw events
Storing Log Data
{
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
user: 'frank',
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
request: "GET /apache_pb.gif HTTP/1.0",
status: 200,
response_size: 2326,
referrer: “http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
"[http://www.example.com/start.html](http://www.example.com/start.html)" "Mozilla/4.08 [en]
(Win98; I ;Nav)”
Pre-Aggregation
• Analytics across raw events can involve many reads
• Alternative schemas can improve read and write
performance
• Data can be organized into more coarse buckets
• Transition from insert-driven to update-driven
workloads
Pre-Aggregated Log Data
{
timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
resource: "/index.html",
page_views: {
0: 50,
…
59: 250
}
}
• Leverage time-seriesstyle bucketing
• Trackindividual metrics (ex. page views)
• Improve performancefor reads/writes
• Minimal processingoverhead
Hierarchical Aggregation
• Analytical approach as opposed to schema
approach
– Leverage built-inAggregation Framework or MapReduce
• Execute multiple tasks sequentially to aggregate at
varying levels
• Raw events  Hourly  Weekly  Monthly
• Rolling approach distributes the aggregation
workload
Thinking Ahead
Before You Start
• What are the application requirements?
• Is pre-aggregation useful for your application?
• What are your retention and age-out policies?
• What are the gotchas?
– Pre-create document structure to avoid fragmentation and
performance problems
– Organize your data for growth – time series data grows
fast!
Down The Road
• Scale-out considerations
– Vertical vs. horizontal (with sharding)
• Understanding the data
– Aggregation
– Analytics
– Reporting
• Deeper data analysis
– Patterns
– Predictions
Scaling Time Series Data in
MongoDB
• Vertical growth
– Larger instances with more CPU and memory
– Increased storage capacity
• Horizontal growth
– Partitioning data across many machines
– Dividing and distributing the workload
Time Series Sharding
Considerations
• What are the application requirements?
– Primarily collecting data
– Primarily reporting data
– Both
• Map those back to
– Write performance needs
– Read/write query distribution
– Collection organization (see MMS Monitoring)
• Example: {metric name, coarse timestamp}
Aggregates, Analytics, Reporting
• Aggregation Framework can be used for analysis
– Does it work with the chosen schema design?
– What sorts of aggregations are needed?
• Reporting can be done on predictable, rolling basis
– See “HierarchicalAggregation”
• Consider secondary reads for analytical operations
– Minimize load on production primaries
Deeper Data Analysis
• Leverage MongoDB-Hadoop connector
– Bi-directional support for reading/writing
– Works with online and offline data (e.g. backup files)
• Compute using MapReduce
– Patterns
– Recommendations
– Etc.
• Explore data
– Pig
– Hive
Questions?
Resources
• Schema Design for Time Series Data in MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-
data-in-mongodb
• Operational Intelligence Use Case
http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence
• Data Modeling in MongoDB
http://docs.mongodb.org/manual/data-modeling/
• Schema Design (webinar)
http://www.mongodb.com/events/webinar/schema-design-oct2013

More Related Content

Similar to Mongo db 2.4 time series data - Brignoli

Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesMongoDB
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemDataWorks Summit
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBMongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best PracticesLewis Lin 🦊
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Connecticut CMG - Demystifying Oracle database capacity management with wor...
Connecticut CMG - Demystifying Oracle database  capacity management with  wor...Connecticut CMG - Demystifying Oracle database  capacity management with  wor...
Connecticut CMG - Demystifying Oracle database capacity management with wor...Renato Bonomini
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analyticsamesar0
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...Big Data Spain
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Dataconomy Media
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Maya Lumbroso
 

Similar to Mongo db 2.4 time series data - Brignoli (20)

Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousing
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Connecticut CMG - Demystifying Oracle database capacity management with wor...
Connecticut CMG - Demystifying Oracle database  capacity management with  wor...Connecticut CMG - Demystifying Oracle database  capacity management with  wor...
Connecticut CMG - Demystifying Oracle database capacity management with wor...
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 
Census-as-a-service
Census-as-a-serviceCensus-as-a-service
Census-as-a-service
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Mongo db 2.4 time series data - Brignoli

  • 1. Time Series Data in MongoDB Senior Solutions Architect, MongoDB Inc. Massimo Brignoli #mongodb
  • 2. Agenda • What is time series data? • Schema design considerations • Broader use case: operational intelligence • MMS Monitoring schema design • Thinking ahead • Questions
  • 3. What is time series data?
  • 4. Time Series Data is Everywhere • Financial markets pricing (stock ticks) • Sensors (temperature, pressure, proximity) • Industrial fleets (location, velocity, operational) • Social networks (status updates) • Mobile devices (calls, texts) • Systems (server logs, application logs)
  • 5. Time Series Data at a Higher Level • Widely applicable data model • Applies to several different “data use cases” • Various schema and modeling options • Application requirements drive schema design
  • 6. Time Series Data Considerations • Resolution of raw events • Resolution needed to support – Applications – Analysis – Reporting • Data retention policies – Data ages out – Retention
  • 8. Designing For Writing and Reading • Document per event • Document per minute (average) • Document per minute (second) • Document per hour
  • 9. Document Per Event { server: “server1”, load: 92, ts: ISODate("2013-10-16T22:07:38.000-0500") } • Relational-centric approach • Insert-driven workload • Aggregations computed at application-level
  • 10. Document Per Minute (Average) { server: “server1”, load_num: 92, load_sum: 4500, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Pre-aggregate to compute average per minute more easily • Update-driven workload • Resolution at the minute-level
  • 11. Document Per Minute (By Second) { server: “server1”, load: { 0: 15, 1: 20, …, 58: 45, 59: 40 } ts: ISODate("2013-10-16T22:07:00.000-0500") } • Store per-second data at the minute level • Update-driven workload • Pre-allocate structure to avoid document moves
  • 12. Document Per Hour (By Second) { server: “server1”, load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 } ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 3599 steps
  • 13. Document Per Hour (By Second) { server: “server1”, load: { 0: {0: 15, …, 59: 45}, …. 59: {0: 25, …, 59: 75} ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level with nesting • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 59+59 steps
  • 14. Characterzing Write Differences • Example: data generated every second • Capturing data per minute requires: – Document per event: 60 writes – Document per minute: 1 write, 59 updates • Transition from insert driven to update driven – Individual writes are smaller – Performance and concurrency benefits
  • 15. Characterizing Read Differences • Example: data generated every second • Reading data for a single hour requires: – Document per event: 3600 reads – Document per minute: 60 reads • Read performance is greatly improved – Optimal with tuned block sizes and read ahead – Fewer disk seeks
  • 17. MMS Monitoring • MongoDB Management System Monitoring • Available in two flavors – Free cloud-hosted monitoring – On-premise with MongoDB Enterprise • Monitor single node, replica set, or sharded cluster deployments • Metric dashboards and custom alert triggers
  • 20. MMS Application Requirements Resolution defines granularity of stored data Range controls the retention policy, e.g. after 24 hours only 5- minute resolution Display dictates the stored pre- aggregations, e.g. total and count
  • 21. Monitoring Schema Design • Per-minute documentmodel • Documentsstore individual metrics and counts • Supports“total” and “avg/sec”display { timestamp_minute: ISODate(“2013-10-10T23:06:00.000Z”), num_samples: 58, total_samples: 108000000, type: “memory_used”, values: { 0: 999999, … 59: 1800000 } }
  • 22. Monitoring Data Updates • Single update required to add new data and increment associated counts db.metrics.update( { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: “memory_used” }, { {$set: {“values.59”: 2000000 }}, {$inc: {num_samples: 1, total_samples: 2000000 }} } )
  • 23. Monitoring Data Management • Data stored at different granularity levels for read performance • Collections are organized into specific intervals • Retention is managed by simply dropping collections as they age out • Document structure is pre-created to maximize write performance
  • 25. What is Operational Intelligence • Storing log data – Capturing application and/or server generated events • Hierarchical aggregation – Rolling approach to generate rollups – e.g. hourly > daily > weekly > monthly • Pre-aggregated reports – Processing data to generate reporting from raw events
  • 26. Storing Log Data { _id: ObjectId('4f442120eb03305789000000'), host: "127.0.0.1", user: 'frank', time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", request: "GET /apache_pb.gif HTTP/1.0", status: 200, response_size: 2326, referrer: “http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" } 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[http://www.example.com/start.html](http://www.example.com/start.html)" "Mozilla/4.08 [en] (Win98; I ;Nav)”
  • 27. Pre-Aggregation • Analytics across raw events can involve many reads • Alternative schemas can improve read and write performance • Data can be organized into more coarse buckets • Transition from insert-driven to update-driven workloads
  • 28. Pre-Aggregated Log Data { timestamp_minute: ISODate("2000-10-10T20:55:00Z"), resource: "/index.html", page_views: { 0: 50, … 59: 250 } } • Leverage time-seriesstyle bucketing • Trackindividual metrics (ex. page views) • Improve performancefor reads/writes • Minimal processingoverhead
  • 29. Hierarchical Aggregation • Analytical approach as opposed to schema approach – Leverage built-inAggregation Framework or MapReduce • Execute multiple tasks sequentially to aggregate at varying levels • Raw events  Hourly  Weekly  Monthly • Rolling approach distributes the aggregation workload
  • 31. Before You Start • What are the application requirements? • Is pre-aggregation useful for your application? • What are your retention and age-out policies? • What are the gotchas? – Pre-create document structure to avoid fragmentation and performance problems – Organize your data for growth – time series data grows fast!
  • 32. Down The Road • Scale-out considerations – Vertical vs. horizontal (with sharding) • Understanding the data – Aggregation – Analytics – Reporting • Deeper data analysis – Patterns – Predictions
  • 33. Scaling Time Series Data in MongoDB • Vertical growth – Larger instances with more CPU and memory – Increased storage capacity • Horizontal growth – Partitioning data across many machines – Dividing and distributing the workload
  • 34. Time Series Sharding Considerations • What are the application requirements? – Primarily collecting data – Primarily reporting data – Both • Map those back to – Write performance needs – Read/write query distribution – Collection organization (see MMS Monitoring) • Example: {metric name, coarse timestamp}
  • 35. Aggregates, Analytics, Reporting • Aggregation Framework can be used for analysis – Does it work with the chosen schema design? – What sorts of aggregations are needed? • Reporting can be done on predictable, rolling basis – See “HierarchicalAggregation” • Consider secondary reads for analytical operations – Minimize load on production primaries
  • 36. Deeper Data Analysis • Leverage MongoDB-Hadoop connector – Bi-directional support for reading/writing – Works with online and offline data (e.g. backup files) • Compute using MapReduce – Patterns – Recommendations – Etc. • Explore data – Pig – Hive
  • 38. Resources • Schema Design for Time Series Data in MongoDB http://blog.mongodb.org/post/65517193370/schema-design-for-time-series- data-in-mongodb • Operational Intelligence Use Case http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence • Data Modeling in MongoDB http://docs.mongodb.org/manual/data-modeling/ • Schema Design (webinar) http://www.mongodb.com/events/webinar/schema-design-oct2013