SlideShare une entreprise Scribd logo
1  sur  58
Télécharger pour lire hors ligne
Making Sense of
your Data
BUILDING A CUSTOM MONGODB DATASOURCE
FOR GRAFANA WITH VERTX
1
About me
 IT Consultant & Java Specialist at DevCon5 (CH)
 Focal Areas
 Tool-assisted quality assurance
 Performance (-testing, -analysis, -tooling)
 Operational Topics (APM, Monitoring)
 Twitter: @gmuecke
2
The Starting Point
 Customer stored and keep response time measurement of test runs
in a MongoDB
 Lots of Data
 Timestamp & Value
 No Proper Visualization
3
What are timeseries data?
 a set of datapoints with a timestamp and a value
time
value
What is MongoDB?
 MongoDB
 NoSQL database with focus on scale
 JSON as data representation
 No HTTP endpoint (TCP based Wire Protocol)
 Aggregation framework for complex queries
 Provides an Async Driver
5
What is Grafana?
 A Service for Visualizing Time Series Data
 Open Source
 Backend written in Go
 Frontend based on Angular
 Dashboards & Alerts
Grafana Architecture 7
Grafana Server
• Implemented in GO
• Persistence for Settings and Dashboards
• Offers Proxy for Service Calls
Browser
Angular UI Data Source Data Source Plugin...
Proxy
DB DB
Datasources for Grafana 8
Grafana Server
• Implemented in GO
• Persistence for Settings and Dashboards
• Offers Proxy for Service Calls
Browser
Datasource
Angular UI
Data Source Plugin
• Angular
• JavaScript
HTTP
Connect Angular Directly to
Mongo?
9
From 2 Tier to 3 Tier 10
Grafana
(Angular) Mongo DB
Grafana
(Angular) Mongo DB
Datasource
Service
HTTP Mongo
Wire
Protocol
Start Simple
SimpleJsonDatasource (Plugin)
3 ServiceEndpoints
 /search  Labels – names of available timeseries
 /annotations  Annotations – textual markers
 /query  Query – actual time series data
11
https://github.com/grafana/simple-json-datasource
/search Format
Request
{
"target" : "select metric",
"refId" : "E"
}
Response
[
"Metric Name 1",
"Metric Name2",
]
An array of strings
12
/annotations Format
Request
{ "annotation" : {
"name" : "Test",
"iconColor" : "rgba(255, 96, 96, 1)",
"datasource" : "Simple Example DS",
"enable" : true,
"query" : "{"name":"Timeseries A"}" },
"range" : {
"from" : "2016-06-13T12:23:47.387Z",
"to" : "2016-06-13T12:24:19.217Z" },
"rangeRaw" : {
"from" : "2016-06-13T12:23:47.387Z",
"to" : "2016-06-13T12:24:19.217Z"
} }
Response
[ { "annotation": {
"name": "Test",
"iconColor": "rgba(255, 96, 96, 1)",
"datasource": "Simple Example DS",
"enable": true,
"query": "{"name":"Timeseries A"}" },
"time": 1465820629774,
"title": "Marker",
"tags": [
"Tag 1",
"Tag 2" ] } ]
13
/query Format
Request
{ "panelId" : 1,
"maxDataPoints" : 1904,
"format" : "json",
"range" : {
"from" : "2016-06-13T12:23:47.387Z",
"to" : "2016-06-13T12:24:19.217Z" },
"rangeRaw" : {
"from" : "2016-06-13T12:23:47.387Z",
"to" : "2016-06-13T12:24:19.217Z" },
"interval" : "20ms",
"targets" : [ {
"target" : "Time series A",
"refId" : "A" },] }
Response
[ { "target":"Timeseries A",
"datapoints":[
[1936,1465820629774],
[2105,1465820632673],
[4187,1465820635570],
[30001,1465820645243] },
{ "target":"Timeseries B",
"datapoints":[ ] }
]
14
Structure of the Source Data
{
"_id" : ObjectId("56375bc54f3c4caedfe68aca"),
"t" : {
"eDesc" : "some description",
"eId" : "56375ae24f3c4caedfe68a07",
"name" : "some name",
"profile" : "I01",
"rnId" : "56375b694f3c4caedfe68aa0",
"rnStatus" : "PASSED",
"uId" : "anonymous"
},
"n" : {
"begin" : NumberLong("1446468494689"),
"value" : NumberLong(283)
}
}
15
Custom Datasource
 Should be
 Lightweight
 Fast / Performant
 Simple
16
Microservice?
 Options for implementation
 Java EE Microservice (i.e. Wildfly Swarm)
 Springboot Microservice
 Vert.x Microservice
 Node.js
 ...
17
The Alternative Options
Node.js
 Single Threaded
 Child Worker Processes
 Javascript Only
 Not best-choice for heavy
computation
Spring / Java EE
 Multithreaded
 Clusterable
 Java Only
 Solid Workhorses, cumbersome at
times
19
Why Vert.x?
 High Performance, Low Footprint
 Asynchronous, Non-Blocking
 Actor-like Concurrency
 Event & Worker Verticles
 Message Driven
 Polyglott
 Java, Groovy, Javascript, Scala …
 Scalable
 Distributed Eventbus
 Multi-threaded Event Loops
20
But first,
some basics
21
Asynchronous non-blocking vs
Synchronous blocking
23
© Fritz Geller-Grimm
© Dontworry
Event Loop 24
Photo: Andreas Praefcke
Event Loop and Verticles 25
Photo: RokerHRO
3rd Floor, Verticle A
2nd Floor, Verticle B
1st Floor, Verticle C
26
27
Event Loop 28
Verticle
Verticle
Verticle
EventI/O
Event Bus 30
Verticle
Verticle
Verticle
Eventbus
Message
CPU
Multi-Reactor 31
Core Core Core Core
Eventbus
Other Vert.x
Instance
Browser
Verticle Verticle
Event & Worker Verticles
Event Driven Verticles Worker Verticles
32
Verticle
Verticle
Verticle
Thread Pool
Thread Pool
Verticle
Verticle
Verticle
Verticle
Verticle
Implementing the datasource
 Http Verticle
 Routing requests & sending responses
 Verticles querying the DB
 Searching timeseries labels
 Annotation
 Timeseries data points
 Optional Verticles for Post Processing
33
What is the challenge?
 Optimization
 Queries can be optimized
 Large datasets have to be searched, read and transported
 Source data can not be modified VS data redundancy
 Sizing
 How to size the analysis system without knowing the query-times?
 How to size thread pools or database pools if most of the queries will
take 100ms – 30s ?
34
CPU
Datasource Architecture 35
HTTP
Service
Eventbus
Timeseries
HTTP
Request
HTTP
Response
DB
Labels Annotations
Step 1 – The naive approach
 Find all datapoints within range
36
CPU
Datasource Architecture 37
HTTP
Service
Eventbus
Query
Database
HTTP
Request
HTTP
Response
DB
Step 2 – Split Request
 Split request into chunks (#chunks = #cores)
 Use multiple Verticle Instance in parallel (#instances = #cores) ?
38
CPU
CPU
Datasource Architecture 39
HTTP
Service
Split/ Merge
Request
Eventbus
Query
Database
Query
Database
Query
Database
Query
Database
HTTP
Request
HTTP
Response
DB
Step 3 – Aggregate Datapoints
 Use Mongo Aggregation Pipeline
 Reduce Datapoints returned to service
40
CPU
Datasource Architecture 41
HTTP
Service
Split/ Merge
Request
Eventbus
Query
Database
Query
Database
Query
Database
Query
Database
HTTP
Request
HTTP
Response
DB
Step 4 – Percentiles (CPU)
 Fetch all data
 Calculate percentiles in service
42
CPU
Step 4 – Percentiles (DB)
 Build aggregation pipeline to calculate percentiles
 Algorithm, see
http://www.dummies.com/education/math/statistics/how-to-
calculate-percentiles-in-statistics/
43
DB
CPU
Datasource Architecture 44
HTTP
Service
Split/ Merge
Request
Eventbus
Query
Database
Query
Database
Query
Database
Query
Database
HTTP
Request
HTTP
Response
DB
Step 5 - Postprocessing
 Apply additional computation on the result from the database
45
CPUCPU
Datasource Architecture (final) 46
HTTP
Service
Split
Request
Eventbus
Query
Database
Query
Database
Query
Database
Query
Database
Merge
Result
HTTP
Request
HTTP
Response
DB
Post
Process
Post
Process
Post
Process
Post
Process
Eventbus
Anyone recognize
this chip?
 CPU of the PS3 (Cell BE)
 Central Bus (EIB)
 1 General Purpose CPU
 Runs the Game (Event) Loop
 8 SPUs
 Special Processing
 Sound
 Physics
 AI
 ...
 230 GFLOPS (vs 103 GFLOPS PS4)
47
Adding more stats & calculation
 Push Calculation to DB if possible
 Add more workers / node for complex (post-) processing
 Aggregate results before post-processing
 DB performance is king
48
Custom vs. timeseries DB
Custom:
 No migration of existing data
 No redundant data storage
 More flexibility
 Better extensibility
 Custom views
 Custom aggregation
 More options
 scalability
 retention
Timeseries DB:
 Better out-of-the-box
 experience / performance
 integration
 functionality
 Built-in retention policies
 Built for scalability
49
Takeaways
 Grafana is powerful tool for visualizing data
 Information becomes consumable through visualization
 Information is essential for decision making
 Vert.x is
 Reactive, Non-Blocking, Asynchronous, Scalable
 Running on JVM
 Polyglott
 Fun
52
Source code on:
https://github.com/gmuecke/grafana-vertx-datasource
Still hungry?
FOR GEEK STUFF
53
Let’s read a large data file
 Datafile is large (> 1GB)
 Every line of the file is a datapoint
 The first 10 characters are a timestamp
 The dataset is sorted
 The datapoints are not equally distributed
 Grafana requires reads ~1900 datapoints per chart request
54
The Challenges (pick one)
 How to randomly access
1900 datapoints without
reading the entire file into
memory?
 How to read a huge file
efficiently into memory?
55
Index
+ Lazy refinement
Index
+ Lazy load
Let’s build an index
 Indexes can be build using a tree-
datastructure
 Node: Timestamp
 Leaf: offset position in file
or the datapoint
 Red-Black Trees provide fast
access
 Read/Insert O(log n)
 Space n
56
© Cburnett, Wikipedia
57
 java.util.TreeMap is a red-black tree
based implementation*
 TreeMap<Long,Long> index = new
TreeMap<>();
 *actually all HashMaps are
58
How to build an index (fast)?
 Read datapoint from offset positions
 Build a partial index
59
Dataset
On next query
 Locate Block
 Refine Block
 Update Index
60
CPUCPU
Datasource Architecture (again) 61
HTTP
Service
Split
Request
Eventbus
Read File
Read File
Read File
Read File
Merge
Result
HTTP
Request
HTTP
Response
Dataset
Post
Process
Post
Process
Post
Process
Post
Process
Eventbus
Tradeoffs
Block
Size
Index
Size
Startup
Time
Heap
Size
Request
Size
62
Thank you!
FEEDBACK APRECIATED!
63

Contenu connexe

Tendances

Ginsbourg.com - Performance and Load Test Report Template LTR 1.2
Ginsbourg.com - Performance and Load Test Report Template LTR 1.2Ginsbourg.com - Performance and Load Test Report Template LTR 1.2
Ginsbourg.com - Performance and Load Test Report Template LTR 1.2Shay Ginsbourg
 
Optimizing Application Performance on Kubernetes
Optimizing Application Performance on KubernetesOptimizing Application Performance on Kubernetes
Optimizing Application Performance on KubernetesDinakar Guniguntala
 
We Are All Testers Now: The Testing Pyramid and Front-End Development
We Are All Testers Now: The Testing Pyramid and Front-End DevelopmentWe Are All Testers Now: The Testing Pyramid and Front-End Development
We Are All Testers Now: The Testing Pyramid and Front-End DevelopmentAll Things Open
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil
 
Velocity 2015 building self healing systems (slide share version)
Velocity 2015 building self healing systems (slide share version)Velocity 2015 building self healing systems (slide share version)
Velocity 2015 building self healing systems (slide share version)SOASTA
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Oracle real application clusters system tests with demo
Oracle real application clusters system tests with demoOracle real application clusters system tests with demo
Oracle real application clusters system tests with demoAjith Narayanan
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Brian Brazil
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slowaragozin
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
Jdk Tools For Performance Diagnostics
Jdk Tools For Performance DiagnosticsJdk Tools For Performance Diagnostics
Jdk Tools For Performance DiagnosticsDror Bereznitsky
 
Efficient Memory and Thread Management in Highly Parallel Java Applications
Efficient Memory and Thread Management in Highly Parallel Java ApplicationsEfficient Memory and Thread Management in Highly Parallel Java Applications
Efficient Memory and Thread Management in Highly Parallel Java Applicationspkoza
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuningJerry Kurian
 
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + Retry
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + RetryCircuit breakers - Using Spring-Boot + Hystrix + Dashboard + Retry
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + RetryBruno Henrique Rother
 
Profiler Guided Java Performance Tuning
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuningosa_ora
 

Tendances (20)

Ginsbourg.com - Performance and Load Test Report Template LTR 1.2
Ginsbourg.com - Performance and Load Test Report Template LTR 1.2Ginsbourg.com - Performance and Load Test Report Template LTR 1.2
Ginsbourg.com - Performance and Load Test Report Template LTR 1.2
 
Optimizing Application Performance on Kubernetes
Optimizing Application Performance on KubernetesOptimizing Application Performance on Kubernetes
Optimizing Application Performance on Kubernetes
 
We Are All Testers Now: The Testing Pyramid and Front-End Development
We Are All Testers Now: The Testing Pyramid and Front-End DevelopmentWe Are All Testers Now: The Testing Pyramid and Front-End Development
We Are All Testers Now: The Testing Pyramid and Front-End Development
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
 
Velocity 2015 building self healing systems (slide share version)
Velocity 2015 building self healing systems (slide share version)Velocity 2015 building self healing systems (slide share version)
Velocity 2015 building self healing systems (slide share version)
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on Kubernetes
 
Prometheus course
Prometheus coursePrometheus course
Prometheus course
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Oracle real application clusters system tests with demo
Oracle real application clusters system tests with demoOracle real application clusters system tests with demo
Oracle real application clusters system tests with demo
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)
 
I know why your Java is slow
I know why your Java is slowI know why your Java is slow
I know why your Java is slow
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
Jdk Tools For Performance Diagnostics
Jdk Tools For Performance DiagnosticsJdk Tools For Performance Diagnostics
Jdk Tools For Performance Diagnostics
 
Prometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb SolutionPrometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb Solution
 
Efficient Memory and Thread Management in Highly Parallel Java Applications
Efficient Memory and Thread Management in Highly Parallel Java ApplicationsEfficient Memory and Thread Management in Highly Parallel Java Applications
Efficient Memory and Thread Management in Highly Parallel Java Applications
 
Java performance tuning
Java performance tuningJava performance tuning
Java performance tuning
 
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + Retry
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + RetryCircuit breakers - Using Spring-Boot + Hystrix + Dashboard + Retry
Circuit breakers - Using Spring-Boot + Hystrix + Dashboard + Retry
 
Profiler Guided Java Performance Tuning
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuning
 
Resilience with Hystrix
Resilience with HystrixResilience with Hystrix
Resilience with Hystrix
 

Similaire à Making sense of your data jug

Making sense of your data
Making sense of your dataMaking sense of your data
Making sense of your dataGerald Muecke
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Reactive data analysis with vert.x
Reactive data analysis with vert.xReactive data analysis with vert.x
Reactive data analysis with vert.xGerald Muecke
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationDenodo
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayGrega Kespret
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...StampedeCon
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017Jags Ramnarayan
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Tech Triveni
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 

Similaire à Making sense of your data jug (20)

Making sense of your data
Making sense of your dataMaking sense of your data
Making sense of your data
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Reactive data analysis with vert.x
Reactive data analysis with vert.xReactive data analysis with vert.x
Reactive data analysis with vert.x
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the way
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
MongoDB 3.4 webinar
MongoDB 3.4 webinarMongoDB 3.4 webinar
MongoDB 3.4 webinar
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Making sense of your data jug

  • 1. Making Sense of your Data BUILDING A CUSTOM MONGODB DATASOURCE FOR GRAFANA WITH VERTX 1
  • 2. About me  IT Consultant & Java Specialist at DevCon5 (CH)  Focal Areas  Tool-assisted quality assurance  Performance (-testing, -analysis, -tooling)  Operational Topics (APM, Monitoring)  Twitter: @gmuecke 2
  • 3. The Starting Point  Customer stored and keep response time measurement of test runs in a MongoDB  Lots of Data  Timestamp & Value  No Proper Visualization 3
  • 4. What are timeseries data?  a set of datapoints with a timestamp and a value time value
  • 5. What is MongoDB?  MongoDB  NoSQL database with focus on scale  JSON as data representation  No HTTP endpoint (TCP based Wire Protocol)  Aggregation framework for complex queries  Provides an Async Driver 5
  • 6. What is Grafana?  A Service for Visualizing Time Series Data  Open Source  Backend written in Go  Frontend based on Angular  Dashboards & Alerts
  • 7. Grafana Architecture 7 Grafana Server • Implemented in GO • Persistence for Settings and Dashboards • Offers Proxy for Service Calls Browser Angular UI Data Source Data Source Plugin... Proxy DB DB
  • 8. Datasources for Grafana 8 Grafana Server • Implemented in GO • Persistence for Settings and Dashboards • Offers Proxy for Service Calls Browser Datasource Angular UI Data Source Plugin • Angular • JavaScript HTTP
  • 10. From 2 Tier to 3 Tier 10 Grafana (Angular) Mongo DB Grafana (Angular) Mongo DB Datasource Service HTTP Mongo Wire Protocol
  • 11. Start Simple SimpleJsonDatasource (Plugin) 3 ServiceEndpoints  /search  Labels – names of available timeseries  /annotations  Annotations – textual markers  /query  Query – actual time series data 11 https://github.com/grafana/simple-json-datasource
  • 12. /search Format Request { "target" : "select metric", "refId" : "E" } Response [ "Metric Name 1", "Metric Name2", ] An array of strings 12
  • 13. /annotations Format Request { "annotation" : { "name" : "Test", "iconColor" : "rgba(255, 96, 96, 1)", "datasource" : "Simple Example DS", "enable" : true, "query" : "{"name":"Timeseries A"}" }, "range" : { "from" : "2016-06-13T12:23:47.387Z", "to" : "2016-06-13T12:24:19.217Z" }, "rangeRaw" : { "from" : "2016-06-13T12:23:47.387Z", "to" : "2016-06-13T12:24:19.217Z" } } Response [ { "annotation": { "name": "Test", "iconColor": "rgba(255, 96, 96, 1)", "datasource": "Simple Example DS", "enable": true, "query": "{"name":"Timeseries A"}" }, "time": 1465820629774, "title": "Marker", "tags": [ "Tag 1", "Tag 2" ] } ] 13
  • 14. /query Format Request { "panelId" : 1, "maxDataPoints" : 1904, "format" : "json", "range" : { "from" : "2016-06-13T12:23:47.387Z", "to" : "2016-06-13T12:24:19.217Z" }, "rangeRaw" : { "from" : "2016-06-13T12:23:47.387Z", "to" : "2016-06-13T12:24:19.217Z" }, "interval" : "20ms", "targets" : [ { "target" : "Time series A", "refId" : "A" },] } Response [ { "target":"Timeseries A", "datapoints":[ [1936,1465820629774], [2105,1465820632673], [4187,1465820635570], [30001,1465820645243] }, { "target":"Timeseries B", "datapoints":[ ] } ] 14
  • 15. Structure of the Source Data { "_id" : ObjectId("56375bc54f3c4caedfe68aca"), "t" : { "eDesc" : "some description", "eId" : "56375ae24f3c4caedfe68a07", "name" : "some name", "profile" : "I01", "rnId" : "56375b694f3c4caedfe68aa0", "rnStatus" : "PASSED", "uId" : "anonymous" }, "n" : { "begin" : NumberLong("1446468494689"), "value" : NumberLong(283) } } 15
  • 16. Custom Datasource  Should be  Lightweight  Fast / Performant  Simple 16
  • 17. Microservice?  Options for implementation  Java EE Microservice (i.e. Wildfly Swarm)  Springboot Microservice  Vert.x Microservice  Node.js  ... 17
  • 18. The Alternative Options Node.js  Single Threaded  Child Worker Processes  Javascript Only  Not best-choice for heavy computation Spring / Java EE  Multithreaded  Clusterable  Java Only  Solid Workhorses, cumbersome at times 19
  • 19. Why Vert.x?  High Performance, Low Footprint  Asynchronous, Non-Blocking  Actor-like Concurrency  Event & Worker Verticles  Message Driven  Polyglott  Java, Groovy, Javascript, Scala …  Scalable  Distributed Eventbus  Multi-threaded Event Loops 20
  • 21. Asynchronous non-blocking vs Synchronous blocking 23 © Fritz Geller-Grimm © Dontworry
  • 22. Event Loop 24 Photo: Andreas Praefcke
  • 23. Event Loop and Verticles 25 Photo: RokerHRO 3rd Floor, Verticle A 2nd Floor, Verticle B 1st Floor, Verticle C
  • 24. 26
  • 25. 27
  • 28. CPU Multi-Reactor 31 Core Core Core Core Eventbus Other Vert.x Instance Browser Verticle Verticle
  • 29. Event & Worker Verticles Event Driven Verticles Worker Verticles 32 Verticle Verticle Verticle Thread Pool Thread Pool Verticle Verticle Verticle Verticle Verticle
  • 30. Implementing the datasource  Http Verticle  Routing requests & sending responses  Verticles querying the DB  Searching timeseries labels  Annotation  Timeseries data points  Optional Verticles for Post Processing 33
  • 31. What is the challenge?  Optimization  Queries can be optimized  Large datasets have to be searched, read and transported  Source data can not be modified VS data redundancy  Sizing  How to size the analysis system without knowing the query-times?  How to size thread pools or database pools if most of the queries will take 100ms – 30s ? 34
  • 33. Step 1 – The naive approach  Find all datapoints within range 36
  • 35. Step 2 – Split Request  Split request into chunks (#chunks = #cores)  Use multiple Verticle Instance in parallel (#instances = #cores) ? 38 CPU
  • 36. CPU Datasource Architecture 39 HTTP Service Split/ Merge Request Eventbus Query Database Query Database Query Database Query Database HTTP Request HTTP Response DB
  • 37. Step 3 – Aggregate Datapoints  Use Mongo Aggregation Pipeline  Reduce Datapoints returned to service 40
  • 38. CPU Datasource Architecture 41 HTTP Service Split/ Merge Request Eventbus Query Database Query Database Query Database Query Database HTTP Request HTTP Response DB
  • 39. Step 4 – Percentiles (CPU)  Fetch all data  Calculate percentiles in service 42 CPU
  • 40. Step 4 – Percentiles (DB)  Build aggregation pipeline to calculate percentiles  Algorithm, see http://www.dummies.com/education/math/statistics/how-to- calculate-percentiles-in-statistics/ 43 DB
  • 41. CPU Datasource Architecture 44 HTTP Service Split/ Merge Request Eventbus Query Database Query Database Query Database Query Database HTTP Request HTTP Response DB
  • 42. Step 5 - Postprocessing  Apply additional computation on the result from the database 45
  • 43. CPUCPU Datasource Architecture (final) 46 HTTP Service Split Request Eventbus Query Database Query Database Query Database Query Database Merge Result HTTP Request HTTP Response DB Post Process Post Process Post Process Post Process Eventbus
  • 44. Anyone recognize this chip?  CPU of the PS3 (Cell BE)  Central Bus (EIB)  1 General Purpose CPU  Runs the Game (Event) Loop  8 SPUs  Special Processing  Sound  Physics  AI  ...  230 GFLOPS (vs 103 GFLOPS PS4) 47
  • 45. Adding more stats & calculation  Push Calculation to DB if possible  Add more workers / node for complex (post-) processing  Aggregate results before post-processing  DB performance is king 48
  • 46. Custom vs. timeseries DB Custom:  No migration of existing data  No redundant data storage  More flexibility  Better extensibility  Custom views  Custom aggregation  More options  scalability  retention Timeseries DB:  Better out-of-the-box  experience / performance  integration  functionality  Built-in retention policies  Built for scalability 49
  • 47. Takeaways  Grafana is powerful tool for visualizing data  Information becomes consumable through visualization  Information is essential for decision making  Vert.x is  Reactive, Non-Blocking, Asynchronous, Scalable  Running on JVM  Polyglott  Fun 52 Source code on: https://github.com/gmuecke/grafana-vertx-datasource
  • 49. Let’s read a large data file  Datafile is large (> 1GB)  Every line of the file is a datapoint  The first 10 characters are a timestamp  The dataset is sorted  The datapoints are not equally distributed  Grafana requires reads ~1900 datapoints per chart request 54
  • 50. The Challenges (pick one)  How to randomly access 1900 datapoints without reading the entire file into memory?  How to read a huge file efficiently into memory? 55 Index + Lazy refinement Index + Lazy load
  • 51. Let’s build an index  Indexes can be build using a tree- datastructure  Node: Timestamp  Leaf: offset position in file or the datapoint  Red-Black Trees provide fast access  Read/Insert O(log n)  Space n 56 © Cburnett, Wikipedia
  • 52. 57
  • 53.  java.util.TreeMap is a red-black tree based implementation*  TreeMap<Long,Long> index = new TreeMap<>();  *actually all HashMaps are 58
  • 54. How to build an index (fast)?  Read datapoint from offset positions  Build a partial index 59 Dataset
  • 55. On next query  Locate Block  Refine Block  Update Index 60
  • 56. CPUCPU Datasource Architecture (again) 61 HTTP Service Split Request Eventbus Read File Read File Read File Read File Merge Result HTTP Request HTTP Response Dataset Post Process Post Process Post Process Post Process Eventbus