World’s Best Data Modeling Tool
for Apache Cassandra
© 2015. All Rights Reserved.
Artem Chebotko, Andrey Kashlev
1 Cassandra Data Modeling Methodology
2 The KDM Tool
3 Live Demo: IoT
4 Live Demo: Media Cataloguing
5 Future Work
Data Modeling Process
• Data requirements
• Application requirements
• Schema Design
• Optimization
Cassandra Data Modeling Methodology
[Figure: Conceptual Data Model and Application Workflow → Mapping → Logical Data Model → Optimization → Physical Data Model]
Methodology Models
• Conceptual Data Model: ERD
• Application Workflow Model: Graph
• Logical Data Model: Chebotko Diagram
• Physical Data Model: Chebotko Diagram, CQL
Methodology Protocols
• Conceptual-to-logical mapping
– Mapping rules
– Mapping patterns
• Physical optimizations
– Partition size analysis
– Duplication factor analysis
– Keys, aggregation, transactions, …
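Partition size analysis can be approximated by counting the cell values a single partition would hold. The sketch below is a minimal Python illustration under stated assumptions: the estimate itself (rows times non-key, non-static columns, plus each static column once), the 100,000-value limit, and the sensor workload are all assumed for illustration and do not come from the slides; `partition_size_values` and `exceeds_limit` are hypothetical helpers.

```python
def partition_size_values(n_rows: int, n_columns: int,
                          n_primary_key: int, n_static: int = 0) -> int:
    """Estimate how many cell values one partition stores (hypothetical helper).

    Assumed estimate (not from the slides): every row stores one value per
    column that is neither a primary key column nor a static column, and
    each static column is stored once per partition.
    """
    if min(n_rows, n_columns, n_primary_key, n_static) < 0:
        raise ValueError("counts must be non-negative")
    if n_primary_key + n_static > n_columns:
        raise ValueError("key and static columns cannot exceed the column count")
    return n_rows * (n_columns - n_primary_key - n_static) + n_static


def exceeds_limit(n_values: int, limit: int = 100_000) -> bool:
    """Compare against an assumed rule-of-thumb limit; tune it for your cluster."""
    return n_values > limit


# Hypothetical workload for sensor_data (location, parameter, timestamp, id, value):
# 10 sensors at one location, each reporting one parameter once a minute for a year.
rows_per_partition = 10 * 60 * 24 * 365  # 5,256,000 rows in one (location, parameter) partition
values = partition_size_values(rows_per_partition, n_columns=5, n_primary_key=4)
print(values, exceeds_limit(values))  # 5256000 True
```

With the partition key (location, parameter), every row from every sensor at a location lands in the same partition, so under these assumptions the partition keeps growing over time; a common remedy is to add a bucketing column, such as a date, to the partition key.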
Example
SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ? AND timestamp > ?
ORDER BY timestamp DESC
[Figure: conceptual data model (ERD) with entity types Sensor and Measurement connected by the relationship type records, cardinalities 1 and n; attributes id, location, parameter, value, timestamp]
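The query above mixes three kinds of search conditions (equality predicates, a range predicate, and an ORDER BY clause), and the mapping steps that follow treat each kind differently. A minimal Python sketch that separates them, assuming a restricted query shape (`attribute <operator> ?` predicates joined by AND, plus an optional ORDER BY); `AccessPattern` and `classify` are hypothetical names, not part of KDM.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AccessPattern:
    """Hypothetical container for the attribute roles in one query."""
    equality: list[str] = field(default_factory=list)    # attr = ?
    inequality: list[str] = field(default_factory=list)  # attr <, <=, >, >= ?
    ordering: list[tuple[str, str]] = field(default_factory=list)  # (attr, ASC|DESC)


# One predicate such as "location = ?" or "timestamp > ?"; two-character operators first.
_PREDICATE = re.compile(r"(\w+)\s*(>=|<=|=|>|<)\s*\?")
_ORDER_BY = re.compile(r"\bORDER\s+BY\s+(.+)$", re.IGNORECASE | re.DOTALL)


def classify(query: str) -> AccessPattern:
    """Split a simple parameterized SELECT into equality, inequality and ordering attributes."""
    pattern = AccessPattern()
    order = _ORDER_BY.search(query)
    # Only look for predicates before the ORDER BY clause.
    where_part = query[: order.start()] if order else query
    for attribute, operator in _PREDICATE.findall(where_part):
        target = pattern.equality if operator == "=" else pattern.inequality
        target.append(attribute)
    if order:
        for item in order.group(1).strip().rstrip(";").split(","):
            words = item.split()
            direction = words[1].upper() if len(words) > 1 else "ASC"
            pattern.ordering.append((words[0], direction))
    return pattern


query = """SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ? AND timestamp > ?
ORDER BY timestamp DESC"""
print(classify(query))
# AccessPattern(equality=['location', 'parameter'], inequality=['timestamp'], ordering=[('timestamp', 'DESC')])
```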
Example: Mapping Entity and Relationship Types
[Figure: the same query and ERD, plus the Chebotko diagram for mapping step 1, table sensor_data: location K, parameter K, timestamp C↓, id C↑, value (K = partition key column, C↑/C↓ = ascending/descending clustering column)]
Example: Mapping Equality Search Attributes
[Figure: the same query and ERD with the sensor_data Chebotko diagrams for mapping steps 1 and 2 (location K, parameter K, timestamp C↓, id C↑, value)]
Example: Mapping Inequality Search Attributes
[Figure: the same query and ERD with the sensor_data Chebotko diagrams for mapping steps 1 to 3; the step 3 diagram shows timestamp as an ascending clustering column (C↑)]
Example: Mapping Ordering Attributes
[Figure: the same query and ERD with the sensor_data Chebotko diagrams for mapping steps 1 to 4; timestamp appears as C↑ after step 3 and as C↓ (descending) after step 4]
Example: Mapping Key Attributes
[Figure: the same query and ERD with the sensor_data Chebotko diagrams for mapping steps 1 to 5, ending with the final table sensor_data: location K, parameter K, timestamp C↓, id C↑, value]
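The five mapping steps of this example (entity and relationship types, equality search attributes, inequality search attributes, ordering attributes, key attributes) can be sketched as a small function that derives a table's key structure from one classified query. This is a simplified Python illustration, not KDM's algorithm: it assumes at most one inequality or ordering attribute is in play, ignores CQL's restrictions for more complex cases, and assumes `id` is the key attribute that step 5 adds, as the final diagram suggests. `LogicalTable`, `map_query`, and `chebotko_row` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LogicalTable:
    """Hypothetical logical table: key structure in Chebotko-diagram terms."""
    name: str
    partition_key: list[str]            # K columns
    clustering: list[tuple[str, str]]   # (column, "ASC" or "DESC"), i.e. C↑ / C↓
    regular: list[str]                  # non-key columns


def map_query(name, attributes, equality, inequality, ordering, key_attributes):
    """Apply the five mapping steps to one query (simplified sketch)."""
    # Step 1: the table holds the attributes of the entity and relationship types involved.
    columns = list(attributes)
    # Step 2: equality search attributes become partition key columns.
    partition_key = list(equality)
    # Step 3: inequality search attributes become clustering columns, ascending by default.
    clustering = [[attr, "ASC"] for attr in inequality if attr not in partition_key]
    # Step 4: ordering attributes become clustering columns with the requested direction.
    for attr, direction in ordering:
        if attr in partition_key:
            continue  # a partition key column cannot also be a clustering column
        for entry in clustering:
            if entry[0] == attr:
                entry[1] = direction
                break
        else:
            clustering.append([attr, direction])
    # Step 5: key attributes not yet in the primary key are appended (ascending) for uniqueness.
    in_key = set(partition_key) | {entry[0] for entry in clustering}
    for attr in key_attributes:
        if attr not in in_key:
            clustering.append([attr, "ASC"])
            in_key.add(attr)
    regular = [c for c in columns if c not in in_key]
    return LogicalTable(name, partition_key, [tuple(e) for e in clustering], regular)


def chebotko_row(table: LogicalTable) -> str:
    """Render the table in the K / C↑ / C↓ notation used on the slides."""
    arrow = {"ASC": "C↑", "DESC": "C↓"}
    parts = [f"{c} K" for c in table.partition_key]
    parts += [f"{c} {arrow[d]}" for c, d in table.clustering]
    parts += table.regular
    return f"{table.name}: " + ", ".join(parts)


table = map_query(
    "sensor_data",
    attributes=["location", "parameter", "timestamp", "id", "value"],
    equality=["location", "parameter"],
    inequality=["timestamp"],
    ordering=[("timestamp", "DESC")],
    key_attributes=["id"],  # assumption: id is the key attribute added in step 5
)
print(chebotko_row(table))
# sensor_data: location K, parameter K, timestamp C↓, id C↑, value
```

Stopping after step 3 would leave timestamp ascending; step 4 flips it to descending to satisfy ORDER BY timestamp DESC, and step 5 adds id so that two measurements with the same location, parameter, and timestamp do not overwrite each other.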
Methodology Pros and Cons
• Pros: correctness, completeness
• Cons: complexity, time investment
Human Errors Happen …
Automation
• Complexity
• Time investment
• Human error
The KDM Tool
• Streamlines the methodology
• Guides the user
• Automates data modeling tasks:
– Conceptual-to-logical mapping
– Physical optimization
– CQL generation
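CQL generation, the last automated task listed above, essentially serializes a physical table description into a CREATE TABLE statement. A minimal Python sketch: `create_table_cql` is a hypothetical helper, the column types (text, timestamp, uuid, float) are assumptions because the slides show none, and the generated statement has not been run against a cluster.

```python
def create_table_cql(name, columns, partition_key, clustering):
    """Build a CREATE TABLE statement from a physical table description (hypothetical helper).

    columns:       list of (column_name, cql_type) pairs, in declaration order
    partition_key: list of partition key column names
    clustering:    list of (column_name, "ASC" or "DESC") pairs, in clustering order
    """
    if not partition_key:
        raise ValueError("a table needs at least one partition key column")
    declared = {col for col, _ in columns}
    for col in list(partition_key) + [c for c, _ in clustering]:
        if col not in declared:
            raise ValueError(f"key column {col!r} is not declared")

    # Partition key columns are wrapped in their own parentheses inside PRIMARY KEY.
    key = ["(" + ", ".join(partition_key) + ")"] + [c for c, _ in clustering]
    lines = [f"CREATE TABLE {name} ("]
    lines += [f"  {col} {cql_type}," for col, cql_type in columns]
    lines.append(f"  PRIMARY KEY ({', '.join(key)})")
    statement = "\n".join(lines) + "\n)"
    # Clustering directions (C↑ / C↓ in a Chebotko diagram) become CLUSTERING ORDER BY.
    if clustering:
        order = ", ".join(f"{c} {d}" for c, d in clustering)
        statement += f" WITH CLUSTERING ORDER BY ({order})"
    return statement + ";"


print(create_table_cql(
    "sensor_data",
    columns=[("location", "text"), ("parameter", "text"),
             ("timestamp", "timestamp"), ("id", "uuid"), ("value", "float")],
    partition_key=["location", "parameter"],
    clustering=[("timestamp", "DESC"), ("id", "ASC")],
))
# CREATE TABLE sensor_data (
#   location text,
#   parameter text,
#   timestamp timestamp,
#   id uuid,
#   value float,
#   PRIMARY KEY ((location, parameter), timestamp, id)
# ) WITH CLUSTERING ORDER BY (timestamp DESC, id ASC);
```

Partition key columns go in the inner parentheses of PRIMARY KEY, clustering columns follow in order, and the CLUSTERING ORDER BY clause carries the ascending/descending directions.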
KDM Automation Workflow
• Step 1 (solution architect): Design conceptual data model
• Step 2 (solution architect): Specify access patterns
• Automated (KDM): Generate logical data models
• Step 3 (solution architect): Select logical data model
• Automated (KDM): Generate physical data model
• Step 4 (solution architect): Configure physical data model
• Automated (KDM): Generate physical schema
• Step 5 (solution architect): Download CQL script
Live Demo: IoT
[Screenshots: specifying access patterns Q1, Q2 and Q3 in KDM; attributes are annotated as "given", "range", "find", or "find and sort DESC"]
Live Demo: Media Cataloguing
Summary
• KDM:
– automates the most complex tasks
– eliminates human error
– simplifies data modeling
– guides the user
How Can KDM Help You?
• build new data models
• verify existing data models
• teach/learn data modeling
Future Work
• Materialized views
• User Defined Types
• Analysis and physical optimization
• Support for application workflow design
• Support for Chebotko Diagrams
Acknowledgements
• Andrey Kashlev would like to thank:
– Dr. Shiyong Lu
– Anthony Piazza
• Artem Chebotko would like to thank:
– Anthony Piazza
– Patrick McFadin
– Jonathan Ellis
– Tim Berglund
Thank you

World’s Best Data Modeling Tool

  • 1. World’s Best Data Modeling Tool for Apache Cassandra 1© 2015. All Rights Reserved. Artem ChebotkoAndrey Kashlev
  • 2. 1 Cassandra Data Modeling Methodology 2 The KDM Tool 3 Live Demo: IoT 4 Live Demo: Media Cataloguing 5 Future Work 2© 2015. All Rights Reserved.
  • 3. Data Modeling Process • Data requirements • Application requirements • Schema Design • Optimization 3© 2015. All Rights Reserved.
  • 4. Cassandra Data Modeling Methodology © 2015. All Rights Reserved. 4 Conceptual Data Model Application Workflow Logical Data Model Physical Data Model Mapping Optimization
  • 5. Methodology Models © 2015. All Rights Reserved. 5 Model Representation Conceptual Data Model ERD Application Workflow Model Graph Logical Data Model Chebotko Diagram Physical Data Model Chebotko Diagram, CQL
  • 6. Methodology Protocols © 2015. All Rights Reserved. 6 • Conceptual-to-logical mapping – Mapping rules – Mapping patterns • Physical optimizations – Partition size analysis – Duplication factor analysis – Keys, aggregation, transactions, …
  • 7. Example © 2015. All Rights Reserved. 7 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords
  • 8. sensor_data location K parameter K timestamp C↓ id C↑ value 1 Example © 2015. All Rights Reserved. 8 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords Mapping Entity and Relationship Types
  • 9. sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value 1 2 Example © 2015. All Rights Reserved. 9 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords Mapping Equality Search Atributes
  • 10. sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↑ id C↑ value 1 2 3 Example © 2015. All Rights Reserved. 10 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords Mapping Inequality Search Attributes
  • 11. sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↑ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value 1 2 3 4 Example © 2015. All Rights Reserved. 11 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords Mapping Ordering Attributes
  • 12. sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value sensor_data location K parameter K timestamp C↑ id C↑ value sensor_data location K parameter K timestamp C↓ id C↑ value 1 2 3 4 5 Example © 2015. All Rights Reserved. 12 SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC n parameter value 1 timestampid location Sensor Measurementrecords Mapping Key Attributes
  • 13. Methodology Pros and Cons. Pros: correctness, completeness. Cons: complexity, time investment.
  • 14. Human Errors Happen …
  • 15. Automation, to reduce complexity, time investment, and human error
  • 16. 1 Cassandra Data Modeling Methodology; 2 The KDM Tool; 3 Live Demo: IoT; 4 Live Demo: Media Cataloguing; 5 Future Work
  • 17. The KDM Tool • Streamlines the methodology • Guides the user • Automates data modeling tasks: – Conceptual-to-logical mapping – Physical optimization – CQL generation
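Physical optimization begins with partition size analysis. A minimal sketch, assuming the commonly used estimate Nv = Nr × (Nc − Npk − Ns) + Ns for the number of values in one partition (Nr rows, Nc columns, Npk primary key columns, Ns static columns) and the commonly cited rule of thumb of about 100,000 values per partition. The helper `partition_values` and the sensor workload are hypothetical, chosen only to show the arithmetic for sensor_data (5 columns, 4 of them in the primary key).

```python
def partition_values(n_rows: int, n_columns: int, n_pk: int, n_static: int = 0) -> int:
    """Estimate values (cells) in one partition: Nv = Nr * (Nc - Npk - Ns) + Ns."""
    if n_pk + n_static > n_columns:
        raise ValueError("more key and static columns than total columns")
    # Each row stores its regular columns; static columns are stored once per partition.
    return n_rows * (n_columns - n_pk - n_static) + n_static


# Assumed workload (illustrative only): 100 sensors at one location, each
# recording a given parameter once per minute, for a single day.
rows_per_partition = 100 * 24 * 60  # 144,000 rows in one (location, parameter) partition
values = partition_values(rows_per_partition, n_columns=5, n_pk=4)

RULE_OF_THUMB = 100_000  # commonly cited upper bound on values per partition
print(values, "values;", "too large" if values > RULE_OF_THUMB else "ok")
```

Because the partition key (location, parameter) has no time component, such a partition keeps growing; a typical remedy, again an assumption here rather than something the slides prescribe, is to add a time bucket such as a date column to the partition key.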
  • 18. KDM Automation Workflow
  • 19. KDM Automation Workflow: Step 1, Design Conceptual Data Model (solution architect)
  • 20. KDM Automation Workflow, adding Step 2, Specify Access Patterns (solution architect)
  • 21. KDM Automation Workflow, adding Generate Logical Data Models (automated by KDM)
  • 22. KDM Automation Workflow, adding Step 3, Select Logical Data Model (solution architect)
  • 23. KDM Automation Workflow, adding Generate Physical Data Model (automated by KDM)
  • 24. KDM Automation Workflow, adding Step 4, Configure Physical Data Model (solution architect)
  • 25. KDM Automation Workflow, adding Generate Physical Schema (automated by KDM)
  • 26. KDM Automation Workflow (complete):
    – Step 1 (solution architect): Design Conceptual Data Model
    – Step 2 (solution architect): Specify Access Patterns
    – Automated (KDM): Generate Logical Data Models
    – Step 3 (solution architect): Select Logical Data Model
    – Automated (KDM): Generate Physical Data Model
    – Step 4 (solution architect): Configure Physical Data Model
    – Automated (KDM): Generate Physical Schema
    – Step 5 (solution architect): Download CQL Script
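The last automated step turns the configured physical model into the CQL script the architect downloads. A minimal sketch of that CQL generation, not KDM's actual generator: the helper `create_table_cql` is hypothetical, and the column types (text, timestamp, uuid, double) stand in for whatever would be chosen in Step 4.

```python
def create_table_cql(name, columns):
    """Render a CREATE TABLE statement from (column, cql_type, marker) triples.

    marker: "K" partition key, "C↑"/"C↓" ascending/descending clustering
    column, "" regular column. Key columns must be listed in primary-key order.
    """
    partition = [c for c, _, m in columns if m == "K"]
    clustering = [(c, m) for c, _, m in columns if m in ("C↑", "C↓")]
    if not partition:
        raise ValueError("a table needs at least one partition key column")

    body = [f"  {c} {t}," for c, t, _ in columns]
    # Composite partition key goes in its own parentheses, then clustering columns.
    key = "(" + ", ".join(partition) + ")"
    if clustering:
        key += ", " + ", ".join(c for c, _ in clustering)
    body.append(f"  PRIMARY KEY ({key})")

    stmt = f"CREATE TABLE {name} (\n" + "\n".join(body) + "\n)"
    if clustering:
        order = ", ".join(f"{c} {'DESC' if m == 'C↓' else 'ASC'}" for c, m in clustering)
        stmt += f" WITH CLUSTERING ORDER BY ({order})"
    return stmt + ";"


# sensor_data from slide 12, with hypothetical CQL types.
sensor_data = [
    ("location", "text", "K"),
    ("parameter", "text", "K"),
    ("timestamp", "timestamp", "C↓"),
    ("id", "uuid", "C↑"),
    ("value", "double", ""),
]
cql = create_table_cql("sensor_data", sensor_data)
print(cql)
```

The printed statement declares PRIMARY KEY ((location, parameter), timestamp, id) with CLUSTERING ORDER BY (timestamp DESC, id ASC), so the slide 8 query can read a single partition in the requested order.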
  • 27. 1 Cassandra Data Modeling Methodology; 2 The KDM Tool; 3 Live Demo: IoT; 4 Live Demo: Media Cataloguing; 5 Future Work
  • 32. 1 Cassandra Data Modeling Methodology; 2 The KDM Tool; 3 Live Demo: IoT; 4 Live Demo: Media Cataloguing; 5 Future Work
  • 34. Summary • KDM: – automates the most complex tasks – eliminates human error – simplifies data modeling – guides the user
  • 35. How Can KDM Help You? • build new data models • verify existing data models • teach/learn data modeling
  • 36. 1 Cassandra Data Modeling Methodology; 2 The KDM Tool; 3 Live Demo: IoT; 4 Live Demo: Media Cataloguing; 5 Future Work
  • 37. Future Work • Materialized views • User-defined types
  • 38. Future Work • Analysis and physical optimization • Support for application workflow design • Support for Chebotko Diagrams
  • 39. Acknowledgements • Andrey Kashlev would like to thank: – Dr. Shiyong Lu – Anthony Piazza • Artem Chebotko would like to thank: – Anthony Piazza – Patrick McFadin – Jonathan Ellis – Tim Berglund