Jim Peregord, Venu Palvai
Element Fleet Management
Building a Pluggable Analytics Stack with Cassandra as the Foundation
1 Background on Element Fleet Management
2 Key Use Cases Supported
3 Architecture
4 Our Journey
5 Lessons Learned
© DataStax, All Rights Reserved.
A Little About Us
Jim Peregord: VP of Analytics, BI, and Data Management (jperegord@elementcorp.com)
Venu Palvai: Lead Architect (vpalvai@elementcorp.com)
Background on Element Fleet Management
• Full lifecycle of fleet management services
• Data consolidation and advanced analytics services
• Maximize customer ROI on fleet assets via data and advanced analytics
• 2,600 employees
• 1+ million vehicles managed
• $18 billion in total finance assets
• 2 billion rows of data and growing
Greenfield Opportunity to Build Analytics Platform
• Element acquired GE Fleet Management on September 1, 2015
• Now the largest publicly held fleet management company in the world
• Before the acquisition, Element had limited data warehouse and Big Data technology
• Greenfield opportunity to build a next-generation BI and Advanced Analytics platform
High-level Options Considered
#1 – Build a separate data warehouse and Big Data/Advanced Analytics platform
#2 – Build a single, unified architecture that supports both
Our Decision
#2 – Build a single, unified platform using DataStax
Key Use Cases Supported on New Platform
• High availability out of the box
• Linear and elastic scalability
• High concurrency and low latency
• Real-time ingestion of data streams: vehicle (location, diagnostics), weather, traffic
• Expose data and analytics via RESTful APIs
• Advanced Analytics (Predictive, Prescriptive, Streaming)
• Data warehouse and traditional reporting
Architecture
Advanced Analytics Hardware Architecture
• Purpose-Built Hardware for Advanced Analytics
• NUMA/NVMe hardware is not commodity; it is highly specialized for very high performance (tens of millions of IOPS).
• Architected to scale to 10x or even 100x current capacity, a must for telematics and IoT data.
• Hardware specs: 256 GB, 4 x 2 TB SSD, a dedicated C*/Spark instance per SSD
• Active-active clustering provides very high availability
• C* / Spark / SOLR / FiloDB / DSE Graph + NUMA: a high-performance analytics platform
Cassandra + Spark: 32 nodes
Cassandra + SOLR: 8 nodes
Analytics Logical Architecture
Figure: analytics logical architecture. Components: streaming sources (events, Amazon SQS, Kafka); batch sources (internal, external); FiloDB; Thrift Server with Spark SQL; Job Server; RESTful APIs; packages (PySpark, MLlib); consumers.
Pluggable Architecture - Overview
Element’s pluggable analytics stack lets us plug in multiple analytics tools and choose the right tool for the questions we are asking. It also lets us add new analytics capabilities on top of Cassandra as they become available.
• FiloDB: columnar data, fast reads
• Spark: SQL, streaming analytics, PySpark
• Lucene: search, custom dictionaries
• DSE Graph: graph-based analytics
• Future tools: TBD
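The slides do not say how a tool is chosen for a given question. As a minimal sketch, assuming a hypothetical in-process registry (`AnalyticsRegistry`, `register`, and `route` are illustrative names, not part of any product), "pluggable" can mean that each tool registers the capabilities it offers and questions are routed by capability, so adding a future tool is a single `register` call:

```python
from typing import Callable, Dict, List


class AnalyticsRegistry:
    """Hypothetical plug-in registry: maps capabilities to the tools that provide them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}
        self._by_capability: Dict[str, List[str]] = {}

    def register(self, name: str, capabilities: List[str],
                 handler: Callable[[str], str]) -> None:
        """Plug in a new tool without changing the tools already registered."""
        self._handlers[name] = handler
        for capability in capabilities:
            self._by_capability.setdefault(capability, []).append(name)

    def route(self, capability: str, question: str) -> str:
        """Send a question to the first tool registered for the capability."""
        names = self._by_capability.get(capability)
        if not names:
            raise LookupError(f"no tool registered for {capability!r}")
        return self._handlers[names[0]](question)


# Placeholder handlers; a real deployment would call FiloDB, Lucene/SOLR, DSE Graph, etc.
registry = AnalyticsRegistry()
registry.register("FiloDB", ["columnar_scan"], lambda q: f"FiloDB scan: {q}")
registry.register("Lucene", ["search", "geo"], lambda q: f"Lucene search: {q}")
registry.register("DSE Graph", ["relationships"], lambda q: f"Graph traversal: {q}")
print(registry.route("geo", "fuel stations near vehicle 42"))
```

Routing by capability rather than by tool name keeps callers unaware of which engine answers, which is the property that lets new tools be added later.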
Pluggable Architecture - FiloDB
• FiloDB uses Cassandra for storage and Spark for computation
• Optimized for:
• Low latency queries and streaming
• Interactive ad-hoc analysis on Big Data
• Complex analytics and machine learning
• Efficient Columnar Storage (20-40X less storage)
• All queries are distributed and run in parallel in Spark
• Integrates with existing BI tools via JDBC/ODBC
• Horizontally scalable, fault tolerant
• Future enhancements include geospatial analysis
Recent blog post by Evan Chan, renowned C*/Spark expert:
www.planetcassandra.org/blog/achieving-sub-second-sql-joins-and-building-a-data-warehouse-using-spark-cassandra-and-filodb
Pluggable Architecture – Apache Spark
Spark SQL
• In-memory, fast SQL processing
• Easily blend data from multiple sources
• Connect to BI tools
Spark Streaming
• Ingest streaming data sources like telematics, weather, engine diagnostics, etc.
Spark MLlib
• Library of machine learning algorithms for advanced analytics
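To make "ingest streaming data sources" concrete, the sketch below shows the kind of computation a streaming job commonly runs on each micro-batch of telematics events: a per-vehicle average over fixed (tumbling) time windows. It is plain Python, not the Spark Streaming API, and the event layout `(vehicle_id, epoch_seconds, engine_temp_c)` is a hypothetical assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical telematics event: (vehicle_id, epoch_seconds, engine_temp_c).
Event = Tuple[str, int, float]


def tumbling_window_avg(events: Iterable[Event],
                        window_s: int) -> Dict[Tuple[str, int], float]:
    """Average the metric per vehicle per fixed-size time window.

    Keys are (vehicle_id, window_start). This illustrates the per-micro-batch
    computation only; it is not Spark's API.
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for vehicle_id, ts, value in events:
        window_start = ts - ts % window_s  # align the timestamp to its window boundary
        key = (vehicle_id, window_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


batch = [("v1", 0, 90.0), ("v1", 30, 94.0), ("v1", 65, 100.0), ("v2", 10, 80.0)]
result = tumbling_window_avg(batch, window_s=60)
print(result)
```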
Pluggable Architecture – Lucene / SOLR
• Powerful search algorithms
• Geospatial indexing and geo-queries
• Custom dictionaries
• Efficient metric calculations
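A radius geo-query (for example, "fuel stations within 10 km of a vehicle") boils down to a great-circle distance test. Solr answers such queries with a spatial index (for example, its `geofilt` filter); the sketch below only illustrates the distance test itself, using the haversine formula and a linear scan over hypothetical station coordinates.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points: List[Tuple[str, float, float]],
                  lat: float, lon: float, radius_km: float) -> List[str]:
    """Ids of points inside the radius (linear scan; a spatial index avoids this)."""
    return [pid for pid, plat, plon in points
            if haversine_km(lat, lon, plat, plon) <= radius_km]


# Hypothetical fuel stations: (id, latitude, longitude).
stations = [("A", 45.15, -93.85), ("B", 45.20, -93.80), ("C", 44.00, -93.00)]
print(within_radius(stations, 45.15, -93.85, radius_km=10))
```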
Pluggable Architecture – DSE Enterprise Graph
• Graph databases store data as a network of relationships
• Provides optimized analytics for any data where relationships are most important
• Can improve query/analytics performance by up to 1000x
Example use cases:
• IoT time series on streaming data
• Vehicle routing
• Visualize clusters of well-performing and underperforming assets
• Recommend optimal actions
• Fraud detection
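DSE Graph is queried with Gremlin (Apache TinkerPop); the plain-Python sketch below only illustrates the multi-hop traversal behind a use case such as fraud detection: starting from one vehicle, find every entity connected to it within two relationships. The vertices (vehicles, fuel cards, drivers) and edges are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical relationships: an edge means "used by" or "assigned to".
edges: List[Tuple[str, str]] = [
    ("vehicle:1", "card:A"), ("vehicle:2", "card:A"),
    ("card:A", "driver:X"), ("vehicle:3", "card:B"),
]


def build_adjacency(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: each edge is stored in both directions."""
    adj: Dict[str, Set[str]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def within_hops(adj: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Breadth-first search: every vertex reachable from start in <= max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


adj = build_adjacency(edges)
print(sorted(within_hops(adj, "vehicle:1", 2)))
```

Two vehicles sharing one fuel card surface together in a single traversal, which is the kind of relationship a row-oriented join chain makes expensive to find.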
Pluggable Architecture – Cassandra
• High performance NoSQL database
• Flexible schema allows new data attributes to be easily added
• Peer-to-peer, distributed architecture results in no single point of failure, unlike traditional databases
• Elastic scalability to add more servers as workload increases
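The "no single point of failure" and "elastic scalability" bullets both come from Cassandra placing data on a token ring. The sketch below is a simplified model under stated assumptions: one token per node (no vnodes), no rack or datacenter awareness, replicas taken from the next nodes clockwise (similar to SimpleStrategy), and MD5 instead of Cassandra's default Murmur3Partitioner so it needs only the standard library.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_token(key: str) -> int:
    """Map a key to a ring position (MD5 here; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Simplified ring: one token per node, replicas on the next nodes clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.rf = replication_factor
        self._ring: List[Tuple[int, str]] = sorted((ring_token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        """Elastic scaling: the new node takes over part of its clockwise neighbour's range."""
        bisect.insort(self._ring, (ring_token(node), node))

    def replicas(self, partition_key: str) -> List[str]:
        """The first node whose token is >= the key's token owns it; the next rf-1 nodes hold copies."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_left(tokens, ring_token(partition_key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node1", "node2", "node3", "node4"])
print(ring.replicas("customer:123"))
ring.add_node("node5")
print(ring.replicas("customer:123"))
```

Because every partition lives on `rf` distinct peers and no node is special, losing any one node leaves the other replicas able to serve the data.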
What our Platform Means to Customers
Infrastructure improvements:
• 20x CPU speed
• 10x memory
• 70x disk performance
All running on Cassandra, a database framework adopted by companies running some of the world’s largest and most sophisticated real-time analytics.
Data:
• Maintenance history
• Fuel purchases
• Miles driven
• GPS location
• Points of interest
• Weather
• Traffic
• Online repair reviews
• Fuel price geo-indexing
Insights:
• Predict operating costs
• Fraud detection
• Business rule exceptions
• Accident predictors
• Optimal replacement
• High-risk DTC codes
• Repair sentiment analysis
Action:
• Vehicle replacement schedule
• Fraud actions
• Safe driving interventions
• Non-standard maintenance schedule
• Recommend fueling and maintenance facilities
Sifting through the data “noise” must be as fast as possible in order to create actionable recommendations.
Our Journey
Journey to Build a Unified BI and Analytics Platform
• Creating flexible data models that work for both BI and analytics
• Achieving high concurrency and low latency required for enterprise reporting platforms
• Optimizing software installation and configuration for performance
• Workload management
Dimensional Modeling for BI and Analytics
• BI tools are designed to work with dimensional models
• Dimensional models are proven and easy to understand and model
• Dimensional models are flexible and can answer many questions
• OLAP use cases require slicing and dicing data across multiple dimensions
• JOIN capability is critical for data models that can answer a variety of questions
Figure: star schema, with a central fact table joined to four dimension tables.
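A minimal sketch of why a star schema suits slicing and dicing: every fact row joins to its dimensions by key, and the same join answers different questions just by grouping on different dimension attributes. The tables and values are hypothetical, and plain Python dictionaries stand in for the Spark SQL joins used on the platform.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical dimensions, keyed by surrogate key.
dim_vehicle = {1: {"make": "Ford"}, 2: {"make": "Toyota"}}
dim_month = {201601: {"quarter": "2016Q1"}, 201604: {"quarter": "2016Q2"}}

# Hypothetical fact table: (vehicle_key, month_key, gallons).
fact_fuel: List[Tuple[int, int, float]] = [
    (1, 201601, 40.0), (1, 201604, 35.0), (2, 201601, 20.0), (2, 201601, 25.0),
]


def slice_and_dice(facts: List[Tuple[int, int, float]],
                   by_make: bool = True, by_quarter: bool = True) -> Dict[tuple, float]:
    """Join each fact row to its dimensions, then sum the measure per chosen attributes."""
    totals: Dict[tuple, float] = defaultdict(float)
    for vehicle_key, month_key, gallons in facts:
        group = []
        if by_make:
            group.append(dim_vehicle[vehicle_key]["make"])  # join to vehicle dimension
        if by_quarter:
            group.append(dim_month[month_key]["quarter"])  # join to time dimension
        totals[tuple(group)] += gallons
    return dict(totals)


print(slice_and_dice(fact_fuel))
print(slice_and_dice(fact_fuel, by_quarter=False))
```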
Limitations of Spark SQL
• A Cassandra + Spark cluster provides JOIN functionality
• Spark SQL does not pass a filter applied to one table on to the other table, even when both tables are joined on the filtered column
• Predicate pushdown does not work for OUTER JOIN relationships
• Pushing predicates down to Cassandra (the data source) generally yields better performance
Figure: sample DAG plan for a JOIN SQL with 5 tables
SQL example:
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c, invoice i
Where c.customer_id = i.customer_id
And c.customer_id = 123;
Spark splits the above SQL into two scans, and the customer_id filter reaches only the customer table:
Select c.customer_id, c.customer_name From customer c
Where c.customer_id = 123;
Select i.customer_id, i.invoice_amount
From invoice i;
Custom Thrift Server to Optimize SQL Statements
• Adds predicates to joining tables based on matching join columns
• Converts IN conditions to = conditions whenever IN List has only one value
• Adds an IN predicate on the partition column based on range predicates supplied on non-partition key columns
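The range-to-IN rule can be illustrated under an assumed layout that is not from the slides: a table partitioned by a hypothetical `month_bucket` column in `yyyymm` form, queried with a range on a non-partition date column. The date range maps to a finite list of partition values, so an equivalent `IN` predicate on the partition key can be added and pushed down to Cassandra.

```python
from datetime import date
from typing import List


def month_buckets(start: date, end: date) -> List[int]:
    """All yyyymm partition values touched by the inclusive range [start, end]."""
    buckets: List[int] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        month += 1
        if month == 13:  # roll over to January of the next year
            year, month = year + 1, 1
    return buckets


def add_partition_in_predicate(where: str, start: date, end: date,
                               partition_col: str = "month_bucket") -> str:
    """Append an IN predicate on the (hypothetical) partition column derived from the range."""
    buckets = month_buckets(start, end)
    if not buckets:
        return where  # empty range: leave the query unchanged
    values = ", ".join(str(b) for b in buckets)
    return f"{where} AND {partition_col} IN ({values})"


print(add_partition_in_predicate(
    "invoice_date BETWEEN '2016-11-15' AND '2017-02-01'",
    date(2016, 11, 15), date(2017, 2, 1)))
```

The added predicate is redundant with the range filter, so it cannot change results; it only tells the data source which partitions to read.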
Example (original query):
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c,
invoice i
Where c.customer_id = i.customer_id
And c.customer_id IN (123)
Rewritten by the custom Thrift server (single-value IN converted to =, predicate added to the joined table):
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c,
invoice i
Where c.customer_id = i.customer_id
And c.customer_id = 123
And i.customer_id = 123
Custom Thrift Server flow: Spark Thrift Server with custom Hive context → inspect logical plan → modify logical plan (if needed) → submit plan for execution
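The actual server rewrites Spark's logical plan inside a custom Hive context, which the slides do not show. The sketch below reproduces only the two rewrites from the example on a hypothetical, simplified predicate representation: a single-value `IN` becomes `=`, and an equality on a join column is copied to every column it is equi-joined with, directly or transitively. It assumes inner equi-joins, as in the example; propagating filters across outer joins needs more care.

```python
from typing import Dict, List, Tuple

# A predicate is (column, operator, values), e.g. ("c.customer_id", "IN", ("123",)).
Predicate = Tuple[str, str, Tuple[str, ...]]


def in_to_equals(preds: List[Predicate]) -> List[Predicate]:
    """Rewrite `col IN (v)` with exactly one value as `col = v`."""
    return [(c, "=", v) if op == "IN" and len(v) == 1 else (c, op, v) for c, op, v in preds]


def propagate_join_equalities(preds: List[Predicate],
                              joins: List[Tuple[str, str]]) -> List[Predicate]:
    """Copy each `col = v` onto every column equi-joined to col (inner joins only)."""
    parent: Dict[str, str] = {}

    def find(col: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for a, b in joins:
        parent[find(a)] = find(b)  # merge the two columns' equivalence groups
    join_columns = list(parent)
    out = list(preds)
    for col, op, values in preds:
        if op != "=":
            continue
        root = find(col)
        for other in join_columns:
            candidate = (other, "=", values)
            if other != col and find(other) == root and candidate not in out:
                out.append(candidate)
    return out


def render(preds: List[Predicate]) -> str:
    """Turn predicates back into a WHERE-clause fragment."""
    return " AND ".join(
        f"{c} = {v[0]}" if op == "=" else f"{c} {op} ({', '.join(v)})" for c, op, v in preds
    )


where = [("c.customer_id", "IN", ("123",))]
joins = [("c.customer_id", "i.customer_id")]
print(render(propagate_join_equalities(in_to_equals(where), joins)))
```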
FiloDB
• Cassandra 2.1 has several restrictions on predicate pushdown
• FiloDB is a true columnar store
• Provides ~20-30x compression over Cassandra
• Very efficient for single- and multiple-partition scans
• Partial predicate pushdown support
• Provides ~20-30x better read performance than straight Cassandra
Figure: Cassandra storage of FiloDB data. Rows of data get converted to compressed columnar chunks (Ck1, Ck2).
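A minimal sketch of the row-to-columnar-chunk idea: pivot a batch of rows into one compressed blob per column, so a query decompresses only the columns it reads. FiloDB uses its own binary vector encodings; JSON plus `zlib` here is only a stdlib stand-in, and the rows are hypothetical.

```python
import json
import zlib
from typing import Dict, List


def to_columnar_chunk(rows: List[Dict[str, object]]) -> Dict[str, bytes]:
    """Pivot row dicts into one compressed blob per column (a simplified columnar chunk)."""
    if not rows:
        return {}
    return {
        col: zlib.compress(json.dumps([row[col] for row in rows]).encode("utf-8"))
        for col in rows[0].keys()
    }


def read_column(chunk: Dict[str, bytes], col: str) -> list:
    """Decompress only the column a query needs; other columns stay untouched."""
    return json.loads(zlib.decompress(chunk[col]))


# Hypothetical rows with highly repetitive per-column values.
rows = [{"vehicle_id": i % 50, "state": "MN", "miles": 100 + i % 7} for i in range(10_000)]
chunk = to_columnar_chunk(rows)
row_bytes = len(json.dumps(rows).encode("utf-8"))
col_bytes = sum(len(blob) for blob in chunk.values())
print(row_bytes, col_bytes)
```

The comparison is against uncompressed JSON rows, so it says nothing about the ~20-30x figure above; it only shows why values stored column by column compress well.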
Dimensional Data Modeling for Cassandra + Spark
• Use simple star schema models as much as possible (eliminate snowflakes, outer joins, etc.)
• De-normalize dimensions and facts (avoid duplicating dimensions into facts)
• Minimize the number of tables involved in joins
• Use a common partitioning strategy across dimensions and facts (easier predicate handling)
• Limit max partition size to ~1 GB
• Reduce the number of partitions for efficient Spark execution, but limit partition sizes for efficient Cassandra read operations
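The last two guidelines pull in opposite directions: fewer, larger partitions help Spark, while a ~1 GB cap protects Cassandra reads. A rough back-of-envelope sketch for a time-bucketed partition key picks the widest bucket that stays under the cap. Row counts and row sizes are hypothetical inputs, and the estimate ignores compression and per-row storage overhead.

```python
MAX_PARTITION_BYTES = 1024 ** 3  # ~1 GB cap from the guidelines above


def estimated_partition_bytes(rows_per_day: int, avg_row_bytes: int,
                              days_per_bucket: int) -> int:
    """Rough size of one partition holding `days_per_bucket` days of rows."""
    return rows_per_day * avg_row_bytes * days_per_bucket


def largest_safe_bucket_days(rows_per_day: int, avg_row_bytes: int,
                             limit: int = MAX_PARTITION_BYTES) -> int:
    """Widest bucket (in days) whose estimated partition stays under the limit.

    Returns 0 when even one day exceeds the limit, meaning the partition key
    needs another component (for example, splitting by region as well as time).
    """
    per_day = rows_per_day * avg_row_bytes
    if per_day > limit:
        return 0
    return limit // per_day


# Hypothetical: 2 million rows per day at ~200 bytes each.
days = largest_safe_bucket_days(2_000_000, 200)
print(days, estimated_partition_bytes(2_000_000, 200, days))
```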
Example: ETL Incremental Load Strategy
Figure: SPIN → ETL (Talend) → ODS (reload) and ADS (incremental) on Cassandra (C*) → Spark → FiloDB (incremental); reporting via Thrift/JDBC to SSRS and Power BI.
• ODS is truncated and reloaded daily.
• ADS is a complete replica of the source system, loaded with an incremental ETL strategy.
• ODS tables are used to load the FiloDB tables incrementally using Spark jobs.
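The two load patterns above can be sketched side by side. This is a minimal illustration, assuming a hypothetical source row layout `(invoice_id, updated_at_epoch, amount)` and a high-water-mark ("watermark") approach to incremental loads; the slides do not say how the Talend or Spark jobs detect changed rows.

```python
from typing import Dict, List, Tuple

# Hypothetical source row: (invoice_id, updated_at_epoch, amount).
SourceRow = Tuple[int, int, float]
Target = Dict[int, Tuple[int, float]]


def full_reload(source: List[SourceRow]) -> Target:
    """Truncate-and-load (the ODS pattern): rebuild the target from scratch."""
    return {invoice_id: (updated_at, amount) for invoice_id, updated_at, amount in source}


def incremental_load(source: List[SourceRow], target: Target, watermark: int) -> int:
    """Upsert rows changed after `watermark` into target; return the new watermark.

    Upserting by key mirrors Cassandra, where a write to an existing primary
    key overwrites it, so replaying the same batch is harmless.
    """
    new_watermark = watermark
    for invoice_id, updated_at, amount in source:
        if updated_at > watermark:
            target[invoice_id] = (updated_at, amount)
            new_watermark = max(new_watermark, updated_at)
    return new_watermark


target: Target = {}
wm = incremental_load([(1, 100, 10.0), (2, 200, 20.0)], target, watermark=0)
wm = incremental_load([(1, 100, 10.0), (2, 200, 20.0), (1, 300, 15.0)], target, wm)
print(wm, target)
```

A strict `>` comparison skips rows that arrive late with a timestamp equal to the watermark; real jobs often re-read a small overlap window and rely on idempotent upserts to absorb the duplicates.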
Results & Opportunities
• Successfully completed a 300-concurrent-user load test from Business Objects
• <1 second response from Thrift servers for 90% of queries
• Average of 50 columns and 50 to 500k rows returned
• Single-partition and multi-partition scans, with joins involving 5 to 10 FiloDB tables per query
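"<1 second for 90% of queries" is a 90th-percentile latency claim. A minimal sketch of checking such a figure from recorded per-query latencies, using the nearest-rank percentile method; the latencies below are hypothetical, not results from the load test.

```python
import math
from typing import List


def percentile_nearest_rank(samples: List[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical per-query latencies in seconds.
latencies = [0.2, 0.4, 0.3, 0.9, 0.5, 0.7, 0.6, 0.8, 0.35, 2.5]
p90 = percentile_nearest_rank(latencies, 90)
print(p90, p90 < 1.0)
```

Note that the p90 says nothing about the slowest 10% of queries: in this sample one query takes 2.5 seconds.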
Opportunities
• Limitations on the maximum result size that can be collected using Spark SQL
• Limitations on the total concurrent result size requested from the Spark Thrift Server
• Both limitations are tunable
Lessons Learned
• Cassandra's limitations for fast analytics may require custom development
• Have a strategy to handle growth of Cassandra partitions
• Throttle read and write workloads to match the size of the cluster
• Tombstone management
• Pick the right ETL tool for the job
• Turn off the NUMAD service
• Lack of monitoring tools for Spark
• Spark’s lazy evaluation makes debugging very difficult
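One common way to throttle a read or write workload on the client side is a token bucket. The sketch below is illustrative, not the mechanism the team used (the slides do not say): `rate` and `capacity` are hypothetical knobs that would be sized to what the cluster can absorb, and the injectable `clock` exists only so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` operations per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Refill based on elapsed time, then take n tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Hypothetical use: cap an ETL writer at ~500 writes/s with bursts of 100.
bucket = TokenBucket(rate=500, capacity=100)
if bucket.try_acquire():
    pass  # issue one write to Cassandra here; otherwise back off and retry
```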
Questions?

Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Corp.) | C* Summit 2016

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Clustrix
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Web Services
 
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Corp.) | C* Summit 2016

  • 1. Building a Pluggable Analytics Stack with Cassandra as the Foundation
      Jim Peregord, Venu Palvai, Element Fleet Management
  • 2. Agenda
      1. Background on Element Fleet Management
      2. Key Use Cases Supported
      3. Architecture
      4. Our Journey
      5. Lessons Learned
  • 3. A Little About Us
      • Jim Peregord, VP of Analytics, BI, and Data Management (jperegord@elementcorp.com)
      • Venu Palvai, Lead Architect (vpalvai@elementcorp.com)
  • 4. Background on Element Fleet Management
      • Full lifecycle of fleet management services
      • Data consolidation and advanced analytics services
      • Maximize customer ROI on fleet assets via data and advanced analytics
      • 2,600 employees
      • 1+ million vehicles managed
      • $18 billion in total finance assets
      • 2 billion rows of data and growing
  • 5. Greenfield Opportunity to Build an Analytics Platform
      • Element acquired GE Fleet Management on September 1, 2015
      • Element is now the largest publicly held fleet management company in the world
      • Before the acquisition, Element had limited data warehouse and Big Data technology
      • This created a greenfield opportunity to build a next-generation BI and advanced analytics platform
      High-level options considered:
      • #1: Build a separate data warehouse and a separate Big Data/advanced analytics platform
      • #2: Build a single, unified architecture that supports both
      Our decision: #2, a single, unified platform built on DataStax
  • 6. Key Use Cases Supported on the New Platform
      • High availability out of the box
      • Linear and elastic scalability
      • High concurrency and low latency
      • Real-time ingestion of data streams: vehicle (location, diagnostics), weather, traffic
      • Data and analytics exposed via RESTful APIs
      • Advanced analytics (predictive, prescriptive, streaming)
      • Data warehouse and traditional reporting
  • 8. Advanced Analytics Hardware Architecture
      • Purpose-built hardware for advanced analytics
      • The NUMA/NVMe hardware is not commodity; it is highly specialized for very high performance (tens of millions of IOPS)
      • Architected to scale to 10x or even 100x current capacity, a must for telematics and IoT data
      • Hardware specs: 256 GB, 4 x 2 TB SSD, and a dedicated Cassandra (C*)/Spark instance per SSD
      • Active-active clustering provides very high availability
      • Cassandra / Spark / Solr / FiloDB / DSE Graph + NUMA: a high-performance analytics platform
      Cluster layout: Cassandra + Spark, 32 nodes; Cassandra + Solr, 8 nodes
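The slide gives disk sizes and node counts but not usable capacity. The helper below is a rough sizing sketch, not Element's method. It assumes each of the 32 + 8 nodes is one Cassandra instance bound to one 2 TB SSD (the slide does not say whether "node" means a server or an instance), a replication factor of 3, and the common rule of thumb of keeping about half of each disk free for compaction. All three assumptions are hypothetical parameters you can change.

```python
def usable_capacity_tb(instances: int, disk_tb_per_instance: float,
                       replication_factor: int, headroom: float = 0.5) -> float:
    """Estimate how much unique data (TB) a cluster can hold.

    raw capacity    = instances * disk per instance
    after replicas  = raw / replication_factor
    after headroom  = keep a fraction of each disk free (e.g. for compaction)
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw = instances * disk_tb_per_instance
    return raw / replication_factor * (1 - headroom)


# Assumed: 40 instances x 2 TB, replication factor 3, 50% headroom.
estimate = usable_capacity_tb(instances=40, disk_tb_per_instance=2.0,
                              replication_factor=3)
print(f"{estimate:.1f} TB of unique data")  # 13.3 TB of unique data
```

If each "node" is instead a full server with four 2 TB SSDs, pass `instances=160` (40 servers x 4 instances) and the estimate scales accordingly.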
  • 9. Analytics Logical Architecture
      [Diagram labels: event and streaming sources (Amazon SQS, Kafka); internal and external batch sources; FiloDB; Thrift Server, Spark SQL, Job Server, RESTful, Packages (PySpark), MLlib; Consumers]
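The diagram shows streaming sources such as Kafka and Amazon SQS feeding a Cassandra-backed store. The sketch below shows one common way a consumer might bucket telematics events into Cassandra-style partitions keyed by (vehicle, day) so that no single partition grows without bound. It does not use a Kafka or SQS client. The event fields, the daily bucket size, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(event):
    """Hypothetical (vehicle_id, UTC day) bucket for one telematics event.

    Roughly mirrors a CQL table with PRIMARY KEY ((vehicle_id, day), ts).
    """
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
    return (event["vehicle_id"], day)


def group_micro_batch(events):
    """Group one micro-batch by partition, time-ordered within each partition."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_key(event)].append(event)
    for rows in partitions.values():
        rows.sort(key=lambda e: e["ts"])  # mimics a clustering column on ts
    return dict(partitions)


def ts(*args):
    """Epoch seconds for a UTC date/time given as (year, month, day, hour, minute)."""
    return datetime(*args, tzinfo=timezone.utc).timestamp()


batch = [
    {"vehicle_id": "V1", "ts": ts(2016, 9, 7, 23, 59), "speed_mph": 42},
    {"vehicle_id": "V1", "ts": ts(2016, 9, 7, 8, 0), "speed_mph": 35},
    {"vehicle_id": "V1", "ts": ts(2016, 9, 8, 0, 1), "speed_mph": 40},
    {"vehicle_id": "V2", "ts": ts(2016, 9, 7, 12, 0), "speed_mph": 55},
]
for key, rows in sorted(group_micro_batch(batch).items()):
    print(key, [r["speed_mph"] for r in rows])
# ('V1', '2016-09-07') [35, 42]
# ('V1', '2016-09-08') [40]
# ('V2', '2016-09-07') [55]
```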
  • 10. Pluggable Architecture: Overview
      Element's pluggable analytics stack lets us plug in multiple analytics tools and choose the right one for the question we are asking. It also lets us add new analytics capabilities on top of Cassandra as they become available.
      • FiloDB: columnar data, fast reads
      • Spark: Spark SQL, streaming analytics, PySpark
      • Lucene: search, custom dictionaries
      • DSE Graph: graph-based analytics
      • Future tools: TBD
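The pluggable idea can be illustrated as a registry of tools that all read the same stored data and are selected by the kind of question being asked. The sketch below is a toy model of that pattern, not Element's code. `AnalyticsStack`, `plug_in`, `ask`, and the question-type names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Tool = Callable[[List[Row]], object]


class AnalyticsStack:
    """Hypothetical registry: several analytics tools over one shared data set."""

    def __init__(self, rows: List[Row]):
        self._rows = rows                  # stands in for data stored in Cassandra
        self._tools: Dict[str, Tool] = {}

    def plug_in(self, question_type: str, tool: Tool) -> None:
        """Register (or replace) the tool that answers one kind of question."""
        self._tools[question_type] = tool

    def ask(self, question_type: str):
        """Route a question to its tool; the stored data itself never moves."""
        if question_type not in self._tools:
            raise KeyError(f"no tool registered for {question_type!r}")
        return self._tools[question_type](self._rows)


purchases = [
    {"vehicle_id": "V1", "merchant": "Shell", "gallons": 12.5},
    {"vehicle_id": "V2", "merchant": "BP", "gallons": 20.0},
    {"vehicle_id": "V1", "merchant": "BP", "gallons": 8.0},
]
stack = AnalyticsStack(purchases)
stack.plug_in("aggregate", lambda rows: sum(r["gallons"] for r in rows))
stack.plug_in("search", lambda rows: [r for r in rows if r["merchant"] == "BP"])
print(stack.ask("aggregate"))  # 40.5
```

Adding a capability is one more `plug_in` call against the same data, which is the property the slide emphasizes.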
  • 11. Pluggable Architecture: FiloDB (columnar data, fast reads)
      • FiloDB uses Cassandra for storage and Spark for computation
      • Optimized for:
          • Low-latency queries and streaming
          • Interactive ad hoc analysis on Big Data
          • Complex analytics and machine learning
      • Efficient columnar storage (20-40x less storage)
      • All queries are distributed and run in parallel in Spark
      • Integrates with existing BI tools via JDBC/ODBC
      • Horizontally scalable and fault tolerant
      • Future enhancements include geospatial analysis
      Recent blog post by Evan Chan, a renowned Cassandra/Spark expert: www.planetcassandra.org/blog/achieving-sub-second-sql-joins-and-building-a-data-warehouse-using-spark-cassandra-and-filodb
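To show why a columnar layout can shrink storage, the sketch below pivots rows into columns and applies two generic encodings, dictionary encoding and run-length encoding. This is a general illustration only. It is not FiloDB's on-disk format or API, and the sample fields are hypothetical.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


rows = [
    {"vehicle_id": "V1", "make": "Ford", "state": "MD"},
    {"vehicle_id": "V2", "make": "Ford", "state": "MD"},
    {"vehicle_id": "V3", "make": "Ford", "state": "NJ"},
    {"vehicle_id": "V4", "make": "Chevy", "state": "NJ"},
]
columns = to_columns(rows)
print(run_length_encode(columns["make"]))   # [('Ford', 3), ('Chevy', 1)]
print(dictionary_encode(columns["state"]))  # (['MD', 'NJ'], [0, 0, 1, 1])
```

Because values of one column sit together, low-cardinality columns collapse into short runs or small codes, and a query that needs only `make` can skip reading the other columns entirely.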
  • 12. Pluggable Architecture – Apache Spark (Spark SQL, streaming analytics)
    • Spark SQL
      • In-memory, fast SQL processing
      • Easily blend data from multiple sources
      • Connect to BI tools
    • Spark Streaming
      • Ingest streaming data sources such as telematics, weather, and engine diagnostics
    • Spark MLlib
      • Library of machine learning algorithms for advanced analytics
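A typical per-batch computation on telematics streams is a tumbling-window aggregate: fixed, non-overlapping time windows per vehicle. The sketch below shows that logic in plain Python. It is not Spark Streaming code, and the event shape `(vehicle_id, epoch_seconds, speed_kph)` is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical telematics event: (vehicle_id, epoch_seconds, speed_kph).
Event = Tuple[str, int, float]


def tumbling_window_avg(events: Iterable[Event], window_s: int) -> Dict[Tuple[str, int], float]:
    """Average speed per vehicle per fixed, non-overlapping window.

    Returns {(vehicle_id, window_start): average_speed}.
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for vehicle_id, ts, speed in events:
        window_start = ts - (ts % window_s)  # align the event to the start of its window
        key = (vehicle_id, window_start)
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# One hypothetical micro-batch of events.
batch = [("V1", 0, 50.0), ("V1", 30, 70.0), ("V1", 65, 40.0), ("V2", 10, 90.0)]
print(tumbling_window_avg(batch, window_s=60))
```

In Spark the same idea is usually expressed as a windowed `groupBy` over the stream rather than hand-written loops.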
  • 13. Pluggable Architecture – Lucene/SOLR (search, custom dictionaries)
    • Powerful search algorithms
    • Geospatial indexing and geo-queries
    • Custom dictionaries
    • Efficient metric calculations
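A common geo-query is "find points within R km of a location", for example fueling stations near a vehicle. Solr answers this with a spatial index; the sketch below only shows the underlying great-circle (haversine) radius filter as a brute-force scan. The station names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(
    center: Tuple[float, float], points: List[Tuple[str, float, float]], radius_km: float
) -> List[Tuple[str, float]]:
    """Return (name, distance_km) for every point within radius_km of center, nearest first."""
    lat0, lon0 = center
    hits = [(name, haversine_km(lat0, lon0, lat, lon)) for name, lat, lon in points]
    return sorted((h for h in hits if h[1] <= radius_km), key=lambda h: h[1])


# Hypothetical fueling stations around a vehicle in downtown Chicago.
stations = [("Station A", 41.880, -87.630), ("Station B", 41.950, -87.650), ("Station C", 42.360, -87.830)]
nearby = within_radius((41.8781, -87.6298), stations, radius_km=10)
print([name for name, _ in nearby])
```

A spatial index avoids computing the distance to every point; the brute-force scan here is only to show what the query means.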
  • 14. Pluggable Architecture – DSE Enterprise Graph (graph data analytics)
    • Graph databases store data as a network of relationships
    • Optimized analytics for any data where relationships matter most
    • Can improve query/analytics performance by as much as 1000x
    • Example use cases:
      • IoT time series on streaming data
      • Vehicle routing
      • Visualizing clusters of well-performing and underperforming assets
      • Recommending optimal actions
      • Fraud detection
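Vehicle routing is a natural graph problem: locations are vertices and road segments are weighted edges. The sketch below uses Dijkstra's shortest-path algorithm in plain Python. It is not the DSE Graph (Gremlin) API, and the road network and weights are made up.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, edge_weight)], e.g. travel minutes.
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_route(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total_cost, path); (inf, []) if goal is unreachable."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter path was already settled
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return math.inf, []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])  # walk predecessors back to the start
    return dist[goal], path[::-1]


# Hypothetical road network between a depot and a customer site.
roads: Graph = {
    "depot": [("A", 4), ("B", 1)],
    "B": [("A", 2), ("customer", 5)],
    "A": [("customer", 1)],
}
print(shortest_route(roads, "depot", "customer"))
```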
  • 15. Pluggable Architecture – Cassandra
    • High-performance NoSQL database
    • Flexible schema lets new data attributes be added easily
    • Peer-to-peer, distributed architecture with no single point of failure, unlike traditional databases
    • Elastic scalability: add servers as the workload increases
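The "no single point of failure" property comes from Cassandra placing data on a token ring, where every node is a peer and each partition is copied to several replicas. This is a toy consistent-hashing ring with SimpleStrategy-style placement (walk clockwise, take the next RF nodes). Simplifying assumptions: one token per node (no vnodes) and MD5 hashing, whereas Cassandra's default partitioner is Murmur3.

```python
import bisect
import hashlib
from typing import List


class TokenRing:
    """Toy consistent-hashing ring: every node owns one token and no node is special."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        # MD5 stand-in for a partitioner hash (Cassandra's default is Murmur3).
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, partition_key: str, rf: int) -> List[str]:
        """Walk clockwise from the key's token and take the next rf distinct nodes."""
        rf = min(rf, len(self._ring))
        start = bisect.bisect_right(self._tokens, self._token(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


nodes = [f"node{i}" for i in range(1, 7)]
ring = TokenRing(nodes)
# With replication factor 3, any one of these nodes can serve the partition if another is down.
print(ring.replicas("customer:123", rf=3))
```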
  • 16. What Our Platform Means to Customers
    Infrastructure improvements:
    • 20x CPU speed
    • 10x memory
    • 70x disk performance
    All running on Cassandra. The Cassandra database has been adopted by companies running some of the world's largest and most sophisticated real-time analytics systems.
    Data → Insights → Action
    • Data: maintenance history, fuel purchases, miles driven, GPS location, points of interest, weather, traffic, online repair reviews, fuel price geo-indexing
    • Insights: predicted operating costs, fraud detection, business rule exceptions, accident predictors, optimal replacement, high-risk diagnostic trouble codes (DTCs), repair sentiment analysis
    • Action: vehicle replacement schedule, fraud actions, safe-driving interventions, non-standard maintenance schedule, recommended fueling and maintenance facilities
    Sifting through the data "noise" must be as fast as possible to produce actionable recommendations.
  • 18. Journey to Build a Unified BI and Analytics Platform
    • Creating flexible data models that work for both BI and analytics
    • Achieving the high concurrency and low latency required for enterprise reporting platforms
    • Optimizing software installation and configuration for performance
    • Workload management
  • 19. Dimensional Modeling for BI and Analytics
    • BI tools are designed to work with dimensional models
    • Dimensional models are proven and easy to understand and model
    • Dimensional models are flexible and can answer many questions
    • OLAP use cases require slicing and dicing data across multiple dimensions
    • JOIN capability is critical for data models that can answer a variety of questions
    [Diagram: star schema, with a central fact table joined to four dimension tables.]
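Slicing and dicing a star schema means joining fact rows to their dimensions and then grouping by whichever dimension attributes the question needs. The sketch below does that in plain Python over hypothetical fuel-purchase facts; the table and column names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical star schema: one fact table keyed to two dimension tables.
dim_vehicle = {
    1: {"make": "Ford", "region": "East"},
    2: {"make": "Ford", "region": "West"},
    3: {"make": "Toyota", "region": "East"},
}
dim_date = {20160101: {"month": "2016-01"}, 20160201: {"month": "2016-02"}}
fact_fuel = [
    {"vehicle_key": 1, "date_key": 20160101, "gallons": 10.0},
    {"vehicle_key": 2, "date_key": 20160101, "gallons": 12.0},
    {"vehicle_key": 3, "date_key": 20160201, "gallons": 8.0},
    {"vehicle_key": 1, "date_key": 20160201, "gallons": 5.0},
]


def slice_and_dice(group_by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Join each fact row to its dimensions, then sum gallons by the chosen attributes."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in fact_fuel:
        joined = {**dim_vehicle[row["vehicle_key"]], **dim_date[row["date_key"]]}
        totals[tuple(joined[attr] for attr in group_by)] += row["gallons"]
    return dict(totals)


print(slice_and_dice(["make"]))             # one question...
print(slice_and_dice(["region", "month"]))  # ...and another, from the same model
```

The same fact and dimension tables answer both questions; only the grouping changes, which is why the join capability matters.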
  • 20. Limitations of Spark SQL
    • A Cassandra + Spark cluster provides JOIN functionality
    • Spark SQL does not pass a filter applied to one table on to the other table, even when the two tables are joined on the filtered column
    • Predicate pushdown does not work for outer joins
    • Pushing predicates down to Cassandra (the data source) gives better performance, because fewer rows are read
    [Figure: sample DAG plan for a JOIN query over 5 tables.]
    SQL example:
      SELECT c.customer_id, c.customer_name, i.invoice_amount
      FROM customer c, invoice i
      WHERE c.customer_id = i.customer_id
        AND c.customer_id = 123;
    Spark splits the query above into two scans. Only the first carries the filter, so the second reads the whole invoice table:
      SELECT c.customer_id, c.customer_name FROM customer c WHERE c.customer_id = 123;
      SELECT i.customer_id, i.invoice_amount FROM invoice i;
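The cost of the missing pushdown can be shown by counting rows read. In this toy model the invoice table is stored like Cassandra, with rows grouped by partition key (`customer_id`). The table size (100 hypothetical customers with 1,000 invoices each) is an assumption chosen only to make the difference visible.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, float]  # (customer_id, invoice_amount)

# Toy invoice table laid out like Cassandra: rows grouped by partition key (customer_id).
# 100 hypothetical customers (ids 100-199) with 1,000 invoices each.
invoice_partitions: Dict[int, List[Row]] = {
    cust: [(cust, float(n)) for n in range(1, 1001)] for cust in range(100, 200)
}


def scan(partitions: Dict[int, List[Row]], customer_id: Optional[int] = None) -> Tuple[List[Row], int]:
    """Return (rows, rows_read). A pushed-down key reads one partition; otherwise every row is read."""
    if customer_id is not None:
        rows = partitions.get(customer_id, [])
    else:
        rows = [row for part in partitions.values() for row in part]
    return rows, len(rows)


# No pushdown: the whole invoice table is read and filtered after the join.
all_rows, read_without = scan(invoice_partitions)
filtered = [row for row in all_rows if row[0] == 123]

# With i.customer_id = 123 pushed to the source, only one partition is read.
pushed, read_with = scan(invoice_partitions, customer_id=123)

print(read_without, read_with, filtered == pushed)
```

Both paths return the same rows; the pushed-down scan simply reads 1% of the data in this toy setup.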
  • 21. Custom Thrift Server to Optimize SQL Statements
    • Adds predicates to joined tables based on matching join columns
    • Converts an IN condition to an = condition whenever the IN list has only one value
    • Adds an IN predicate on the partition column, based on range predicates supplied on non-partition-key columns
    Example: the query
      SELECT c.customer_id, c.customer_name, i.invoice_amount
      FROM customer c, invoice i
      WHERE c.customer_id = i.customer_id
        AND c.customer_id IN (123)
    is rewritten as
      SELECT c.customer_id, c.customer_name, i.invoice_amount
      FROM customer c, invoice i
      WHERE c.customer_id = i.customer_id
        AND c.customer_id = 123
        AND i.customer_id = 123
    Custom Thrift Server flow: Spark Thrift server with a custom Hive context → inspect logical plan → modify logical plan (if needed) → submit plan for execution
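The first two rewrite rules can be sketched over a toy predicate list. The real server works on Spark's logical plan; the tuple encoding below is a hypothetical simplification, and the third rule (deriving partition-key IN lists from range predicates) is left out because it depends on how partition keys are derived. Copying an equality across the join is only valid for inner equi-joins like the example.

```python
from typing import List, Set, Tuple

# Toy predicate forms (hypothetical, not Spark's classes):
#   ("eq", column, value)      column = value
#   ("in", column, [values])   column IN (values)
#   ("join", left, right)      left = right, an inner equi-join condition
Pred = tuple


def rewrite(preds: List[Pred]) -> List[Pred]:
    # Rule 1: an IN list with exactly one value becomes an equality.
    out: List[Pred] = [
        ("eq", p[1], p[2][0]) if p[0] == "in" and len(p[2]) == 1 else p for p in preds
    ]
    # Rule 2: copy equality filters across inner equi-join columns until nothing new appears.
    joins = [(p[1], p[2]) for p in out if p[0] == "join"]
    known: Set[Tuple[str, object]] = {(p[1], p[2]) for p in out if p[0] == "eq"}
    changed = True
    while changed:
        changed = False
        for left, right in joins:
            for src, dst in ((left, right), (right, left)):
                for col, val in list(known):
                    if col == src and (dst, val) not in known:
                        known.add((dst, val))
                        out.append(("eq", dst, val))
                        changed = True
    return out


query = [("join", "c.customer_id", "i.customer_id"), ("in", "c.customer_id", [123])]
for pred in rewrite(query):
    print(pred)
```

The output mirrors the SQL example: the single-value IN becomes `c.customer_id = 123`, and `i.customer_id = 123` is added so the invoice scan can be pushed down too.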
  • 22. FiloDB
    • Cassandra 2.1 has several restrictions on predicate pushdown
    • FiloDB is a true columnar store
    • Provides ~20-30x compression over Cassandra
    • Very efficient for single- and multiple-partition scans
    • Partial predicate pushdown support
    • Provides ~20-30x better read performance than plain Cassandra
    [Diagram: Cassandra storage of FiloDB data. Rows of data (Ck1, Ck2) get converted to compressed columnar chunks.]
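The row-to-columnar-chunk conversion can be sketched by pivoting rows into one list per column and compressing each list separately. This is not FiloDB's actual encoding (it uses its own binary vector formats); JSON plus zlib and the sample columns are assumptions chosen to keep the example self-contained.

```python
import json
import zlib
from typing import Dict, List


def to_columnar_chunks(rows: List[dict]) -> Dict[str, bytes]:
    """Pivot rows into one compressed chunk per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: zlib.compress(json.dumps(values).encode()) for name, values in columns.items()}


def read_column(chunks: Dict[str, bytes], name: str) -> list:
    """Decompress only the column a query needs; other columns stay untouched."""
    return json.loads(zlib.decompress(chunks[name]).decode())


# Hypothetical rows: repetitive values (few vehicles, one state) compress very well per column.
rows = [{"vehicle_id": f"V{i % 10}", "state": "IL", "odometer": 10_000 + i} for i in range(5_000)]
chunks = to_columnar_chunks(rows)
row_bytes = len(json.dumps(rows).encode())
col_bytes = sum(len(chunk) for chunk in chunks.values())
print(row_bytes, col_bytes)
```

Storing similar values together is what makes per-column compression effective, and a query touching one column only decompresses that chunk.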
  • 23. Dimensional Data Modeling for Cassandra + Spark
    • Use simple star schema models as much as possible (eliminate snowflakes, outer joins, etc.)
    • Denormalized dimensions and facts (avoid duplicating dimensions into facts)
    • Minimize the number of tables involved in joins
    • Common partitioning strategy across dimensions and facts (makes predicate handling easy)
    • Limit maximum partition size to ~1 GB
    • Balance the two: fewer partitions for efficient Spark execution, but bounded partition sizes for efficient Cassandra reads
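One common way to respect a ~1 GB partition ceiling is to estimate partition size and split an oversized key into buckets. The sketch below assumes a hypothetical composite partition key `(customer_id, bucket)` and treats 1 GB as 1 GiB; the row counts and sizes are illustrative.

```python
import math
from typing import Tuple

MAX_PARTITION_BYTES = 1024 ** 3  # ~1 GB ceiling from the slide (taken here as 1 GiB)


def buckets_needed(rows_per_key: int, avg_row_bytes: int, max_bytes: int = MAX_PARTITION_BYTES) -> int:
    """Smallest number of sub-partitions that keeps each one under max_bytes (at least 1)."""
    return max(1, math.ceil(rows_per_key * avg_row_bytes / max_bytes))


def partition_key(customer_id: int, invoice_id: int, n_buckets: int) -> Tuple[int, int]:
    """Hypothetical composite key: customer_id stays the leading component, bucket spreads rows."""
    return (customer_id, invoice_id % n_buckets)


# A hypothetical large customer: 10 million invoice rows of ~500 bytes each (~5 GB total).
n = buckets_needed(rows_per_key=10_000_000, avg_row_bytes=500)
print(n, partition_key(123, 7, n))
```

The trade-off from the last bullet shows up directly: more buckets keep Cassandra reads small, but a query for one customer must then read every bucket, which means more partitions for Spark to schedule.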
  • 24. Example: ETL Incremental Load Strategy
    [Diagram: components shown are SPIN, ODS, ADS, FiloDB on Cassandra/Spark, Talend ETL (reload and incremental loads), Spark, Thrift/JDBC, SSRS, and Power BI.]
    • ODS is truncated and reloaded daily.
    • ADS is a complete replica of the source system.
    • Incremental ETL strategy.
    • ODS tables are used to load FiloDB tables incrementally using Spark jobs.
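An incremental load is often driven by a high-water mark: each run picks up only rows changed since the previous run and upserts them by primary key (Cassandra writes are upserts). This is a minimal sketch under that assumption; the `id` and `updated_at` columns are hypothetical, and the slide does not say how the change detection is actually done.

```python
from typing import Dict, List, Tuple


def incremental_load(source: List[dict], target: Dict[int, dict], watermark: int) -> Tuple[int, int]:
    """Upsert rows changed since `watermark` into `target`; return (rows_loaded, new_watermark)."""
    changed = [row for row in source if row["updated_at"] > watermark]
    for row in changed:
        target[row["id"]] = dict(row)  # upsert by primary key
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return len(changed), new_watermark


target: Dict[int, dict] = {}
source = [{"id": 1, "amount": 10, "updated_at": 100}, {"id": 2, "amount": 20, "updated_at": 101}]

loaded, wm = incremental_load(source, target, watermark=0)     # first run loads both rows
source[0] = {"id": 1, "amount": 15, "updated_at": 205}          # row 1 changes upstream
loaded2, wm2 = incremental_load(source, target, watermark=wm)  # second run loads only row 1
print(loaded, wm, loaded2, wm2)
```

Persisting the watermark between runs is what lets each job touch only the delta instead of reloading everything.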
  • 25. Results & Opportunities
    Results
    • Successfully completed a 300-concurrent-user load test from Business Objects
    • <1 second response from the Thrift servers for 90% of queries
    • An average of 50 columns and 50-500k rows returned
    • Single-partition and multi-partition scans, with joins involving 5-10 FiloDB tables per query
    Opportunities
    • Limits on the maximum result size that can be collected using Spark SQL
    • Limits on the total concurrent result size requested from the Spark Thrift server
    • These limits are tunable
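A result like "<1 second for 90% of queries" is a 90th-percentile latency check. The sketch uses the nearest-rank percentile definition; the latency values are illustrative, not the load-test measurements.

```python
import math
from typing import List


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Illustrative per-query latencies in seconds (not measured data).
latencies_s = [0.2, 0.4, 0.3, 0.9, 0.5, 0.7, 0.6, 0.8, 0.95, 2.4]
p90 = percentile_nearest_rank(latencies_s, 90)
meets_target = p90 < 1.0
print(p90, meets_target)
```

Note that one slow outlier (2.4 s here) does not break a p90 target, which is why tail percentiles such as p99 are often reported alongside it.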
  • 26. Lessons Learned
    • Cassandra's limitations for fast analytics may require custom development
    • Have a strategy to handle growth of Cassandra partitions
    • Throttle read and write workloads to match the size of the cluster
    • Tombstone management
    • Pick the right ETL tool for the job
    • Turn off the numad service
    • Lack of monitoring tools for Spark
    • Spark's lazy evaluation makes debugging very difficult
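One way to throttle read and write workloads to what a cluster can absorb is a token bucket: operations spend tokens, and tokens refill at a fixed rate up to a burst capacity. The slides do not say which mechanism was used, so this is only one possible approach. The clock is passed in explicitly to keep the sketch testable; a real client would use `time.monotonic()`.

```python
class TokenBucket:
    """Allow about `rate` operations per second, with bursts of at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should back off instead of sending the request


# Hypothetical limit: 10 writes per second, bursts of up to 5.
bucket = TokenBucket(rate=10, capacity=5)
print([bucket.try_acquire(now=0.0) for _ in range(6)])  # burst, then throttled
print([bucket.try_acquire(now=0.5) for _ in range(6)])  # half a second refills 5 tokens
```

Sizing `rate` to the cluster (for example, scaling it with node count) is the part that has to be tuned per deployment.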