Blending Cassandra Data Into the mix

Matt Casters | Chief Architect, Data Integration at Pentaho
Kettle Project Founder

#CASSANDRAEU

CASSANDRASUMMIT
EU
What we will discuss today…
* About Pentaho
* Blended Big Data Integration
* Demo
* Takeaway & QA

About Pentaho
Our mission and key takeaways

Pentaho Mission
Enabling the future of analytics

Modern unified business analytics and data integration platform
• Full spectrum of advancing analytics for all key roles
• Embeddable, cloud-ready analytics
• Big data blending for analytics in real-time environments
• Broadest and deepest big data integration

Innovation through open source
• Open, pluggable, purpose built for the future
• Early sustained leadership in big data ecosystem with technology innovation

Critical mass achieved
• Over 1,200 commercial customers
• Over 10,000 production deployments

Pentaho and Cassandra
* ETL and Analytics that complement Cassandra
* Create data transformations from source systems into Cassandra, and from Cassandra to target systems, via drag and drop
* Quickly visualize and explore data inside Cassandra with Pentaho Data Services
* Deeper Cassandra/Pentaho integration in development
* Keep up with the latest Cassandra developments
* Provide underlying API compatibility layer

The New Reality
Simplified Analysis for all Users

[Diagram labels: Billing, Social Media, Customer, Analytics, Existing & New Data Infrastructure & Processes, Web, Location, Network]

ANY Data
• Relational
• Operational
• Big Data
• Data sources not yet anticipated…

ANY Environment
• Data warehouses
• Data marts
• Stack vendors
• Cloud
• Embedded

ANY Analytics
• Reports
• Dashboards
• Visualizations
• Discovery
• Predictive
Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users

• Simplified Analytics Experience
• Blended Big Data
• Enterprise Big Data Integration

Basic Cassandra Use Case
• Enterprise Customer Data Store
• Visual ETL development with Pentaho Data Integration
• Reporting, Dashboards, Visualization and Data discovery with full spectrum analytics

[Diagram labels: System Scope; Source Systems …; Pentaho Data Integration; Enterprise Data Store; Pentaho Analytics (Reporting, Dashboards, Visualization, Discovery); Pentaho Data Integration; Target Systems]
Big Data Orchestration

Orchestration Toolkit

Pentaho Visual Development
Integrate, Manipulate, Ingest

Schedule
Model

Would you rather do this? … or this?
Broad Connectivity

[Diagram labels: PDI, Cassandra cluster, Analytics]
Blending data
When copying data all over the place stops making sense

Analytics on Cassandra: Two Approaches

[Diagram: Cassandra cluster; Direct Access through PDI Data Services to Analytics; Access via Database through PDI ETL and an RDBMS to Analytics]

Direct Access to Cassandra Data
Extract -> Transform -> Present

[Diagram labels: Cassandra cluster, PDI ETP, Pentaho Operational Reports, Pentaho Operational Dashboards]

Pentaho Operational Dashboards
Architected Access for Reliable Executive Insight

Customer Value from Big Data
Monetizing big data-driven use cases driving need to blend data

Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
• Begin to monetize data as a service

Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security

Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Why Blending at the Source Matters
Customer Experience Analytics for loyalty and revenue

[Diagram labels: Customer, Provisioning, Billing, Existing ETL Tool or PDI, EDW, Network, Location, PDI, NoSQL, PDI, Analytics]

Call Detail Records from:
• Billing
• Payment
• Usage

Call Detail Records from Network:
• Outages
• Drops
• Service Quality

Blend revenue-related and quality-of-service data together to find customers at risk

Analyze quality of service:
• Network outages
• Dropped calls
• Poor quality
• Calls to support center

For profiles of customers:
• Up for renewal
• Profitable
• Multiple agreements/services
• In competitive area

Determine best action to take:
• Billing Credit
• Customer Coupon
• No Action
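
The blend described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how PDI implements it: the record types, field names, and both thresholds are assumptions made up for the example, and plain in-memory lists stand in for the warehouse and the NoSQL store.

```python
from dataclasses import dataclass


@dataclass
class BillingRecord:
    """Hypothetical row derived from call detail records in the warehouse."""
    customer_id: str
    monthly_revenue: float
    up_for_renewal: bool


@dataclass
class NetworkEvent:
    """Hypothetical quality-of-service event from the NoSQL store."""
    customer_id: str
    kind: str  # e.g. "outage", "drop", "poor_quality"


def blend(billing, events):
    """Pair each billing row with that customer's count of network problems."""
    problems = {}
    for event in events:
        problems[event.customer_id] = problems.get(event.customer_id, 0) + 1
    return [(row, problems.get(row.customer_id, 0)) for row in billing]


def best_action(row, problem_count, problem_limit=3, revenue_floor=50.0):
    """Choose one of the slide's three actions; both thresholds are assumptions."""
    if problem_count < problem_limit:
        return "No Action"
    if row.up_for_renewal and row.monthly_revenue >= revenue_floor:
        return "Billing Credit"
    return "Customer Coupon"


# Illustrative sample data only.
billing = [
    BillingRecord("a", 80.0, True),
    BillingRecord("b", 20.0, False),
    BillingRecord("c", 90.0, True),
]
events = (
    [NetworkEvent("a", "drop")] * 3
    + [NetworkEvent("b", "outage")] * 4
    + [NetworkEvent("c", "drop")]
)

actions = {row.customer_id: best_action(row, count)
           for row, count in blend(billing, events)}
print(actions)
```

The point of the sketch is that the action depends on both sources at once: neither the billing rows nor the network events alone identify which customers are at risk.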

Accurate, Blended Big Data Analytics
Optimally stored data, blended when needed
• Just in time blending of data from multiple sources for a complete picture
• Connect, combine and transform data from multiple sources
• Query data directly from any transformation
• Access architected blends with the full spectrum of Pentaho Analytics
• Manage governance and security of data for on-going accuracy

[Diagram labels: Customer, Provisioning, Existing ETL Tool or PDI, EDW, Billing, Just in time blending, PDI, Network, PDI, Analytics, NoSQL, Location]

Bring More Big Data to Life
Adaptive Big Data Layer: broadest, deepest big data support

Broadest options for storing and blending data
• New analytic use case templates for Hadoop and Splunk
• Deeper NoSQL integration to and direct reporting
• Hadoop high availability support with MapR
• Expanded big data integration
  • New integrations: Redshift, Impala and Splunk
  • New certifications: DataStax, Cassandra, Intel, Hortonworks, latest Cloudera, MapR, MongoDB, …
Demo!
Demonstrate how to easily write to and read from Cassandra
Demonstrate how to blend data

Takeaways…

Pentaho 5.0 key takeaways
Meeting the demands of the big data-driven enterprise
• Analytics: Simplified analytics experience with a new modern interface
• Blended Big Data: Blended Big Data at the source for more accurate insights
• Enterprise Big Data Integration: Enterprise-ready data integration and simplified embedding for any environment
blog.pentaho.com

Facebook.com/Pentaho

@Pentaho

Pentaho Business Analytics

THANK YOU

Any questions?

www.pentaho.com



C* Summit EU 2013: Blending Cassandra Data Into The Mix


Editor's Notes

  1. Icons are nice and the build-order is great! My suggestion for the top 3 icons on the left-hand side: Customer, Provisioning, Billing. Suggestion for the bottom 3 icons: Web, Network, Social Media. (Note: Location seems to be important to AT&T but we can just mention this.) I need to come up with an explanation for why the arrow below “Just in Time Integration” is bi-directional instead of just flowing to Analytics.
  2. Let’s look at an example of blending at the source to better understand these points. Here we are looking at an example of Telco customer experience analytics. Customer Experience Analytics have the same goal in every industry: preventing customer churn and creating better loyalty in order to protect and grow revenue. After all, in this age of commoditization, service and fast response to product requests become the new differentiators driving loyalty in most industries. Telco customer allegiance comes mostly from satisfaction with calling plans and the quality and availability of service. Call detail records have long been created and derived from the operational systems for access to BI and reporting systems via warehousing, but they only make up part of the picture. (Build click 2) Quality of service changes in real time depending on the network: was the customer able to connect, to hear, to remain connected without being dropped, etc.? This network-based data is usually captured in a Big Data source that is capable of handling the volume and unstructured nature of the data, and must be blended with the Call Detail Record information to give the complete picture of a customer’s experience. (Build click 3) With Pentaho, you can easily create architected, blended views across both the traditional Call Detail Records in the warehouse and the network data streaming into the Big Data/NoSQL store (MongoDB in this example) without sacrificing the governance or performance you expect. These blended views allow your analysts and customer call centers to get this accurate, of-the-minute information in real time to determine the best action to take for each customer to maximize their satisfaction and retain them as loyal customers even when outages or other service quality issues occur.
  3. Other solutions in the market talk about blending, but it’s not apples to apples. Blending “at the glass”, i.e. blending done by end users or analysts away from the source with no knowledge of the underlying semantics, often delivers inaccurate or even completely incorrect results, as there is no way to ensure that the chosen fields being matched truly do match. For instance, think what happens when someone matches two fields both named “revenue” in records that match on “customer”, but one is a monthly sum total and the other is a daily total. This won’t be apparent to that analyst since they are blending based on similar names. The analyst then runs a summation that adds the two together as the day’s total revenue from that customer. He/she will have unwittingly added the monthly figure into each day’s total, distorting the actual revenue generated from that customer dramatically. Your business then targets that customer as highly profitable and offers significant discounts to maintain their interest. Not only have you targeted the wrong customer and potentially ignored the real profitable customers in favor of him, but you’ve also now given him undeserved discounts. The net result lowers your revenue from this customer, and potentially loses you profitable others who were more deserving but left you in favor of competitors offering them discounts. You’ve made the wrong decision because the analytics themselves were inaccurate and incorrect. Your only choice to avoid this with tools that blend like this is to train every user and analyst on the semantics of the data to ensure reliable results, a solution that’s largely infeasible for most organizations as it would take far too much time and expense while impacting productivity. Even if you can take on this level of investment in training, you still face issues with the timeliness of the data, since these tools do not pull from the source systems. How do you know the data pulled is indeed the latest and therefore the most accurate on that level as well?
  4. This “just in time”, architected blending delivers accurate big data analytics based on the blended data. You can connect to, combine, and even transform data from any of the multiple data stores in your hybrid data ecosystem into these blended views, then query the data directly via that view using the full spectrum of analytics in the Pentaho Analytics platform, including predictive analytics. Most importantly, since these blends are architected on the source data, you maintain all the rules of governance and security over the data while providing the ease of use and real time access needed for today’s agile analytics requirements. Sensitive data is kept from those who are not allowed to use or view it. You maintain full lifecycle and change management and control, so you can assure the blends being used meet changing requirements. You preserve auditability. Your blends are designed with full knowledge of the underlying data volumes and source system capabilities and constraints, preserving throughput and performance during analytic access and preventing the “query from hell”/“runaway query” problems prevalent in many federation tools. Combining the power of design via drag-and-drop across all data sources, including schemas generated on read from big data sources, with knowledge of the full data semantics (the real meaning, cardinality, and match-ability of fields and values in the data) means your business gets accurate results in its analytics, leading to optimized decisions and actions that can really impact your business positively and improve your results.
  5. As part of our any data, any source ability, we now have the broadest and most sophisticated way to access big data sources. With specific big data templates we again can reduce IT barriers (programming and coding) and allow users to access their data with ease. Our product also allows businesses to scale into and deploy big data as they see fit. As part of any source, should a business try Cassandra, but realize they prefer MongoDB, they can easily access both types of data, expanding their integration ability.
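
The “revenue” mismatch described in the note on blending “at the glass” can be made concrete with a small calculation. The figures below are hypothetical and the function names are invented for this sketch; it only shows how adding a monthly total into every daily total inflates the result.

```python
# Hypothetical figures for one customer. Both sources call the field "revenue".
daily_revenue = {"day 1": 10.0, "day 2": 12.0, "day 3": 8.0}  # daily totals
monthly_revenue = 300.0  # monthly sum total from a second source


def naive_total(daily, monthly):
    """Blend on field name alone: add the monthly figure into every day."""
    return sum(amount + monthly for amount in daily.values())


def grain_aware_total(daily):
    """Respect the grain: only daily figures are summed across days."""
    return sum(daily.values())


print(naive_total(daily_revenue, monthly_revenue))
print(grain_aware_total(daily_revenue))
```

With these sample numbers the naive blend reports 930.0 for three days of activity that actually produced 30.0, which is the kind of distortion that would make this customer look far more profitable than they are.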