How jKool Analyzes Streaming Data in Real Time with DataStax
Charles Rich
VP of Product Management
jKool – jKoolcloud.com
Thank you for joining. We will begin shortly.
Webinar Housekeeping
• All attendees placed on mute
• Input questions at any time using the online interface
© 2015 jKool, All Rights Reserved. 3
Agenda
• jKool Overview
• jKool Technology
• Challenges
• Why We Selected Cassandra and DataStax
• Demo
jKool Overview
• jKool
– Founded 2014 as a spin-off from Nastel Technologies
– Expertise in building scalable real-time analytics
• Initial Vision
– Address the big data problems we saw at customers
• Inability to analyze data fast enough to take action and address problems
• Too much data – Too little time
– Provide real-time, in-memory analytics (our heritage)
– Leverage open-source
– SaaS (or on-premises)
– Simplicity
What is jKool?
A solution to Find and Fix Problems Faster (operational intelligence)
DevOps can use jKool to get real-time diagnostics for entire
applications: logs, metrics and transactions.
– Detect anomalies, 2-clicks to root-cause
– Discover log, transaction topologies
– Analyze app behavior
– Diagnose and determine causality
• An alternative to Splunk or Elasticsearch
– Fraction of the cost of Splunk
– Much easier to use than Elasticsearch
Business Value: Instant Insight
Provide high-quality app experiences for customers and improve customer satisfaction
Enable DevOps to:
– Fix problems faster
• Faster problem resolution, eliminate false alarms
– Deliver releases sooner
• Less time patching and more time innovating
– Be proactive
• Spot trends and prevent problems
Features
• Web-based, mobile-friendly dashboard
– Designed for simplicity and power
• Real-time & historical visualization
– Flexible, user configurable
• Analytics immediately detect outliers
– Aggregation, summarization, comparison, including: count, min, max, avg, bucketing, filtering and Bollinger Bands
• Ease of use
– Talk to your data using English-like query language
• Scale to handle the largest volumes of data
– NoSQL architecture provides elastic scalability
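To make the outlier-detection bullet concrete, here is a minimal sketch of Bollinger Bands in Python. It is not jKool's implementation: the function names `band` and `outliers`, the window size, and the multiplier `k` are illustrative assumptions. A point is flagged when it falls outside the mean plus or minus `k` standard deviations of the preceding window.

```python
from statistics import mean, pstdev


def band(window_values, k=2.0):
    """Return (lower, middle, upper) for one window of samples."""
    mid = mean(window_values)
    spread = k * pstdev(window_values)  # population standard deviation
    return mid - spread, mid, mid + spread


def outliers(values, window=20, k=2.0):
    """Indices of samples that fall outside the band of the preceding window."""
    flagged = []
    for i in range(window, len(values)):
        lower, _, upper = band(values[i - window:i], k)
        if not lower <= values[i] <= upper:
            flagged.append(i)
    return flagged
```

For example, `outliers([10, 11] * 4 + [50], window=8)` flags only the final sample, because the preceding eight values give a band of 9.5 to 11.5.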
jKool Does Machine Data
• Sequence, Order, Group, Store
• Relationships
• Compute Timing
• Summarization, comparisons
• Triggers based on continuous queries (CEP)
– Subscribe to events min elapsedtime, avg elapsedtime, max elapsedtime where eventname="Buy" show as linechart
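The subscription above can be read as a continuous query: every matching event updates the running min, avg and max. The following Python sketch shows that idea only; the `RunningStats` class and the dictionary event shape are assumptions made for illustration, not jKool's query engine.

```python
class RunningStats:
    """Keeps min/avg/max of elapsedtime for events with a given eventname."""

    def __init__(self, event_name):
        self.event_name = event_name
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def on_event(self, event):
        """Update the aggregates; return a snapshot if the event matched."""
        if event.get("eventname") != self.event_name:
            return None  # filtered out by the "where" clause
        elapsed = event["elapsedtime"]
        self.count += 1
        self.total += elapsed
        self.minimum = elapsed if self.minimum is None else min(self.minimum, elapsed)
        self.maximum = elapsed if self.maximum is None else max(self.maximum, elapsed)
        return {"min": self.minimum, "avg": self.total / self.count, "max": self.maximum}


stats = RunningStats("Buy")
stats.on_event({"eventname": "Buy", "elapsedtime": 100})
stats.on_event({"eventname": "Sell", "elapsedtime": 5})   # ignored
latest = stats.on_event({"eventname": "Buy", "elapsedtime": 300})
```

Each snapshot could feed one point of the line chart; because the aggregates are updated incrementally, no stored data has to be re-read.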
[Diagram labels: Real-time, In-Memory Analytics; jKool Analyzes Time-Series Data]
Technology
• Elastic Architecture
– Linear scalability – Highly extensible
– Fast, in-memory analysis
• Open Source
– NoSQL DB, tools and instrumentation
– No schema to maintain
• FatPipes
– Micro-services for ultimate flexibility, change and configuration
[Diagram; surviving label: RESTful]
Key to Real-time Analytics
• Process streams as they arrive while avoiding I/O
– Streams are split into a real-time queue and a persistence queue with eventual consistency
• Both have to be processed in parallel
– Writing to the persistence layer first and analyzing afterwards will not achieve near real-time processing
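A minimal Python sketch of the split described above: each incoming event is placed on two queues, and an analysis worker and a persistence worker drain them in parallel, so analysis never waits on a write. The function names are hypothetical, and the list `store` merely stands in for a database; the production system used Storm, Kafka/JMS and Cassandra rather than in-process threads.

```python
import queue
import threading


def analyze(q, results):
    """Real-time path: update an in-memory aggregate per event."""
    total = 0
    while (event := q.get()) is not None:
        total += event["elapsedtime"]
        results["running_total"] = total


def persist(q, store):
    """Persistence path: append stands in for a database write."""
    while (event := q.get()) is not None:
        store.append(event)


def run(events):
    realtime_q, persist_q = queue.Queue(), queue.Queue()
    results, store = {}, []
    workers = [
        threading.Thread(target=analyze, args=(realtime_q, results)),
        threading.Thread(target=persist, args=(persist_q, store)),
    ]
    for worker in workers:
        worker.start()
    for event in events:          # fan each event out to both queues
        realtime_q.put(event)
        persist_q.put(event)
    realtime_q.put(None)          # None signals end of stream
    persist_q.put(None)
    for worker in workers:
        worker.join()
    return results, store
```

The two paths only become consistent once the persistence worker catches up, which is the eventual consistency the slide mentions.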
Why clustered computing platforms?
• STORM paired with Kafka/JMS and CEP
– Clustered way to process incoming real-time streams
• STORM handles clustering/distribution
• Kafka/JMS for messaging between grids
– Split streaming workload across the cluster
– Achieve linear scalability for incoming real-time streams
• Apache Spark (alternative to MapReduce)
– For distributing queries and trend analysis
– Micro batching for historical analytics
– Loading large datasets into memory (across different nodes)
– Running queries against large datasets
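Splitting a streaming workload across a cluster usually comes down to routing each event to a worker by a stable hash of some key, which is similar in spirit to what Storm calls fields grouping. The sketch below illustrates only that routing step in Python; `partition_for`, `split_workload` and the choice of `eventname` as the key are assumptions, not jKool's topology.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_workers):
    """Map a key to a worker index with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def split_workload(events, num_workers, key_field="eventname"):
    """Group events by the worker that should process them."""
    parts = defaultdict(list)
    for event in events:
        parts[partition_for(event[key_field], num_workers)].append(event)
    return dict(parts)
```

Because events with the same key always reach the same worker, per-key aggregates can stay in that worker's memory, and adding workers spreads the keys further.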
Web Interface: DevOps Application Owner
Challenges: Meeting our Objectives
• Store everything, analyze everything…
• Combined real-time & historical analytics
• Fast response, flexible query capabilities
– Target – for business user
– Insulate us from underlying software
– Hide complexity
• Scale for ingesting data-in-motion
• Scale for storing data-at-rest
• Elasticity & Operational efficiency
• Ease of monitoring & management
Challenges: What we experienced
• So many technology options (…so little time…)
– Deciding on the right combination is key early on
• Cassandra/Solr deployment (it was a learning experience for us)
– Lots of configuration, memory management, replication options
• Monitoring, managing clusters
– Cassandra/Solr, STORM, ZooKeeper, Messaging
– Leverage parent company’s AutoPilot technology
• Achieving near real-time analytics proved extremely challenging, but we did it!
– Keeping track of latencies across the cluster
– Estimating the computational capacity required to crunch incoming streams
Challenges: DB was the bottleneck
• Needed high performance DB platform
• SQL (Oracle, MySQL, etc.)
– No scale. We have had a lot of experience with our customers’ issues with this at our parent company, Nastel…
– RAM was “the” bottleneck. Commits take too long, and while a commit is happening everything else stops
• NoSQL
– Cassandra/Solr (DSE)
– Hadoop/MapReduce
– MongoDB
• Clustered Computing Platforms
– STORM
– MapReduce
– Spark (we learned about this while building jKool)
Why we chose Cassandra/Solr?
• Pros:
– Simple to set up & scale for clustered deployments
– Scalable, resilient, fault-tolerant (easy replication)
– Ability to have data automatically expire (TTL – necessary for our pricing model)
– Configurable replication strategy
– Great for heavy write workloads
• Write performance was better than Hadoop.
• Insert rate was of paramount importance for us – getting data in as fast as possible was our goal
• Java driver balances the load amongst the nodes in a cluster for us (master-slave would never have worked for us)
– Solr provides a way to index all incoming data, which was essential
– DSE provides a nice integration between Cassandra and Solr
• Cons:
– Susceptible to GC pauses (memory management)
• The more memory, the more GC pauses
• Less memory and more nodes seems a better approach than one big “honking” server (we see 6-8GB optimal, so far)
– Data compaction tasks may hang
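As an illustration of the TTL point, CQL lets an insert carry an expiry in seconds through `USING TTL`, after which Cassandra treats the row as expired. The Python helper below only builds such a statement; the keyspace, table and column names and the idea of a per-plan retention in days are hypothetical, not jKool's schema or pricing tiers.

```python
SECONDS_PER_DAY = 86400


def ttl_insert(table, columns, retention_days):
    """Build a CQL INSERT with bind markers and a TTL derived from retention."""
    column_list = ", ".join(columns)
    markers = ", ".join("?" for _ in columns)
    ttl_seconds = retention_days * SECONDS_PER_DAY
    return (
        f"INSERT INTO {table} ({column_list}) "
        f"VALUES ({markers}) USING TTL {ttl_seconds}"
    )


# Hypothetical table and a 7-day retention plan
statement = ttl_insert("jkool_demo.events", ["id", "eventname", "elapsedtime"], 7)
```

A statement like this would be prepared once and executed per event, so expiry happens in the database rather than through a separate cleanup job.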
Why not Hadoop MapReduce?
• MapReduce too slow for real-time workloads
– OK for batch, not so great for real-time
– Needs to be paired with other technologies for query (Hive/Pig)
– Complex to set up, run and operate
• Our goals were simplicity first…
• Opted for STORM/Spark wrapped with our own micro-services platform, FatPipes, instead of the MapReduce functionality
Why we chose Cassandra/Solr vs. Mongo?
• Why not Mongo?
– Global write-lock performance concerns…
• Cassandra/Solr
– Java based (our project was in Java)
– Easy to scale, replicate data
– Flexible read & write consistency levels (ALL, QUORUM, ANY, etc.)
– Did we say Java? Yes. (We like Java…)
• Flexible choice of platform coverage
– Great for time-series data streams (market focus for jKool)
• Inherent query limitations in Cassandra solved via Solr integration (provided with DSE, as mentioned earlier)
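The consistency levels mentioned above trade latency for stronger guarantees. In Cassandra, QUORUM means a majority of replicas, floor(RF / 2) + 1 for replication factor RF, and a read is guaranteed to overlap the latest acknowledged write when the replicas required for the read plus those required for the write exceed RF. The Python sketch below works through that arithmetic; the function names are illustrative, and ANY is left out because it is a write-only level that can be satisfied by a hint.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must acknowledge an operation at a given level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    writes = replicas_required(write_level, replication_factor)
    reads = replicas_required(read_level, replication_factor)
    return writes + reads > replication_factor
```

With RF = 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), whereas ONE plus ONE does not, which is the faster but eventually consistent choice for write-heavy ingest.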
What we learned
• Consider your application
– Read heavy or write heavy? Both?
• Evaluate performance of course, but consider the user
– We needed simplicity: setup and scale (us and end user)
– We needed reliability – not planning on targeting data engineers
– We needed auto pruning (TTL)
– We needed easy search
• DSE had this…the others did not provide all of this
– We chose DSE.
jKool in Real Time – A Live Demo
Thank you!
Input questions at any time
using the online interface
More information on jKool at: jKoolCloud.com

Contenu connexe

Tendances

Tendances (20)

Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
Making Every Drop Count: How i20 Addresses the Water Crisis with the IoT and ...
Making Every Drop Count: How i20 Addresses the Water Crisis with the IoT and ...Making Every Drop Count: How i20 Addresses the Water Crisis with the IoT and ...
Making Every Drop Count: How i20 Addresses the Water Crisis with the IoT and ...
 
Oracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hopeOracle to Cassandra Core Concepts Guid Part 1: A new hope
Oracle to Cassandra Core Concepts Guid Part 1: A new hope
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2
 
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to Production
 
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
 
Announcing Spark Driver for Cassandra
Announcing Spark Driver for CassandraAnnouncing Spark Driver for Cassandra
Announcing Spark Driver for Cassandra
 
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
ProtectWise Revolutionizes Enterprise Network Security in the Cloud with Data...
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
 
How DataStax Enterprise and Azure Make Your Apps Scale from Day 1
How DataStax Enterprise and Azure Make Your Apps Scale from Day 1How DataStax Enterprise and Azure Make Your Apps Scale from Day 1
How DataStax Enterprise and Azure Make Your Apps Scale from Day 1
 
Webinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkWebinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the Dark
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
 
Keeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter whatKeeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter what
 

En vedette

En vedette (8)

Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
 
Node.js and Cassandra
Node.js and CassandraNode.js and Cassandra
Node.js and Cassandra
 
3800 die-bonder overview
3800 die-bonder overview3800 die-bonder overview
3800 die-bonder overview
 
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ...
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Real World Use Case with Cassandra (Eddie Satterly, DataNexus) | C* Summit 2016
Real World Use Case with Cassandra (Eddie Satterly, DataNexus) | C* Summit 2016Real World Use Case with Cassandra (Eddie Satterly, DataNexus) | C* Summit 2016
Real World Use Case with Cassandra (Eddie Satterly, DataNexus) | C* Summit 2016
 

Similaire à How jKool Analyzes Streaming Data in Real Time with DataStax

Similaire à How jKool Analyzes Streaming Data in Real Time with DataStax (20)

How We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics PlatformHow We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Fontys Lecture - The Evolution of the Oracle Database 2016
Fontys Lecture -  The Evolution of the Oracle Database 2016Fontys Lecture -  The Evolution of the Oracle Database 2016
Fontys Lecture - The Evolution of the Oracle Database 2016
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Java on the Mainframe
Java on the MainframeJava on the Mainframe
Java on the Mainframe
 
Phases of Big Data Challenges @ Nokia
Phases of Big Data Challenges @ NokiaPhases of Big Data Challenges @ Nokia
Phases of Big Data Challenges @ Nokia
 
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
 
AMIS OOW Review 2012 - Deel 7 - Lucas Jellema
AMIS OOW Review 2012 - Deel 7 - Lucas JellemaAMIS OOW Review 2012 - Deel 7 - Lucas Jellema
AMIS OOW Review 2012 - Deel 7 - Lucas Jellema
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
Impact of cloud services on the work of oracle technology experts
Impact of cloud services on the work of oracle technology expertsImpact of cloud services on the work of oracle technology experts
Impact of cloud services on the work of oracle technology experts
 
Impact of cloud services on the work of oracle technology experts
Impact of cloud services on the work of oracle technology expertsImpact of cloud services on the work of oracle technology experts
Impact of cloud services on the work of oracle technology experts
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
 

Plus de DataStax

Plus de DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Dernier (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

How jKool Analyzes Streaming Data in Real Time with DataStax

  • 1. How jKool Analyzes Streaming Data in Real Time with DataStax Charles Rich VP of Product Management jKool – jKoolcloud.com Thank you for joining. We will begin shortly.
  • 2. All attendees placed on mute Input questions at any time using the online interface Webinar Housekeeping
  • 3. © 2015 jKool, All Rights Reserved. 3 Agenda • jKool Overview • jKool Technology • Challenges • Why We Selected Cassandra and DataStax • Demo
  • 4. jKool Overview • jKool – Founded 2014 as a spin-off from Nastel Technologies – Expertise in building scalable real-time analytics • Initial Vision – Address the big data problems we saw at customers • Inability to analyze data fast enough to take action and address problems • Too much data – Too little time – Provide real-time, in-memory analytics (our heritage) – Leverage open-source – SaaS (or on-premises) – Simplicity
  • 5. What is jKool? A solution to Find and Fix Problems Faster (operational intelligence) DevOps can use jKool to get real-time diagnostics for entire applications: logs, metrics and transactions. – Detect anomalies, 2-clicks to root-cause – Discover log, transaction topologies – Analyze app behavior – Diagnose and determine causality • An alternative to Splunk or Elasticsearch – Fraction of the cost of Splunk – Much easier to use than Elasticsearch
  • 6. Business Value: Instant Insight Provide high quality app experiences for customers - Improve customer satisfaction Enable DevOps to: – Fix problems faster • Faster problem resolution, eliminate false alarms – Deliver releases sooner • Less time patching and more time innovating – Be proactive • Spot trends and prevent problems
  • 7. Features • Web-based, mobile-friendly dashboard – Designed for simplicity and power • Real-time & historical visualization – Flexible, user configurable • Analytics immediately detect outliers – Aggregation, summarization, comparison, including: count, min, max, avg., bucketing, filtering and Bollinger • Ease of use – Talk to your data using English-like query language • Scale to handle the largest volumes of data – NoSQL architecture provides elastic scalability
  • 8. jKool Does Machine Data • Sequence, Order, Group, Store • Relationships • Compute Timing • Summarization, comparisons • Triggers based on continuous queries (CEP) – Subscribe to events min elapsedtime, avg elapsedtime, max elapsedtime where eventname="Buy" show as linechart
  • 9. Real-time, In-Memory Analytics jKool Analyzes Time-Series Data
  • 10. Technology • Elastic Architecture – Linear scalability – Highly extensible – Fast, in-memory analysis • Open Source – NoSQL DB, tools and instrumentation – No schema to maintain • FatPipes – Micro-services for ultimate flexibility, change and configuration (diagram label: RESTful)
  • 11. Key to Real-time Analytics • Process streams as they come while at the same time avoiding IO – Streams are split into real-time queue and persistence queue with eventual consistency • Both have to be processed in parallel – Writing to persistence layer and then analyzing will not achieve near real-time processing
  • 12. Why clustered computing platforms? • STORM paired with Kafka/JMS and CEP – Clustered way to process incoming real-time streams • STORM handles clustering/distribution • Kafka/JMS for messaging between grids – Split streaming workload across the cluster – Achieve linear scalability for incoming real-time streams • Apache Spark (alternative to MapReduce) – For distributing queries and trend analysis – Micro-batching for historical analytics – Loading large datasets into memory (across different nodes) – Running queries against large datasets
  • 13. Web Interface: DevOps Application Owner
  • 14. Challenges: Meeting our Objectives • Store everything, analyze everything… • Combined real-time & historical analytics • Fast response, flexible query capabilities – Target – for business user – Insulate us from underlying software – Hide complexity • Scale for ingesting data-in-motion • Scale for storing data-at-rest • Elasticity & Operational efficiency • Ease of monitoring & management
  • 15. Challenges: What we experienced • So many technology options (…so little time…) – Deciding on the right combination is key early on • Cassandra/Solr deployment – (it was a learning experience for us) – Lots of configuration, memory management, replication options • Monitoring, managing clusters – Cassandra/Solr, STORM, Zookeeper, Messaging – +Leverage parent company’s AutoPilot Technology • Achieving near real-time analytics proved extremely challenging – but we did it! – Keeping track of latencies across cluster – Estimating computational capacity required to crunch incoming streams
  • 16. Challenges: DB was the bottleneck • Needed high performance DB platform • SQL (Oracle, MySQL, etc.) – No scale. We have had a lot of experience with our customers’ issues with this at our parent company Nastel… – RAM was “the” bottleneck. Commits take too long, and while that is happening everything else stops • NoSQL – Cassandra/Solr (DSE) – Hadoop/MapReduce – MongoDB • Clustered Computing Platforms – STORM – MapReduce – Spark (we learned about this while building jKool)
  • 17. Why we chose Cassandra/Solr • Pros: – Simple to set up & scale for clustered deployments – Scalable, resilient, fault-tolerant (easy replication) – Ability to have data automatically expire (TTL – necessary for our pricing model) – Configurable replication strategy – Great for heavy write workloads • Write performance was better than Hadoop. • Insert rate was of paramount importance for us – getting data in as fast as possible was our goal • Java driver balances the load amongst the nodes in a cluster for us (master-slave would never have worked for us) – Solr provides a way to index all incoming data - essential – DSE provides a nice integration between Cassandra and Solr • Cons: – Susceptible to GC pauses (memory management) • The more memory, the more GC pauses • Less memory and more nodes seems a better approach than one big “honking” server (we see 6-8GB optimal, so far) – Data compaction tasks may hang
  • 18. Why not Hadoop MapReduce? • MapReduce too slow for real-time workloads – OK for batch, not so great for real-time – Needs to be paired with other technologies for query (Hive/Pig) – Complex to set up, run and operate • Our goals were simplicity first… • Opted for STORM/Spark wrapped with our own micro-services platform FatPipes instead of the MapReduce functionality
  • 19. Why we chose Cassandra/Solr vs. Mongo? • Why not Mongo? – Global write-lock performance concerns… • Cassandra/Solr – Java based (our project was in Java) – Easy to scale, replicate data – Flexible read & write consistency levels (ALL, QUORUM, ANY, etc.) – Did we say Java? Yes. (We like Java…) • Flexible choice of platform coverage – Great for time-series data streams (market focus for jKool) • Inherent query limitations in Cassandra solved via Solr integration (provided with DSE – as mentioned earlier)
  • 20. What we learned • Consider your application – Read heavy or write heavy? Both? • Evaluate performance of course, but consider the user – We needed simplicity: setup and scale (us and end user) – We needed reliability – not planning on targeting data engineers – We needed auto pruning (TTL) – We needed easy search • DSE had this…the others did not provide all of this – We chose DSE.
  • 21. jKool in Real Time – A Live Demo
  • 22. Thank you! Input questions at any time using the online interface More information on jKool at: jKoolCloud.com
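
Slide 11 describes splitting each incoming stream into a real-time queue and a persistence queue that are processed in parallel, so analysis never waits on a write. The following is a minimal sketch of that idea in plain Python, not jKool's FatPipes implementation: the function names, the event dictionaries, and the in-memory list standing in for the database are all assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker used only in this sketch


def split_stream(events, realtime_q, persistence_q):
    """Fan every incoming event out to both queues, then signal end of stream."""
    for event in events:
        realtime_q.put(event)
        persistence_q.put(event)
    realtime_q.put(SENTINEL)
    persistence_q.put(SENTINEL)


def analyze(realtime_q, stats):
    """Real-time path: update in-memory statistics without touching storage."""
    while (event := realtime_q.get()) is not SENTINEL:
        stats["count"] += 1
        stats["total_elapsed"] += event["elapsed"]


def persist(persistence_q, store):
    """Persistence path: appending to a list stands in for a database write."""
    while (event := persistence_q.get()) is not SENTINEL:
        store.append(event)


def run(events):
    """Process one stream through both paths in parallel and return the results."""
    realtime_q, persistence_q = queue.Queue(), queue.Queue()
    stats = {"count": 0, "total_elapsed": 0}
    store = []
    workers = [
        threading.Thread(target=analyze, args=(realtime_q, stats)),
        threading.Thread(target=persist, args=(persistence_q, store)),
    ]
    for worker in workers:
        worker.start()
    split_stream(events, realtime_q, persistence_q)
    for worker in workers:
        worker.join()
    return stats, store
```

Calling `run(...)` with a list of event dictionaries returns the in-memory statistics and the "persisted" events. The two consumers never block each other, which is the point the slide makes about avoiding IO on the analysis path; in a real deployment the two views would only agree eventually.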

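Slide 8 shows a subscription that tracks min, avg and max `elapsedtime` for events where `eventname="Buy"`. As a rough illustration of what such a continuous query has to maintain, here is a small Python sketch. It is not jKQL and not jKool's CEP engine; the `ContinuousQuery` class, its method names, and the event layout are hypothetical.

```python
class ContinuousQuery:
    """Keeps running min/avg/max of one numeric field for events matching a filter."""

    def __init__(self, field, predicate):
        self.field = field          # name of the numeric field to aggregate
        self.predicate = predicate  # function(event) -> bool, the "where" clause
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None
        self.subscribers = []

    def subscribe(self, callback):
        """Register a callback that receives a fresh snapshot after each match."""
        self.subscribers.append(callback)

    def on_event(self, event):
        """Update the aggregates incrementally; non-matching events are ignored."""
        if not self.predicate(event):
            return
        value = event[self.field]
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        snapshot = {
            "min": self.minimum,
            "avg": self.total / self.count,
            "max": self.maximum,
        }
        for callback in self.subscribers:
            callback(snapshot)
```

A subscriber could be as simple as `updates.append` on a list, with the query built as `ContinuousQuery("elapsedtime", lambda e: e["eventname"] == "Buy")`. Each matching event pushes a new snapshot to subscribers, so a chart can be redrawn without re-reading stored data.
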
Editor's notes

  1. Choices we had to make and the architectural decisions to build a system for both real-time and historical…
  2. For Java applications initially, with RESTful for any apps. Open source collectors: Log4J, SLF4J, Logback, JMX, HTTP, Spark, RESTful API… More coming…
  3. Real-time, in-memory analytics. Operational Intelligence for machine data. Analyze & visualize: logs, metrics & transactions. Gain insight, find root cause, understand application behavior. Reduce MTTR (mean-time-to-problem-resolution). Leverage NoSQL and open source. Analyze your logs & metrics in real time (& historical). Spot patterns, trends, behavior. SaaS or on-premises. Built from the ground up on big data analytics platforms: NoSQL, STORM, Spark, Kafka. Lightweight, simple, open source instrumentation. Improved cost/benefit.
  4. Keep developers developing and enable app support to analyze app behavior, determine causality and resolve issues. Reduce time associated with manually analyzing logs. Improve productivity of your DevOps and application teams. Keep developers coding…enable app support. Benefits: fix faster, release sooner, be proactive. For the business: Focus your time on what matters to your business issues. Quickly identify risks and opportunities. Learn what’s important – what you didn’t know… Exploit hidden & perishable insights. Turn machine data into insight. Detect preventable losses… if you knew, you could act now… Know your application and how it is used. Just deployed a new feature? Are people using it? Was it worth the cost?
  5. Relationships: splitting & morphing, causality tree, topology. Compute timing: elapsed time (ev1..evN). Summarization, comparisons: high/low bands, outliers, counts (max, min, avg), interval bucketing (second, min, hour), compare Ev1..EvN.
  6. Real-time means analyzing before data is persisted… We created FatPipes to manage this around STORM/Spark with message infrastructure Kafka/JMS. Process data but don’t wait until after a write: no disk IO; split, then analyze. Two parallel architectures, one to handle historical and one for real-time (eventually… both real-time and historical must reconcile). The user interacts with real-time via JKQL (jKool Query Language), an English-like query language for analyzing data in motion and at rest. The “Subscribe” verb provides real-time updates.
  7. Clustered computing was selected to scale with the demands of the workload. STORM distributes the CEP (also helpful for distributing data to specific tasks, conditionally). JMS/Kafka distributes data amongst nodes in our real-time grids. CEP processes streams and publishes results to clients via JMS/Kafka. Spark jobs crunch the data and then write back to Cassandra. We created our own micro-services architecture (FatPipes), which runs on top of STORM/JMS/Kafka. FatPipes can be embedded or distributed. The real-time grid feeds tracking data and real-time queries to CEP and back.
  8. In our experience, customers didn’t know what they needed to store until they actually needed it…but by then it is too late…hence, store everything… Historical requirements for the architecture are very different than real-time ones: deliver fast response times and provide user-defined KPIs. Scale must be there for interaction with data as it comes in – not how many TBs you can analyze, but how fast you can go and keep up with data streams. We can’t build everything, so to accelerate time to market, how much open source could we leverage? For elasticity, we can add nodes horizontally.
  9. We can’t test all possibilities and then select…not agile…not enough time. Long-term analytics needs are different than real-time needs, and we had to weed out what would slow down real-time. Because we provide this as a service, estimating capacity is also a challenge.
  10. We are on DSE 4.6.5 and going shortly to 4.7 (today is: 10.13.15 …). We tried using CQL (Cassandra Query Language). Ad-hoc query would be very hard, as CQL query capabilities are very limited. We would need to define all the tables and indexes for every possible query permutation, and the user would need to know the event_id – too much to be usable. Too slow. We only use CQL for admin tasks. Lucene addresses the above problem, but adds its own issues. We started with Lucene and did inline inserts, and the time to index was too long. For each Cassandra insert, we had to write a Lucene doc…since there is no rewrite, we had to read, delete and then write – a series of batch ops and too slow for our real-time goals. Solr helped with this – we write to Cassandra and Solr handles the indexing (automagically) for us. Solr is a web app on top of Lucene. We do use Solr indexes; jKQL does invoke Solr queries. But we needed to enhance this, as we are a multi-tenant solution, and pass it our repository_id to ensure we get the data appropriate to that tenant. We use 3 nodes in a Cassandra cluster, and data ingested is replicated to Solr clusters with 3 nodes (they have both Cassandra and Solr). Data-at-rest – we can ingest as fast as Cassandra can handle it, using eventual consistency. The data is distributed across Cassandra and Solr. We use the DataStax Java driver. Data is written to the coordinator node, which handles the distribution to other nodes. Quorum means 1/2 + 1 of the replicas: with consistency level "quorum", half + 1 of the replicas must respond. It is like taking a vote, where a quorum of members must be present for the vote to be valid. If your replication factor is 3, you need 2 of the 3 nodes to respond: 1/2 + 1 using integer division, so half of 3 is 1 (1.5 truncated), + 1 = 2. We use consistency level ONE for writes. For reads we use quorum (admin tasks). All other reads use Solr – the jKQL queries you see on the dashboard are all coming from Solr.
  11. STORM for ingesting; Spark for processing data (compute framework).
  12. Simple to set up & scale for clustered deployments. Scalable, resilient, fault-tolerant (easy replication). Ability to have data automatically expire (TTL – necessary for our pricing model). Configurable replication strategy. Great for heavy write workloads: write performance was better than Hadoop. Insert rate was of paramount importance for us – getting data in as fast as possible was our goal. The Java driver balances the load amongst the nodes in a cluster for us (master-slave would never have worked for us). Solr provides a way to index all incoming data – essential. DSE provides a nice integration between Cassandra and Solr.
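
The quorum arithmetic in note 10 (half the replicas, using integer division, plus one) can be written out as a short Python sketch. The function names here are hypothetical helpers for illustration, not part of any Cassandra driver. The second helper applies the general rule that a read is guaranteed to overlap the latest write only when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Integer division truncates, so 3 // 2 == 1, and 1 + 1 == 2.
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, `quorum(3)` gives 2, matching the note. Writing at ONE and reading at QUORUM gives 1 + 2 = 3, which does not exceed 3, so that combination trades guaranteed read-your-write behavior for faster inserts.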