This session provides insight into making the most of your data assets with analytics, and into what you need for your next analytics project. We’ll showcase how the MariaDB AX solution delivers fast, scalable analytics, using real-world use cases.
2. Why Analytics?
• Get the most value from your data assets
• Faster, better decision making
• Cost reduction
• New products and services
3. Types of Analytics
• Descriptive: What happened?
• Diagnostics: Why did it happen?
• Predictive: What is likely to happen?
• Prescriptive: What should I do about it?
5. Diagnostics: Why did it happen?
● Aggregates: aggregate a measure over one or more dimensions
○ Find total sales
○ Top five products ranked by sales
● Roll-ups: aggregate at different levels of a dimension hierarchy
○ Given total sales by city, roll up to get sales by state
● Drill-down: inverse of roll-up
○ Given total sales by state, drill down to get totals by city
● Slicing and dicing:
○ Equality and range selections on one or more dimensions
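The four operations above can be sketched with plain SQL. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for an analytical database, and the `sales` table, its columns, and every value in it are made up for the example.

```python
import sqlite3

# Hypothetical sales facts: (state, city, product, amount). All values are invented.
rows = [
    ("CA", "San Jose", "widget", 100.0),
    ("CA", "San Jose", "gadget", 200.0),
    ("CA", "Fresno", "widget", 75.0),
    ("TX", "Austin", "widget", 125.0),
    ("TX", "Austin", "gizmo", 400.0),
    ("TX", "Dallas", "gadget", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, city TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def query(sql, params=()):
    """Run a query and return all result rows."""
    return conn.execute(sql, params).fetchall()


# Aggregate: total sales over the whole table.
total_sales = query("SELECT SUM(amount) FROM sales")[0][0]

# Aggregate with ranking: top five products by sales.
top_products = query(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC, product LIMIT 5"
)

# Drill-down level: totals by city within each state.
sales_by_city = query(
    "SELECT state, city, SUM(amount) FROM sales "
    "GROUP BY state, city ORDER BY state, city"
)

# Roll-up level: the same measure one step up the hierarchy (city -> state).
sales_by_state = query(
    "SELECT state, SUM(amount) FROM sales GROUP BY state ORDER BY state"
)

# Rolling up the city totals by hand gives the same answer as the state query.
rolled_up = {}
for state, _city, city_total in sales_by_city:
    rolled_up[state] = rolled_up.get(state, 0.0) + city_total

# Slice and dice: equality selection on state, range selection on amount.
ca_mid_range = query(
    "SELECT city, product, amount FROM sales "
    "WHERE state = ? AND amount BETWEEN ? AND ? ORDER BY amount",
    ("CA", 90, 300),
)

print(total_sales, top_products, sales_by_state, ca_mid_range)
```

The roll-up and drill-down queries differ only in their GROUP BY columns, which is why the two are described as inverses of each other.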
6. Predictive: What is likely to happen?
● Sales prediction
○ Analyze data to identify trends, spot weaknesses, or determine conditions among broader data sets for making decisions about the future
● Targeted marketing
○ What is the likelihood of a customer buying a particular product, based on past buying behavior?
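As a minimal sketch of the sales prediction idea, the snippet below fits a straight-line trend to past sales with ordinary least squares and extrapolates one period ahead. The technique is chosen here only for illustration (the slides do not name a specific model), and the monthly figures are invented.

```python
def fit_linear_trend(values):
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend steps_ahead periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical monthly sales totals (made-up numbers).
monthly_sales = [100.0, 110.0, 120.0, 130.0]
next_month = forecast(monthly_sales)
print(next_month)
```

A real project would typically account for seasonality and noise; a linear trend is only the simplest case of using past data to estimate what is likely to happen.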
7. Prescriptive: What is the best course of action?
Paradox of choice
With too many choices, which one is the best?
8. Data Analytics Use Cases
By industry
• Finance
– Identify trade patterns
– Detect fraud and anomalies
– Predict trading outcomes
• Manufacturing
– Simulations to improve design/yield
– Detect production anomalies
– Predict machine failures (sensor data)
• Telecom
– Behavioral analysis of customer calls
– Network analysis (performance and reliability)
• Healthcare
– Find genetic profiles/matches
– Analyze health vs spending
– Predict viral outbreaks
9. Data Analytics Solution Considerations
• Technical Considerations
• Real-time analytics
– High speed data ingestion
– High speed read queries
• Analytics
– Built in analytics
– Choice of BI tools
• Business Considerations
• Cost of deployment and use
– Hardware and Price/Performance ratio
– Large talent pool
10. Existing Approaches
Limited real time analytics
Slow releases of product innovation
Expensive hardware and software
Data Warehouses
Hadoop / NoSQL
Limited SQL support
Difficult to install/manage
Limited talent pool
Data lake with no data management
Hard to use
Purpose-built rather than predictive analytics
13. MariaDB AX
Components: MariaDB Server, MariaDB MaxScale, MariaDB ColumnStore
• Parallel queries
• Distributed storage
• No indexes
• Automatic partitioning
• Read optimized
• High compression
• Low disk I/O
[Architecture diagram: MariaDB MaxScale, MariaDB Server with ColumnStore, and ColumnStore Storage nodes. Legend: UM = User Module, PM = Performance Module]
14. MariaDB ColumnStore
High-performance columnar storage engine that supports a wide variety of analytical use cases in highly scalable distributed environments
• Faster, More Efficient Queries: parallel query processing for distributed environments
• Easier Enterprise Analytics: single interface for OLTP and analytics; easy to manage and scale
• Better Price Performance: the power of SQL and the freedom of open source for big data analytics
15. Better Price Performance
Flexible deployment options
• Cloud and On-premise
• Run on commodity hardware
• Open Source, Subscription based pricing
[Chart: MariaDB ColumnStore vs. commercial data warehouse, 90.3% less per TB per year]
No need to maintain a third platform
• Run analytics from the same SQL front end
• No need to update application code
• Leverage MariaDB Extensible architecture
High data compression
• More efficient at storing big data
• Less hardware
Customers have saved by choosing MariaDB AX over Oracle (healthcare), MemSQL (auto parts), and Vertica (finance, SEO marketing). Come see them at M18!
16. Easier Enterprise Analytics
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access, and auditing
Full ANSI SQL
• No more SQL-“like” queries
• Supports complex joins, aggregations, and window functions
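To make "complex joins, aggregations, and window functions" concrete, here is a small sketch that combines all three in one standard SQL statement. It runs against Python's built-in sqlite3 module purely as a stand-in (window functions need SQLite 3.25 or later), and the `products` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO products VALUES
        (1, 'tools', 'hammer'), (2, 'tools', 'wrench'), (3, 'toys', 'kite');
    INSERT INTO orders VALUES
        (1, 1, 20.0), (2, 1, 30.0), (3, 2, 80.0), (4, 3, 15.0);
    """
)

# Join orders to products, aggregate sales per product, then rank each
# product within its category using a window function.
sql = """
WITH totals AS (
    SELECT p.category AS category, p.name AS name, SUM(o.amount) AS total
    FROM orders AS o
    JOIN products AS p ON p.product_id = o.product_id
    GROUP BY p.category, p.name
)
SELECT category,
       name,
       total,
       RANK() OVER (PARTITION BY category ORDER BY total DESC) AS rank_in_category
FROM totals
ORDER BY category, rank_in_category
"""

ranked = conn.execute(sql).fetchall()
for row in ranked:
    print(row)
```

The query uses only standard SQL constructs (a common table expression, a join, GROUP BY, and RANK() OVER), so no proprietary query dialect is involved.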
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding nodes as data grows
• Out-of-the-box connectivity with BI tools
MariaDB AX customers across industries: Auto Parts, Finance, Ad analytics, Asset management, Telecommunication, Healthcare, Digital Media, Carpooling App
17. Faster, More Efficient Queries
Optimized for columnar storage
• Columnar storage reduces disk I/O
• Blazing-fast read-intensive workloads
• Ultra-fast data import
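A rough way to see why columnar storage reduces disk I/O: an analytical query that reads one column only has to touch that column's data, while a row-oriented layout has to scan every full row. The sketch below is a deliberately simplified model (fixed-width 8-byte values, no compression, no indexes, in-memory arrays standing in for disk), and the table size is arbitrary.

```python
from array import array

NUM_ROWS = 1000
NUM_COLS = 10

# Row-oriented layout: all values of row 0, then all values of row 1, ...
row_store = array(
    "d",
    (float(r * NUM_COLS + c) for r in range(NUM_ROWS) for c in range(NUM_COLS)),
)

# Column-oriented layout: one contiguous array per column.
column_store = [
    array("d", (float(r * NUM_COLS + c) for r in range(NUM_ROWS)))
    for c in range(NUM_COLS)
]


def sum_column_row_store(col):
    """Sum one column; in this model a scan must read every full row."""
    bytes_scanned = len(row_store) * row_store.itemsize
    total = sum(row_store[r * NUM_COLS + col] for r in range(NUM_ROWS))
    return total, bytes_scanned


def sum_column_column_store(col):
    """Sum one column; only that column's array is read."""
    data = column_store[col]
    return sum(data), len(data) * data.itemsize


row_total, row_bytes = sum_column_row_store(3)
col_total, col_bytes = sum_column_column_store(3)
print(row_bytes, col_bytes)
```

Both layouts return the same answer, but in this model the row layout scans NUM_COLS times as many bytes. The wider the table and the fewer columns a query touches, the larger the gap.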
Parallel distributed query execution
• Distributes queries into a series of parallel operations
• Fully parallel high-speed data ingestion
– TPC-H lineitem table: 750K to 1 million rows per minute
Highly available analytic environment
• Built-in Redundancy
• Automatic fail-over
18. Simple & Streamlined Data Ingestion
[Architecture diagram. Ingestion: Bulk Data Adapters; Streaming Data Adapters (Apache Kafka). Operations, Transaction (OLTP): Web/Mobile Services, MariaDB MaxScale, MariaDB Server with InnoDB. Analytics (OLAP): MariaDB MaxScale, MariaDB Server with ColumnStore; Data Services and Spark / Python / ML via Bulk Data Adapters]
19. Streaming data adapters – Apache Kafka
Stream all messages published to Apache Kafka topics to MariaDB AX automatically and continuously, enabling data from many sources to be streamed and collected for analysis without complex code
[Diagram: Apache Kafka topics feed the Streaming Data Adapter (Kafka client), which writes through the Write API to ColumnStore Storage nodes; MariaDB Server with ColumnStore sits on top for queries]
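The general pattern behind a streaming adapter (consume messages, group them into batches, bulk-write each batch) can be sketched as follows. This is not the MariaDB adapter or its API: a `deque` stands in for a Kafka topic, sqlite3 stands in for ColumnStore, and the `events` table, message fields, and function names are all hypothetical.

```python
import json
import sqlite3
from collections import deque


def drain_in_batches(topic, batch_size):
    """Yield lists of up to batch_size decoded messages until the topic is empty."""
    batch = []
    while topic:
        batch.append(json.loads(topic.popleft()))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def load_topic(topic, conn, batch_size=2):
    """Write every pending message to the events table, one commit per batch."""
    batches = 0
    for batch in drain_in_batches(topic, batch_size):
        conn.executemany(
            "INSERT INTO events (sensor, reading) VALUES (:sensor, :reading)",
            batch,
        )
        conn.commit()
        batches += 1
    return batches


# Stand-in topic holding five JSON messages (made-up sensor readings).
topic = deque(
    json.dumps({"sensor": f"s{i % 2}", "reading": float(i)}) for i in range(5)
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor TEXT, reading REAL)")

batches_written = load_topic(topic, conn)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(batches_written, row_count)
```

Batching matters because analytical stores are generally much faster at bulk writes than at row-by-row inserts.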
20. OLTP to OLAP: Streaming data adapters – MaxScale CDC
Stream all writes from MariaDB TX to MariaDB AX automatically and continuously, ensuring analytical data is up to date and not stale, with no need for batch jobs, manual processes, or human intervention
[Diagram: MariaDB Server with InnoDB replicates to MariaDB MaxScale (CDC server); the Streaming Data Adapter (CDC client) writes through the Write API to ColumnStore Storage nodes; MariaDB Server with ColumnStore sits on top for queries]
22. IHME - Institute for Health Metrics and Evaluation
IHME Visualizations library: http://www.healthdata.org/results/data-visualizations
Started with 4.2 TB, with a goal of growing to 30 TB of data
23. Customer Use Case - 1
Industry: healthcare (Medicaid)
Data: surveys
Use case: decision support system
Details:
• Identify trends and patterns
• Determine population cohorts
• Predict health outcomes
• Anticipate funding / capacity
• Recommend intervention
Couldn’t run complex queries on current hardware with Oracle and snowflake schemas
Limited to optimizing for simple, known queries (2-3 columns)
Replaced with ColumnStore
> a single table
> 2.5 million rows, 248 columns
> complex, ad-hoc queries
> query 20+ columns in seconds
24. Customer Use Case - 2
Industry: biotechnology (genetics)
Data: genotypes
Use case: genetic profiling
Details:
• Find genetic mates (beef and dairy)
• Predict meat production (pork)
• Gene/DNA analysis
Had to convert to CSV files and schedule import jobs (cron)
Always receiving new genetic data
Migrated to data adapter (Python)
> streamline import process
> remove steps / possible errors
> remove delays
> import data on demand
> immediate customer access
25. Customer Use Case - 3
Industry: mobile text/call app
Data: call and text logs
Use case: Mobile app use analytics
Details:
• 30 million texts and 3 million phone calls per day
• 1.5 billion rows of logs per day
• The text and call volume will continue to grow
InnoDB backend hit its scale limit at 6 TB and required a lot of performance tuning and index management
Migrated to MariaDB AX
> Able to process 24 months (24 TB) of data vs. the 6-month limit with InnoDB
> Same BI tools and client applications worked with MariaDB AX seamlessly
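A quick back-of-the-envelope check of the figures above, using only the numbers on the slide. The averages assume load is spread evenly across the day and across months, which a real workload would not be.

```python
# Figures quoted on the slide.
ROWS_PER_DAY = 1_500_000_000
TEXTS_PER_DAY = 30_000_000
CALLS_PER_DAY = 3_000_000
INNODB_LIMIT_TB = 6
RETAINED_TB = 24
RETAINED_MONTHS = 24

SECONDS_PER_DAY = 24 * 60 * 60

# Average sustained ingest rate implied by the daily log volume.
avg_rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY

# Average text and call events per second.
avg_events_per_second = (TEXTS_PER_DAY + CALLS_PER_DAY) / SECONDS_PER_DAY

# Data growth implied by 24 TB over 24 months.
tb_per_month = RETAINED_TB / RETAINED_MONTHS

# How many months of data fit under the 6 TB InnoDB limit at that rate.
months_within_innodb_limit = INNODB_LIMIT_TB / tb_per_month

print(round(avg_rows_per_second), round(avg_events_per_second))
print(tb_per_month, months_within_innodb_limit)
```

At roughly 1 TB per month, the 6 TB limit corresponds to the 6 months of history mentioned for InnoDB, so the slide's figures are internally consistent.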