5. 5
From BI to Advanced Analytics
• What happened? When? And where?
• What will happen?
• How and why did it happen?
• How can we do better?
[Chart labels: Time; Data Size; Facts; Interpretations]
6. 6
Advanced Analytics that Saves Us Money
• Customer churn analysis model
• Integrated customer support and services
• Fraud detection
7. 7
Advanced Analytics that Makes Us Money
• Product recommendation engines
• Location-based real-time offers
• Target-based pricing strategy
9. 9
Enterprise Pressures - Questions
[Chart labels: Marketing; Operations; Total Market; Known Market; Customers; Sales; axes: t, value $]
“We want to know what our customers do online and in our stores. How can we combine data from separate analytics silos to understand & serve them better?”
“How can we reduce stock-outs & ensure products are in the right stores at the right time? Can we combine data from our carriers with in-store historical data from thousands of stores?”
“Theft, or ‘shrinkage’, in our stores is on the increase – can we combine POS data with video surveillance to reduce it without impacting customer service negatively?”
10. 10
Enterprise Pressures - Questions
Data Products
12. 12
Data Product Value
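[Chart: data products 1 through 8 plotted by value (vertical axis) against cost to implement in time, budget, people, and tools (horizontal axis); both axes are marked at $500K and $1M. Chart labels: sensor data; operational data; Multi-source – Fuzzy Value.]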
13. 13
Data Product + Risk
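[Chart: data products 1 through 8 plotted by value (vertical axis) against cost to implement in time, budget, people, and tools (horizontal axis); both axes are marked at $500K and $1M. Risks are rated low, medium, or high. Chart labels: sensor data; Known Value; Single-Source; Multi-source – Fuzzy Value.]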
14. 14
Impact of Status Quo
Data Scientists: “I’m sick of waiting for my data, I’m going to make my own copy.”
DBA/DW Admins: “I need to make sure the DW is secure & compliant for the mission critical reports.”
Executives: “We don’t have the information we need to answer key business questions.”
15. 15
What if?
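[Chart: data products 1 through 8 plotted by value (vertical axis) against cost to implement in time, budget, people, and tools (horizontal axis); both axes are marked at $500K and $1M. Risks are rated low, medium, or high.]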
17. 17
Traditional Advanced Analytics Process
Time-to-Insight
Phases: Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics
Steps: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model
20. 20
Analytics Process with EDH
Deliver Insights Sooner
Phases: Project Definition, Data Preparation, Exploratory Analytics, Operational Analytics
Steps: Problem ID, Data Access Request & Discovery, Data Sampling, Data Transformation, Model Creation, Model Evaluation, Deploy Model
22. 22
Step 1 : Collect all Data
Marketing
Market Data
System
Information
STORAGE FOR ANY TYPE OF DATA
UNIFIED, ELASTIC, RESILIENT, SECURE
Marketing
23. 23
Step 2 : Create Derived Datasets
Marketing
BATCH PROCESSING
3RD PARTY APPS
Data Set 1 Data Set 2
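A derived dataset of this kind can be sketched in a few lines. The example below is only an illustration, assuming two hypothetical raw sources (online events and in-store purchases) that share a `customer_id` field; the field names, values, and the `build_customer_view` helper are invented for the sketch and do not come from the deck.

```python
from collections import defaultdict

# Hypothetical raw records; field names and values are illustrative only.
online_events = [
    {"customer_id": "c1", "page_views": 12},
    {"customer_id": "c2", "page_views": 3},
    {"customer_id": "c1", "page_views": 5},
]
store_purchases = [
    {"customer_id": "c1", "amount": 40.0},
    {"customer_id": "c3", "amount": 15.5},
]


def build_customer_view(online, purchases):
    """Aggregate both raw sources into one derived record per customer."""
    view = defaultdict(lambda: {"page_views": 0, "store_spend": 0.0})
    for event in online:
        view[event["customer_id"]]["page_views"] += event["page_views"]
    for purchase in purchases:
        view[purchase["customer_id"]]["store_spend"] += purchase["amount"]
    return dict(view)


# The derived dataset: one row per customer across both silos.
customer_view = build_customer_view(online_events, store_purchases)
```

In a real deployment this aggregation would run as a batch job over the shared storage layer rather than over in-memory lists; the point of the sketch is only that the derived dataset is built once and then reused by analysts and models.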
25. 25
Step 3 : Data Analysts
Marketing
Data Set 1 Data Set 2
ANALYTIC SQL
SEARCH ENGINE
26. 26
Step 4 : Analytics
Marketing
Data Set 1 Data Set 2
MACHINE LEARNING
STREAM PROCESSING
3RD PARTY APPS
Clustering, Recommender, Regression
27. 27
Step 4 Cont: Analytics + Data Together
Data Set 1 Data Set 2
Old Way
SAS/R
JDBC-SELECT 10%
MACHINE
LEARNING
SAS+/R+
(ORYX)
ALGORITHM
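One reason the "old way" of extracting a 10% sample can limit analysis is that rare events mostly fall outside the sample. The sketch below is a toy illustration under stated assumptions: the 1,000 transactions, the five fraud IDs, and the every-tenth-row sampling rule are all hypothetical, chosen only to make the effect visible.

```python
# Hypothetical data: 1,000 transactions, 5 of them fraudulent.
FRAUD_IDS = {37, 212, 480, 655, 903}
transactions = [{"id": i, "is_fraud": i in FRAUD_IDS} for i in range(1000)]


def count_fraud(rows):
    """Count rows flagged as fraudulent."""
    return sum(1 for row in rows if row["is_fraud"])


# "Old way" (assumed here): move every 10th row, about 10%, to another tool.
sample = transactions[::10]

# Working where the data lives: every row is available to the algorithm.
full_count = count_fraud(transactions)
sample_count = count_fraud(sample)
```

With these made-up numbers the sample holds 100 rows and contains only one of the five fraud cases, so a model trained on it sees far fewer examples of the pattern it is supposed to learn.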
28. 28
Cloudera EDH for Analytics
BATCH PROCESSING
ANALYTIC SQL
SEARCH ENGINE
MACHINE LEARNING
STREAM PROCESSING
3RD PARTY APPS
WORKLOAD MANAGEMENT
STORAGE FOR ANY TYPE OF DATA
UNIFIED, ELASTIC, RESILIENT, SECURE
DATA MANAGEMENT
SYSTEM MANAGEMENT
Filesystem, Online NoSQL
29. 29
Business Value Delivered
Executives
• Acquire necessary information sooner to make critical business decisions
DBA/DW Admins
• Support both reporting and analytics needs
• Save resources with shared security and management
Data Scientists
• Acquire data necessary for projects
• Develop analysis/models with better lift faster
• Share data sets to empower others
31. 31
Monsanto can automate data-driven R&D decisions to reduce time to market from years to months.
Ask Bigger Questions:
How do we feed the world?
32. 32
Monsanto feeds our growing, global population
The Challenge:
• 1,000+ research scientists developing products in silos
• Data processing bottleneck slows development
• Time to market for new product is 5-10 years
The Solution:
• Cloudera Enterprise + Search + Impala: PB-scale platform for single view of all R&D data
• Integration: Exadata, spatial awareness & visualization
• Scientists directly access CDH; Navigator offers auditing & access control
Monsanto can automate data-driven R&D decisions to reduce time to market to months from years.
33. 33
Patterns and Predictions analyzes mobile data and social networking text for real-time identification of risk factors.
Ask Bigger Questions:
How can we prevent veteran suicide?
34. 34
Patterns and Predictions aids suicide prevention
The Challenge:
• Suicide rates among veterans are roughly double that of general US adults
• Military efforts struggle to understand risk factors
The Solution:
• Suicide risk predictive solution built on Cloudera + Attivio
• Analyzes veterans’ mobile & social data for real-time identification of risk factors
• Integrating Cloudera Search + Impala to simplify environment
The Durkheim Project predicts suicide risk with statistical significance (65%+ accuracy).
The requirement coming from executives, to be able to answer key business questions while still running operational reports, has created a strained situation between the Data Scientists and the DBA/DW Admins. DBA/DW Admins are forced to choose between keeping the DW secure and compliant and meeting the Data Scientists’ requirements for accessing the data they need when they need it.
A common misconception is that data science work centers on model development. While model development is crucial, most of the time and effort is spent on data preparation. This is because the traditional world of analytics involves a lot of data movement, which is both time-consuming and limits what data scientists can do.
If we come back to look at how this solution affects the three groups of people in an enterprise who are closest to the data, we quickly see the following.
For the Data Scientist:
• He is able to acquire the data necessary for the project very quickly, without the need to create rogue data marts.
• Because he can now use all the data very quickly, he can develop models with much better lift.
• Once he has the insights, he can share the data set to empower other users.
For the DW Administrator:
• He can now support the running of mission-critical reports in his DW while fulfilling the data scientists’ need for data.
• He can save resources and time, now that all the data is in one centralized location with unified security and management.
For the Executive:
• She can finally get the overall report she needs on a regular basis while still gaining a competitive edge, whether by decreasing costs and risks or by increasing revenue, with the insights gained from the use of all the data.