Making the leap to BI on Hadoop, by Dave Mariani, AtScale
1. Making the leap to BI on Hadoop
Predictive Analytics & Business Insights 2014
November 19, 2014
David P. Mariani
CEO
AtScale, Inc.
2. THE TRUTH ABOUT DATA
“We think only 3% of the
potentially useful data is tagged,
and even less is analyzed.”
Source: IDC Predictions 2013: Big Data, IDC
“90% of the data in the world
today has been created in
the last two years”
Source: IBM
7. What’s Wrong with this Picture
[Diagram: INPUT DATA, ETL, DATA WAREHOUSE, three MARTs, QUERY ENGINE, ANALYSIS TOOLS]
Highly complex
Lots of people & skillsets
Multiple copies of data
Stale data
Rigid schema
Tough to change
Write Many, Structured, Early Transformation
8. It Takes an Army
BI Engineer: Design Reports/Dashboards
ETL Engineer: Automate Cube Load
BI Engineer: Design Cube
DBA: Automate Data Load
ETL Engineer: Write ETL Code
DBA: Create Tables
Data Warehouse Architect: Design Star Schema
SAN/NAS Engineer: Define Storage Architecture
11. Data Management Approaches
Traditional Approach: INPUT DATA, ETL, DATA WAREHOUSE, three MARTs, QUERY ENGINE, ANALYSIS TOOLS
New Approach: INPUT DATA, HADOOP, ANALYSIS TOOLS
12. Time for a New Approach
VS
✔ Write Once  ✔ Semi-Structured  ✔ Late Transformation
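
To make "write once" and "late transformation" concrete, here is a minimal sketch, assuming newline-delimited JSON events and a reader that applies a schema only when the data is read. The sample records, the `read_with_schema` helper, and both schema dictionaries are hypothetical and are not taken from the talk.

```python
import json

# Raw events are written once, exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"user": "a", "action": "view", "ms": 120}',
    '{"user": "b", "action": "click", "ms": 45, "campaign": "fall"}',
    '{"user": "a", "action": "click"}',
]

# The schema lives with the reader, not the storage layer:
# column name -> (type to cast to, default when the field is absent).
SCHEMA_V1 = {"user": (str, ""), "action": (str, ""), "ms": (int, 0)}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema at read time."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else default
            for name, (cast, default) in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, SCHEMA_V1))

# A later question needs a new column; only the reader's schema changes.
# The stored data is not rewritten or reloaded.
SCHEMA_V2 = dict(SCHEMA_V1, campaign=(str, "none"))
rows_v2 = list(read_with_schema(RAW_LINES, SCHEMA_V2))
```

The same three stored lines serve both schemas, which is the practical difference from an early-transformation pipeline where adding a column means changing the load job and reloading.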
13. Not This, That
BI Engineer: Run Queries/Create Reports
Hadoop Engineer: Create EXTERNAL Tables
Hadoop Engineer: Define location to store files
VS
BI Engineer: Design Reports/Dashboards
ETL Engineer: Automate Cube Load
BI Engineer: Design Cube
DBA: Automate Data Load
ETL Engineer: Write ETL Code
DBA: Create Tables
Data Warehouse Architect: Design Star Schema
SAN/NAS Engineer: Define Storage Architecture
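
The two Hadoop Engineer tasks (define a file location, create EXTERNAL tables) can be sketched as a single Hive-style DDL statement. The example below only builds the statement as a string; the `external_table_ddl` helper, the table name, the columns, and the HDFS path are hypothetical, and the tab-delimited text format is an assumption rather than something the slides specify.

```python
def external_table_ddl(table, columns, location, delimiter="\\t"):
    """Build a Hive-style CREATE EXTERNAL TABLE statement as a string.

    The table only points at files already stored under `location`;
    dropping an external table leaves those files in place.
    """
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and HDFS path, for illustration only.
ddl = external_table_ddl(
    "web_events",
    [("user_id", "STRING"), ("action", "STRING"), ("duration_ms", "INT")],
    "/data/raw/web_events",
)
print(ddl)
```

Because the statement describes files that are already in place, no separate load step is needed before a BI engineer can run queries against them.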
29. Summary: The Do’s & Don’ts
Do | Don’t
Capture data “as is” | Pre-aggregate data
Apply schema on read | Force schema on load
Land new data on Hadoop | Land new data on relational DBs
Create a data warehouse | Create data marts
Leverage open source engines | Invest in proprietary databases
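
The first row of the summary (capture data "as is" rather than pre-aggregating) can be illustrated with a small sketch. The events and the `total_by` helper are hypothetical. The point is only that raw records can be aggregated along any field at query time, while a stored rollup answers just the question it was built for.

```python
from collections import Counter

# Hypothetical raw events, captured "as is".
EVENTS = [
    {"day": "2014-11-17", "region": "west", "product": "a", "sales": 10},
    {"day": "2014-11-17", "region": "east", "product": "b", "sales": 5},
    {"day": "2014-11-18", "region": "west", "product": "b", "sales": 7},
]


def total_by(events, key):
    """Aggregate sales by any field at query time."""
    totals = Counter()
    for event in events:
        totals[event[key]] += event["sales"]
    return dict(totals)


# A pre-aggregated daily rollup keeps only the "day" dimension;
# region and product cannot be recovered from it.
daily_rollup = total_by(EVENTS, "day")

# Keeping the raw events lets new questions be answered later without a reload.
by_region = total_by(EVENTS, "region")
by_product = total_by(EVENTS, "product")
```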