Streamlining Python Development: A Guide to a Modern Project Setup
2011 - TDWI Big Data Forum - The New Analytics
1. Hadoop & The New Analytics
Casey Kiernan
Sr. Director / Data Architecture - Shopzilla.com
November 1, 2011
2. Agenda
The New “Data”
The New Business Model
The New Analytic Scenarios
The New Analytic Architectures
The New Analytic Technologies
And, Yes… The New Data-Center
3. The World as I See It
SERVICES • SOA • JSON • AVRO • APPLICATIONS • HTML • JAVA • C# • THE CLOUD • HADOOP • OLAP • PYTHON • SITES • TENANT • ORG • SSN • FINANCE • ID • CUSTOMER • EXPERIAN • SQL SERVER • ORACLE • UNIX • SUBVERSION • COMPLIANCE • SECURITY • SALESFORCE • MYSQL
4. My Mountain Bike as a Data Platform
Data Collection: Heart Rate, Altitude, Temperature, Cadence / RPM
Guidance: Speed / Trip Miles, Time
Performance: Rate of Climb, Calories Burned, Miles Obtained, Total Climbed, Elapsed Time
Current, Average, Max Values
Data Architecture - on a Local Wireless Network (ANT+ Protocol)
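The "Current, Average, Max Values" readout on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the bike computer's actual logic: the `RunningStat` class and the sample heart-rate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Tracks current, average, and max for one sensor channel (hypothetical helper)."""
    current: float = 0.0
    total: float = 0.0
    count: int = 0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Each new reading becomes the current value and feeds the aggregates.
        self.current = value
        self.total += value
        self.count += 1
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        # Avoid dividing by zero before the first reading arrives.
        return self.total / self.count if self.count else 0.0


# Made-up heart-rate samples in beats per minute.
heart_rate = RunningStat()
for bpm in [120, 135, 150, 141]:
    heart_rate.update(bpm)

print(heart_rate.current, heart_rate.average, heart_rate.maximum)
```

One such object per channel (heart rate, altitude, cadence, and so on) would be enough to reproduce the three values the slide lists.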
5. “Business” Analytics
BUSINESS INTELLIGENCE.
DATA WAREHOUSE/OLAP.
OLTP DATA.
What are our most profitable Movie titles?
6. “Business” Analytics
What did Happen? What will Happen?
Operational Reporting Tactical Analytics Strategic
Months, Weeks | Weeks, Months, Years
7. “Personal” Analytics
SELF-SERVICE.
GUIDANCE.
BEHAVIOURS.
What Movie should I watch tonight?
8. “Personal” Analytics
What is Happening NOW?
What did Happen? What will Happen?
Historical Behaviors Tactical Analytics Strategic
Months, Weeks | Weeks, Months, Years
12. “Business” Analytics
OLTP Apps (OLTP App, Orders App, FIN App) -> Staging -> Data Warehouse -> OLAP / Reports -> Business Analyst
OLTP to OLAP Mapping
What are our most profitable Movie titles?
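A "Business" analytics question like this one is typically answered with an aggregate query over warehouse data. The sketch below uses Python's built-in `sqlite3` as a stand-in for the warehouse; the `orders` table, its columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory database standing in for the data warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (title TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Movie A", 10.0, 4.0),
        ("Movie B", 8.0, 7.0),
        ("Movie A", 10.0, 4.0),
        ("Movie C", 12.0, 3.0),
    ],
)

# Aggregate profit per title, most profitable first.
rows = conn.execute(
    "SELECT title, SUM(revenue - cost) AS profit "
    "FROM orders GROUP BY title ORDER BY profit DESC"
).fetchall()

for title, profit in rows:
    print(title, profit)
```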
13. “Personal” Analytics
End User
Application
Data Analytics
What Movie should I watch tonight?
14. End-User Experience
Browser, Tablet, Mobile,… | Self-Service Application
Personalization, Preferences, State (App Persistence) | Personalized Recommendations (Analytics Persistence)
Persistence/Analytics: “State” Persistence, “Read” Performance
Big Data: Behaviors / “Write” Performance
“Personal Analytics” Data Architecture
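The split between a write-optimized behavior store and a read-optimized recommendation store might look roughly like this in Python. It is a toy sketch under stated assumptions: the function names, the in-memory list and dict, and the "most frequent genre" rule are all hypothetical and are not taken from the presentation.

```python
from collections import Counter, defaultdict

# Append-only list standing in for the "write performance" behavior store.
behavior_log = []


def record(user, genre):
    """Record one viewing behavior; appends are cheap and never update in place."""
    behavior_log.append((user, genre))


def rebuild_recommendations(log):
    """Batch job: derive a key/value map that is fast to read at request time."""
    counts = defaultdict(Counter)
    for user, genre in log:
        counts[user][genre] += 1
    # Hypothetical rule: recommend each user's most frequently watched genre.
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


record("u1", "comedy")
record("u1", "drama")
record("u1", "comedy")
record("u2", "drama")

# Dict standing in for the "read performance" persistence layer.
recommendations = rebuild_recommendations(behavior_log)
print(recommendations["u1"])
```

Keeping the two stores separate lets each be tuned for its own access pattern, which is the point the slide makes with "Write" and "Read" performance.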
16. The New Technology Stack
Specialization / Individual Scalability / Late-Binding - for each component
Technology            Data Warehousing   New Analytics
Analytics             OLAP               OLAP + Open-Source
Data Movement         ETL Tool           MapReduce
SQL                   RDBMS              Hive
Schema Metadata       RDBMS              JSON / AVRO
Indexing (Readers)    RDBMS              HBase
RI                    RDBMS              Application Logic
App Store (Objects)   RDBMS              Key/Value - Cassandra,…
Schema / Columns      RDBMS              Column Families / Dynamic
Logs (Writers)        RDBMS              Scalable - Hadoop
Infrastructure        Data-Center        Cloud
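To make the "Data Movement: ETL Tool vs. MapReduce" row concrete, here is a single-process Python sketch of the map, shuffle, and reduce phases. It only imitates the programming model and does not use Hadoop; the log format (`user,title`) and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (title, 1) for each view event in a 'user,title' log line."""
    _user, title = line.split(",")
    yield title, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one title."""
    return key, sum(values)


# Made-up log lines.
log_lines = ["u1,Movie A", "u2,Movie B", "u3,Movie A"]

pairs = [pair for line in log_lines for pair in map_phase(line)]
view_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(view_counts)
```

In a real cluster the map and reduce functions run in parallel across many machines, which is what allows the log ("Writers") side to scale.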
17. End-User Experience
Browser, Tablet, Mobile,… | Self-Service Application
Personalization, Preferences, State (App Persistence) | Personalized Recommendations (Analytics Persistence)
Persistence/Analytics: Cassandra (JSON), HBase (Column-Families)
Data-Center or Cloud
Big Data: Hadoop (AVRO), MapReduce, SQL (Hive)
Specialization of Data Technologies
18. Personal Analytics + Business Intelligence
Apps (App, OLTP App, OLTP App) -> Staging -> Data Warehouse -> OLAP / Reports -> Business Analyst
OLTP to OLAP Mapping
19. Contact Information
If you have further questions or comments:
Casey Kiernan
Sr. Director / Data Architecture
Shopzilla.com
casey.kiernan@hotmail.com
BLOG: www.the-data-platform.com