IPC Global Big Data To Decision Solution Overview
1. Enterprise Intelligence
Big Data to Decisions
Pete Zybrick
Enterprise Solutions Architect
Cloudera Certified Developer for Apache Hadoop
IPC Global
T: 973-214-8820
pete.zybrick@ipc-global.com
ipc-global.com
2. Objectives
• Big Data to Decisions
• Cloudera CDH5
• AWS Elastic MapReduce
• Demonstrate End-to-End Example
• Overview of IPC Global Tools and Processes
3. Topics
• Data Source
• Randomly generated SiteCatalyst data (500K rows/day, 7 days, 554 columns)
• 2% Random Error Injection
• Process the Data: Hadoop
• Cloudera: Oozie job specification, MapReduce program
• AWS: EMR program
• Both call the same Hadoop Driver and Mapper programs
• Store Big Data: HDFS, Redshift, Delimited
• Selective Big Data Reduction
• Direct from Big Data: QVD
• Data Warehouse: MySQL
• Robust Application(s): QlikView
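The data source described above (wide delimited rows with a 2% random error injection) can be sketched roughly as follows. This is only an illustration, not the IPC Global generator: the function names, the placeholder field values, and the "#ERR#" marker are all assumptions, and the real tool's error model is not described in these slides.

```python
import random

NUM_COLUMNS = 554   # column count stated on the slide
ERROR_RATE = 0.02   # 2% random error injection, as stated on the slide


def make_row(rng, num_columns=NUM_COLUMNS):
    """Build one row of placeholder numeric fields (hypothetical content)."""
    return [str(rng.randint(0, 9999)) for _ in range(num_columns)]


def inject_error(row, rng):
    """Corrupt one randomly chosen field (assumed error model)."""
    bad = list(row)
    bad[rng.randrange(len(bad))] = "#ERR#"
    return bad


def generate_day(num_rows, rng, error_rate=ERROR_RATE):
    """Yield (tsv_line, corrupted) pairs for one day of test data."""
    for _ in range(num_rows):
        row = make_row(rng)
        corrupted = rng.random() < error_rate
        if corrupted:
            row = inject_error(row, rng)
        yield "\t".join(row), corrupted
```

Returning the `corrupted` flag alongside each line makes it possible to check later that a downstream job rejected exactly the rows that were deliberately damaged.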
4. Process Flow
[Process flow diagram. Labeled components: Live Data; Test Data Generator; Input Files; Cloudera CDH5 (Oozie Job, MapReduce, TSV Files); AWS Elastic MapReduce (EMR Job, MapReduce, TSV Files); HDFS; ToImpala, Impala; ToRedshift, Redshift; DailyDW, Data Warehouse; DailyQVD, QVD Files; QlikView; Power Data Users; Corp DB]
5. IPC AWS Infrastructure
• Capabilities
• Cloudera CDH5 Cluster – ClouderaManager + Managed Nodes
• AWS Elastic MapReduce – Dynamic launch of Hadoop cluster – Run Till Done
• Database Servers – RDS, MySQL, On Demand, QLIK
• VPN Integration with Client Network
• Rapid POC and Test Turnaround
6. Development / Testing
• Big Data Test Generator
• Economically Generate Millions Of Rows Of Test Data Within Hours
• Runs as Cluster on AWS EC2 instances – Parallel Generation
• Configurable Random Data Types
• AWS Tools – Component Library
• Encapsulate Complex Mechanisms into Basic Calls
• Consistent Error Recovery
• Consistent Security Model
• Library of Demonstration Programs
• Working with Amazon SAs to Validate and Enhance
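The test generator's two stated features, configurable random data types and parallel generation, could look something like the sketch below. It is an assumption-heavy illustration: the `COLUMN_TYPES` table, the function names, and the seeding scheme are hypothetical, and local threads stand in for the cluster of EC2 instances mentioned on the slide.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical type registry: type name -> function producing one field value.
COLUMN_TYPES = {
    "int": lambda rng: str(rng.randint(0, 1_000_000)),
    "float": lambda rng: f"{rng.uniform(0, 100):.2f}",
    "choice": lambda rng: rng.choice(["US", "DE", "JP"]),
}


def generate_partition(partition_id, num_rows, schema, base_seed=0):
    """Generate one partition of TSV lines with its own seeded generator."""
    # A distinct seed per partition keeps partitions independent and repeatable.
    rng = random.Random(base_seed * 1_000_003 + partition_id)
    return [
        "\t".join(COLUMN_TYPES[type_name](rng) for type_name in schema)
        for _ in range(num_rows)
    ]


def generate_parallel(num_partitions, rows_per_partition, schema, base_seed=0):
    """Generate all partitions concurrently; threads stand in for worker nodes."""
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        futures = [
            pool.submit(generate_partition, p, rows_per_partition, schema, base_seed)
            for p in range(num_partitions)
        ]
        return [f.result() for f in futures]
```

Seeding each partition separately means a failed partition can be regenerated on its own and will produce the same rows, which is one plausible way to keep large test runs reproducible.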
7. Summary
• On-Premises, AWS, Hybrid - Rapid Turnaround
• Early Adopter – BI, AWS, Big Data
• Investing in Data to Decisions Pipeline
• Next Steps…