IPC Global Big Data To Decision Solution Overview
Enterprise Intelligence
Big Data to Decisions
Pete Zybrick
Enterprise Solutions Architect
Cloudera Certified Developer
for Apache Hadoop
IPC Global
T: 973-214-8820
pete.zybrick@ipc-global.com
ipc-global.com
Objectives
• Big Data to Decisions
• Cloudera CDH5
• AWS Elastic MapReduce
• Demonstrate End-to-End Example
• Overview of IPC Global Tools and Processes
Topics
• Data Source
  • Randomly generated SiteCatalyst data (500K rows/day, 7 days, 554 columns)
  • 2% Random Error Injection
• Process the Data: Hadoop
  • Cloudera: Oozie job specification, MapReduce program
  • AWS: EMR program
  • Both call the same Hadoop Driver and Mapper programs
• Store Big Data: HDFS, Redshift, delimited files
• Selective Big Data Reduction
  • Direct from Big Data: QVD
  • Data Warehouse: MySQL
• Robust Application(s): QlikView
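The deck does not show the test generator or the shared Mapper code. The following is a minimal Python sketch of two ideas from the Topics slide, assuming a hypothetical three-column schema in place of the 554 SiteCatalyst columns: rows are generated from a seeded random source, about 2% receive an injected defect, and a mapper-style function rejects malformed rows. All names (`SCHEMA`, `generate_rows`, `map_row`, the `#ERR#` marker) are illustrative, not IPC Global's.

```python
import random

# Hypothetical stand-in for the 554 SiteCatalyst columns: each entry maps a
# column name to a function that draws a valid value from the given RNG.
SCHEMA = {
    "visit_id": lambda rng: str(rng.randint(1, 10**9)),
    "page_url": lambda rng: "/page/" + str(rng.randint(1, 500)),
    "duration_sec": lambda rng: str(rng.randint(0, 3600)),
}
ERROR_RATE = 0.02  # fraction of rows that receive an injected defect


def generate_rows(n_rows, seed=0):
    """Yield tab-separated rows; roughly ERROR_RATE of them are corrupted."""
    rng = random.Random(seed)  # seeded so a test data set is reproducible
    for _ in range(n_rows):
        fields = [make(rng) for make in SCHEMA.values()]
        if rng.random() < ERROR_RATE:
            # Replace one randomly chosen field with an invalid marker value.
            idx = rng.randrange(len(fields))
            fields[idx] = "#ERR#"
        yield "\t".join(fields)


def map_row(line):
    """Mapper-style validation: return the parsed record, or None if malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(SCHEMA):
        return None
    record = dict(zip(SCHEMA, fields))
    if not (record["visit_id"].isdigit() and record["duration_sec"].isdigit()):
        return None
    if not record["page_url"].startswith("/page/"):
        return None
    return record
```

For example, generating 10,000 rows with `list(generate_rows(10_000, seed=42))` and passing each through `map_row` should reject roughly 200 of them. Seeding the generator keeps a run reproducible, which helps when the same data set is pushed through both the Oozie and the EMR paths.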
Process Flow
[Diagram: process flow. Recoverable labels: Live Data and a Test Data Generator supplying Input Files; Cloudera CDH5 (Oozie Job, MapReduce, TSV Files on HDFS, ToImpala, Impala); AWS Elastic MapReduce (EMR Job, MapReduce, TSV Files, ToRedshift, Redshift); DailyDW jobs feeding the Data Warehouse; DailyQVD jobs producing QVD Files; QlikView; Power Data Users; Corp DB.]
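The DailyDW step that reduces the MapReduce output into the MySQL data warehouse is not described in detail. A minimal sketch, assuming hypothetical TSV input columns `day`, `page_url`, `duration_sec` and using Python's built-in `sqlite3` as a stand-in for MySQL, could aggregate per day and page and then load the summary. The table name `daily_page_stats` and both function names are illustrative only.

```python
import sqlite3
from collections import defaultdict


def reduce_daily(tsv_lines):
    """Aggregate raw TSV rows (day, page_url, duration_sec) per (day, page)."""
    totals = defaultdict(lambda: [0, 0])  # (day, page) -> [hits, total_sec]
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip malformed rows instead of failing the whole load
        day, page, duration = fields[0], fields[1], int(fields[2])
        agg = totals[(day, page)]
        agg[0] += 1
        agg[1] += duration
    # Sorted output keeps loads deterministic and easy to compare.
    return [(d, p, h, s) for (d, p), (h, s) in sorted(totals.items())]


def load_warehouse(conn, summary_rows):
    """Upsert the daily summary into a warehouse table (SQLite stand-in for MySQL)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_page_stats (
               day TEXT, page_url TEXT, hits INTEGER, total_sec INTEGER,
               PRIMARY KEY (day, page_url))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_page_stats VALUES (?, ?, ?, ?)", summary_rows
    )
    conn.commit()
```

Keying the table on `(day, page_url)` means rerunning the same day overwrites rows rather than duplicating them. `INSERT OR REPLACE` is SQLite syntax; on MySQL the closest equivalents are `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.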
IPC AWS Infrastructure
• Capabilities
  • Cloudera CDH5 cluster: Cloudera Manager + managed nodes
  • AWS Elastic MapReduce: dynamic launch of a Hadoop cluster that runs until done
  • Database servers: RDS, MySQL, On Demand, Qlik
  • VPN integration with the client network
  • Rapid POC and test turnaround
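The "Run Till Done" behavior of Elastic MapReduce corresponds to launching a cluster that terminates once its last step completes. A sketch, assuming the AWS SDK for Python (boto3) is used, that builds the keyword arguments for the EMR `run_job_flow` call; the jar location, main class, release label, and instance sizes are hypothetical placeholders, and `build_emr_request` is an illustrative name.

```python
def build_emr_request(name, jar_s3_uri, main_class, args, log_uri,
                      instance_type="m5.xlarge", core_nodes=4,
                      release_label="emr-5.36.0"):
    """Return keyword arguments for boto3's EMR run_job_flow call.

    KeepJobFlowAliveWhenNoSteps=False makes the cluster terminate after its
    last step finishes (the "run till done" pattern). Defaults are assumptions.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": core_nodes + 1,  # one master plus the core nodes
            "KeepJobFlowAliveWhenNoSteps": False,  # shut down when steps finish
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": name + "-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": jar_s3_uri,  # hypothetical S3 path to the Driver jar
                    "MainClass": main_class,
                    "Args": list(args),
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }
```

The request would then be submitted with `boto3.client("emr").run_job_flow(**request)`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` lets the cluster shut itself down instead of sitting idle after the job. Because the step runs an ordinary Hadoop jar, the same Driver class could also be scheduled by an Oozie workflow on the CDH5 cluster, in line with the deck's point that both paths share one Driver and Mapper.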
Development / Testing
• Big Data Test Generator
  • Economically generates millions of rows of test data within hours
  • Runs as a cluster on AWS EC2 instances: parallel generation
  • Configurable random data types
• AWS Tools: Component Library
  • Encapsulates complex mechanisms into basic calls
  • Consistent error recovery
  • Consistent security model
  • Library of demonstration programs
  • Working with Amazon Solutions Architects to validate and enhance
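"Consistent Error Recovery" in a component library is often implemented as one shared retry policy around calls to cloud services. The deck does not say how IPC Global's library does it; the following Python decorator is a hypothetical sketch of one common approach, exponential backoff with jitter, where `with_retries` and its parameters are illustrative names.

```python
import functools
import random
import time


def with_retries(max_attempts=5, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Decorator: retry a call with exponential backoff plus random jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each attempt; jitter spreads out concurrent retries.
                    delay = base_delay * (2 ** (attempt - 1))
                    sleep(delay + random.uniform(0, delay))
        return wrapper
    return decorator
```

Only idempotent operations (for example, re-uploading the same object) should be wrapped this way, since a retried call may execute more than once. Passing `sleep` as a parameter keeps the policy testable without real delays.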
Summary
• On-premises, AWS, hybrid: rapid turnaround
• Early adopter: BI, AWS, Big Data
• Investing in the Data to Decisions pipeline
• Next Steps…
