Shaun Connolly
Hortonworks VP Strategy
@shaunconnolly
Hadoop Powers Modern Enterprise Data Architectures
“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
– Gartner, Mark Beyer, “Information Management in the 21st Century”
Traditional Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST

Traditional Data Architecture: Pressured
New Sources (sentiment, clickstream, geo, sensor, …)
2.8 ZB in 2012, 40 ZB by 2020
85% from New Data Types
15x Machine Data by 2020
Source: IDC
Modern Data Architecture: Enabled
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS; New Sources (sentiment, clickstream, geo, sensor, …)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Agile “Data Lake” Solution Architecture
1. Capture All Data
2. Process & Structure
3. Distribute Results
4. Feedback & Retain
Data types: Logs & Text Data; Sentiment Data; Structured DB Data; Clickstream Data; Geo & Tracking Data; Sensor & Machine Data
Business Transactions & Interactions: Web, Mobile, CRM, ERP, Point of sale
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …
Classic Data Integration & ETL
Enterprise Hadoop Platform
Key Requirement of a “Data Lake”
Store ALL DATA in one place…
…and Interact with that data in MULTIPLE WAYS
BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER…
HDFS (Redundant, Reliable Storage)
YARN Takes Hadoop Beyond Batch
Applications run “IN” Hadoop versus “ON” Hadoop…
…with Predictable Performance and Quality of Service
Applications Run Natively IN Hadoop
BATCH: MapReduce
INTERACTIVE: Tez
STREAMING: Storm
GRAPH: Giraph
IN-MEMORY: Spark
HPC MPI: OpenMPI
ONLINE: HBase
OTHER…: ex. Search
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
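The engine lineup on this slide amounts to a lookup from workload type to an example processing engine running under YARN. A minimal Python sketch of that mapping follows. The names `ENGINES` and `engine_for` are hypothetical and exist only for illustration; the pairs are copied from the slide and are not taken from any Hadoop or YARN API.

```python
# Hypothetical illustration: workload types and the example engines the
# slide pairs them with. This is not a Hadoop or YARN API.
ENGINES = {
    "batch": "MapReduce",
    "interactive": "Tez",
    "streaming": "Storm",
    "graph": "Giraph",
    "in-memory": "Spark",
    "hpc mpi": "OpenMPI",
    "online": "HBase",
    "other": "Search",
}


def engine_for(workload: str) -> str:
    """Return the example engine the slide lists for a workload type."""
    key = workload.strip().lower()
    if key not in ENGINES:
        raise KeyError(f"no engine listed for workload type: {workload!r}")
    return ENGINES[key]


if __name__ == "__main__":
    for workload in ("BATCH", "Streaming", "online"):
        print(f"{workload} -> {engine_for(workload)}")
```

The lookup is case-insensitive so that the slide's upper-case labels can be passed in directly; an unknown workload type raises `KeyError` rather than guessing an engine.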
Ex. SQL-IN-Hadoop with Apache Hive
Stinger Initiative Focus Areas
Make Hive 100X Faster
Make Hive SQL Compliant
Diagram labels: Business Analytics, Custom Apps, SQL, HIVE, MAP REDUCE, TEZ, YARN, HDFS2
Making Hadoop Enterprise Ready
Enterprise Hadoop Platform
CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Security and Snapshots)
DATA SERVICES: Store, Process and Access Data
OPERATIONAL SERVICES: Manage & Operate at Scale
OS/VM, Cloud, Appliance
Mohit Saxena
VP & Technology Founder
Managing and Processing Data at Scale and Across Datacenters
Diagram labels: UA2, LHR1, UJ1 and HKG1, each with Ad Servers, Click Servers and Beacon Servers; Fraud Service; Global RTFB; Billing Service; Download Servers
Diagram labels: UA2-Ruby, UA2-Global, LHR1-Emerald, UJ1-Topaz, HKG1-Opal; RAW Logs; Summaries
InMobi contributed Apache Falcon to address Hadoop data lifecycle management.
Many Communities Must Work As One
Innovate, Participate, Integrate
Open Source, End Users, Vendors
Ecosystem Completes the Puzzle
Data Systems
Applications, Business Tools, & Dev Tools
Infrastructure & Systems Management
Thank You to Our Sponsors
Data Systems
Applications, Business Tools, & Dev Tools
Infrastructure & Systems Management
Hadoop Wave ONE: Web-scale Batch Apps
2006 to 2012: Web-Scale Batch Applications
Adoption curve (relative % customers over time): Innovators, technology enthusiasts; Early adopters, visionaries; The CHASM; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics
Before the chasm, customers want technology & performance; after it, customers want solutions & convenience.
Source: Geoffrey Moore - Crossing the Chasm
Hadoop Wave TWO: Broad Enterprise Apps
2013 & Beyond: Batch, Interactive, Online, Streaming, etc.
Adoption curve (relative % customers over time): Innovators, technology enthusiasts; Early adopters, visionaries; The CHASM; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics
Source: Geoffrey Moore - Crossing the Chasm
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Hadoop Powers Modern Enterprise Data Architectures

  • 1. Shaun Connolly Hortonworks VP Strategy @shaunconnolly Hadoop Powers Modern Enterprise Data Architectures
  • 2. By 2015, Organizations that Build a Modern Information Management System Will Outperform their Peers Financially by 20 Percent. – Gartner, Mark Beyer, “Information Management in the 21st Century”
  • 3. New Sources (sentiment, clickstream, geo, sensor, …) Traditional Data Architecture APPLICATIONS DATA SYSTEMS TRADITIONAL REPOS RDBMS EDW MPP DATA SOURCES OLTP, POS SYSTEMS Business Analytics Custom Applications Packaged Applications Pressured TRADITIONAL REPOS RDBMS EDW MPP OPERATIONAL TOOLS MANAGE & MONITOR DEV & DATA TOOLS BUILD & TEST Traditional Sources (RDBMS, OLTP, OLAP)
  • 4. Pressured Traditional Data Architecture Source: IDC New Sources (sentiment, clickstream, geo, sensor, …) 2.8 ZB in 2012 85% from New Data Types 15x Machine Data by 2020 40 ZB by 2020
  • 5. New Sources (sentiment, clickstream, geo, sensor, …) Modern Data Architecture Enabled APPLICATIONS DATA SYSTEMS DATA SOURCES OLTP, POS SYSTEMS Business Analytics Custom Applications Packaged Applications TRADITIONAL REPOS RDBMS EDW MPP Traditional Sources (RDBMS, OLTP, OLAP) MANAGE & MONITOR OPERATIONAL TOOLS BUILD & TEST DEV & DATA TOOLS ENTERPRISE HADOOP PLATFORM
  • 6. Agile “Data Lake” Solution Architecture Capture All Data Process & Structure 1 2 Distribute Results 3 Feedback & Retain 4 Dashboards, Reports, Visualization, … Web, Mobile, CRM, ERP, Point of sale Business Transactions & Interactions Business Intelligence & Analytics Classic Data Integration & ETL Logs & Text Data Sentiment Data Structured DB Data Clickstream Data Geo & Tracking Data Sensor & Machine Data Enterprise Hadoop Platform
  • 7. BATCH INTERACTIVE STREAMING GRAPH IN-MEMORY HPC MPI ONLINE OTHER… Key Requirement of a “Data Lake” Store ALL DATA in one place… …and Interact with that data in MULTIPLE WAYS HDFS (Redundant, Reliable Storage)
  • 8. Applications Run Natively IN Hadoop BATCH MapReduce INTERACTIVE Tez STREAMING Storm GRAPH Giraph IN-MEMORY Spark HPC MPI OpenMPI ONLINE HBase OTHER… ex. Search YARN Takes Hadoop Beyond Batch Applications run “IN” Hadoop versus “ON” Hadoop… …with Predictable Performance and Quality of Service HDFS2 (Redundant, Reliable Storage) YARN (Cluster Resource Management)
  • 9. Ex. SQL-IN-Hadoop with Apache Hive Stinger Initiative Focus Areas Make Hive 100X Faster Make Hive SQL Compliant HDFS2 YARN HIVE SQL MAP REDUCE Business Analytics Custom Apps TEZ
  • 10. Making Hadoop Enterprise Ready OS/VM Cloud Appliance Enterprise Hadoop Platform PLATFORM SERVICES Enterprise Readiness High Availability, Disaster Recovery, Security and Snapshots OPERATIONAL SERVICES Manage & Operate at Scale DATA SERVICES Store, Process and Access Data CORE Distributed Storage & Processing
  • 11. Mohit Saxena VP & Technology Founder
  • 12. Managing and Processing Data at Scale and Across Datacenters UA2 Ad Servers Click Servers Beacon Servers Fraud Service Global RTFB LHR1 Ad Servers Click Servers Beacon Servers UJ1 Ad Servers Click Servers Beacon Servers Billing Service Download Servers HKG1 Ad Servers Click Servers Beacon Servers UA2-Ruby RAW Logs UA2-Global RAW Logs LHR1-Emerald RAW Logs UJ1-Topaz RAW Logs HKG1-Opal Summaries
  • 13. InMobi contributed Apache Falcon to address Hadoop data lifecycle management
  • 14. Innovate Participate Integrate Many Communities Must Work As One Open Source End Users Vendors
  • 15. Ecosystem Completes the Puzzle Data Systems Applications, Business Tools, & Dev Tools Infrastructure & Systems Management
  • 16. Thank You to Our Sponsors Data Systems Applications, Business Tools, & Dev Tools Infrastructure & Systems Management
  • 17. Hadoop Wave ONE: Web-scale Batch Apps time relative % customers Customers want solutions & convenience Customers want technology & performance Source: Geoffrey Moore - Crossing the Chasm 2006 to 2012 Web-Scale Batch Applications Innovators, technology enthusiasts Early adopters, visionaries Early majority, pragmatists Late majority, conservatives Laggards, Skeptics The CHASM
  • 18. Customers want solutions & convenience Customers want technology & performance Hadoop Wave TWO: Broad Enterprise Apps time relative % customers Source: Geoffrey Moore - Crossing the Chasm Innovators, technology enthusiasts Early adopters, visionaries Early majority, pragmatists Late majority, conservatives Laggards, Skeptics The CHASM 2013 & Beyond Batch, Interactive, Online, Streaming, etc.

Editor's Notes

  1. Thank you all for attending Hadoop Summit! For those who have attended previous Hadoop Summits: Welcome back! For those new to Hadoop Summit: Welcome to the Hadoop herd!
     I’d like to spend the next 30 minutes focused on Hadoop’s opportunity to power modern enterprise data architectures. I’ve seen a lot of open source technologies and waves of IT change during my days at JBoss, Red Hat, SpringSource and VMware, but I’ve not seen anything quite like this Hadoop wave.
     We’re clearly at the forefront of a movement of something BIG, so savor the moment!
     Title: Hadoop Powers Modern Enterprise Data Architectures
     Big data is everywhere and in many formats. We see it on commercials. We hear it in conversations over coffee. It is an expanding topic in the boardroom. At the center of the big data discussion is Apache Hadoop, which has evolved from a tool for web-scale early adopters to an enterprise data platform that addresses the needs of mainstream businesses. In this talk Shaun Connolly, VP Corporate Strategy for Hortonworks, will discuss how Hadoop has given rise to a next-generation enterprise data architecture that is uniquely capable of storing, refining, and deriving new business insights from ALL types of data in a way that complements existing enterprise systems and tools.
     Connolly will walk through how enterprises are utilizing Hadoop to refine and explore multi-structured information and enrich their applications with new insights. He will look at real-world use cases where Hadoop has helped produce more business value, augment productivity or identify new and potentially lucrative opportunities. Over the coming years, Hadoop could be in a position to process more than half the world's data. While there is much work to be done to achieve this lofty goal, Connolly will highlight how the community and broader solution ecosystem have made great strides towards solidifying Hadoop's place within the enterprise.
  2. Gartner talks about how the IT landscape is being changed by the Nexus of Forces: namely Mobile, Social, Cloud, and Information (aka Big Data). Hadoop is clearly an Information Management technology, but if you think about it, Hadoop has its massive legs in Mobile, Social, and Cloud. It’s certainly a unique technology!
     To frame up my talk, I chose this quote from Mark Beyer of Gartner: “By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
     Whether it’s opening up new business opportunities or outperforming your competitors by 20% or more, the important point to be made is that big data technologies offer very real and compelling BUSINESS and FINANCIAL value to go along with the innovative TECHNOLOGY that is able to do things never before possible.
     What I ALSO like about this quote is that it’s NOT a new quote. It was made about 1.5 years ago in late 2011!
  3. Let’s set some context before digging into the Modern Data Architecture.
     While overly simplistic, this graphic represents the traditional data architecture:
     - A set of data sources producing data
     - A set of data systems to capture and store that data: most typically a mix of RDBMS and data warehouses
     - A set of custom and packaged applications as well as business analytics that leverage the data stored in those data systems.
     Your environment is undoubtedly more complicated, but conceptually it is likely similar. This architecture is tuned to handle TRANSACTIONS and data that fits into a relational database.
     [CLICK] Fast-forward to recent years and this traditional architecture has become PRESSURED with New Sources of data that aren’t handled well by existing data systems. So in the world of Big Data, we’ve got classic TRANSACTIONS and New Sources of data that come from what I refer to as INTERACTIONS and OBSERVATIONS.
     INTERACTIONS come from such things as Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content including video, audio, and images.
     OBSERVATIONS tend to come from the “Internet of Things”. Sensors for heat, motion, and pressure and RFID and GPS chips within such things as mobile devices, ATM machines, automobiles, and even farm tractors are just some of the “things” that output Observation data.
  4. So let’s consider those NEW SOURCES of data and get a sense of the scope involved by considering some stats from IDC.
     [CLICK] According to IDC, 2.8 ZB of data were created and replicated in 2012. A Zettabyte, for those unfamiliar with the term, is 1 BILLION Terabytes.
     [CLICK] 85% of that is from New Sources of Data.
     [CLICK] Out of that 85%, machine-generated data is a key driver in the growth and just that one new source of data is expected to grow by 15X by 2020.
     [CLICK] Fast-forward to 2020 and we’ll have 40 Zettabytes of data in the digital universe! This represents 50-fold growth from the beginning of 2010.
     [CLICK] Needless to say, wrestling that scale of data is like this poor guy trying to wrestle a champion Sumo athlete. Overwhelmed and outmatched to say the least. I’ve been using this graphic for the past 10 years or so. Given the world of big data we live in, I just had to trot this picture out once more. It just says it all, doesn’t it?
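To get a feel for the scale of the IDC figures quoted in this note, the arithmetic can be worked through in a few lines. This is only a rough sketch: the four input numbers are the ones cited in the note, while the variable names and the derived values (the implied 2010 volume and the 2012-to-2020 multiple) are illustrative calculations, not figures stated by IDC.

```python
# Inputs are the IDC figures quoted in the note; everything derived below
# is illustrative arithmetic, not a number published by IDC.
ZB_IN_TB = 1_000_000_000   # 1 zettabyte = 1 billion terabytes

data_2012_zb = 2.8         # data created and replicated in 2012
new_source_share = 0.85    # share attributed to new data types
data_2020_zb = 40.0        # projected digital universe in 2020
growth_since_2010 = 50     # "50-fold growth from the beginning of 2010"

tb_2012 = data_2012_zb * ZB_IN_TB                    # 2012 volume in terabytes
new_source_zb = data_2012_zb * new_source_share      # new-source portion of 2012
implied_2010_zb = data_2020_zb / growth_since_2010   # volume implied for early 2010
growth_2012_to_2020 = data_2020_zb / data_2012_zb    # multiple from 2012 to 2020

print(f"2012 volume: {tb_2012:,.0f} TB")
print(f"From new sources in 2012: {new_source_zb:.2f} ZB")
print(f"Implied volume at the start of 2010: {implied_2010_zb:.1f} ZB")
print(f"Growth from 2012 to 2020: about {growth_2012_to_2020:.1f}x")
```

Read this way, the quoted figures imply that roughly 2.4 ZB of the 2012 total came from new sources, and that the volume still has to grow about fourteen-fold between 2012 and 2020 to reach 40 ZB.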
  5. As the volume of data has exploded, we’ve seen organizations acknowledge that not all data belongs in a traditional data system. The drivers are both cost and technology. As volumes grow, database licensing costs as well as the corresponding hardware costs can become prohibitive. And traditional databases are not ideal for handling very large datasets of varying data types. People want to store data quickly in its RAW format and apply structure and a schema later, after it’s been processed a bit more.
     Enter Enterprise Hadoop as a peer to traditional data systems. The momentum for Hadoop is NOT about replacing traditional databases. Rather it’s about adding it in to handle this big data problem and doing so in a way that integrates easily with existing data systems, tools and approaches.
     This means it must interoperate with:
     - Existing applications and BI tools
     - Existing databases and data warehouses for loading data to / from the data warehouse
     - Development tools used for building custom applications
     - Operational tools for managing and monitoring
     Mainstream enterprises want to get the benefits of new technologies in ways that leverage existing skills and integrate with existing systems.
  6. In order to illustrate how Hadoop fits within the broader enterprise data architecture, I prefer to use a data flow diagram rather than the classic stack diagram we just covered.We are seeing may customers that want to deploy what we’ve been referring to as a “Data Lake” Solution Architecture that puts them in a position to maximize the value from ALL of their data: transactions + interactions + observations.At the highest level, we have three major areas of data processing, the first two of which are familiar to most enterprises:1. Business Transactions & Interactions2. Business Intelligence & AnalyticsEnterprise IT has been connecting systems via classic Data Integration and ETL processing, as illustrated in Step 1 above, for many years in order to deliver STRUCTURED and REPEATABLE analysis. In this step, the business determines the questions to ask and IT collects and structures the data needed to answer those questions.[CLICK] As we’ve discussed, New Data Sources representing Interactions and Observations have come onto the scene. And Enterprise Hadoop has appeared as a new system capable of capturing ALL of this multi-structured data into one place. Hadoop acts as a “Data Lake” if you will. Some call it a Data Reservoir, a Catch Basin, a Data Refinery, the foundation for a Data Hub & Spoke architecture. Regardless of name, it’s a place where ALL data can be brought together where it can then be flexibly aggregated and transformed into useful formats that help fuel new insights for the business. Structure and schema is applied when needed, NOT as a prerequisite before landing the data. [CLICK] The next step is about getting the data in the right format to those who need it. Some folks will cordon off ponds of data, to keep with our metaphor, for data scientists, researchers, or particular departments to interact with specific data of interest. 
Tools like Hive and HBase are commonly used for interacting with Hadoop data directly.Mainstream enterprises also benefit from integrating Enterprise Hadoop with their systems powering Business Transactions & Interactions and Business Intelligence & Analytics in order to open up the ability for them to get a richer and more informed 360 ̊ view of customers, for example. By directly integrating Enterprise Hadoop with Business Intelligence & Analytics solutions, companies can enhance their ability to more accurately understand the customer behaviors (aka Interactions) that lead to or inhibit their Transactions.Moreover, systems focused on Business Transactions & Interactions can benefit. Complex analytic models and calculations of key parameters can be performed in Hadoop and flow downstream to fuel online data systems powering business applications with the goal of more accurately targeting customers with the best and most relevant offers, for example.[CLICK] Since Hadoop is great at cost-effectively retaining large volumes of data for long periods of time, feedback loops enable a valuable closed-loop analytics system. Retaining the past 10 years of historical “Black Friday” retail data, for example, can benefit the business, especially if it’s blended with other data sources such as 10 years of weather data accessed from a third party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost effectively and at scale.A couple of final points before I move on:1. Capturing all data in Hadoop does not mean that your existing transaction and analytics applications need to be forklifted to run on top of Hadoop. The point here is that you can ALSO store data in Hadoop that’s in those systems. 
Yes, the data gets stored twice, but the flexibility and agility in doing so far exceed the incremental expense, especially given the commodity nature of hardware that Hadoop uses. 2. And one final point on the Data Lake. The goal isn’t to fill up Lake Superior right away. Most companies start with a small lake of data needed for targeted applications and, over time, direct more and more streams of data into the lake. Let success beget more success.
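The idea that structure is applied when data is read, not before it lands, can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory list `RAW_LAKE`, the helper `read_with_schema`, and the sample records are hypothetical and stand in for files in HDFS and for a tool such as Hive.

```python
import json

# Hypothetical raw events, landed untouched: no schema is enforced at write time.
RAW_LAKE = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "lat": 37.4, "lon": -122.1, "ts": 2}',
    '{"user": "u1", "page": "/cart", "ts": 3}',
]


def read_with_schema(raw_records, fields):
    """Project each raw record onto the requested fields at read time.

    Records that lack any of the requested fields are skipped.
    """
    for line in raw_records:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield tuple(record[field] for field in fields)


# Two different "schemas" applied to the same raw data, each only when needed.
clickstream = list(read_with_schema(RAW_LAKE, ("user", "page")))
geo = list(read_with_schema(RAW_LAKE, ("user", "lat", "lon")))
```

Here the same landed data serves a clickstream question and a geo question without anyone having decided on a table layout up front.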
  7. So as mainstream enterprises begin to store ALL of their data in one place, there’s a clear and growing desire to work with that data not only using classic, batch-oriented MapReduce, but also through a much wider range of interaction patterns. [CLICK] Interactive SQL solutions running on or next to Hadoop have gotten lots of press over recent months. Online data systems that store their data in HDFS are on the rise. So are Streaming and Complex Event Processing solutions, and Graph Processing. In-Memory Data Processing is another area. Even classic HPC Message Passing Interface apps are storing data in HDFS. The point here is that as enterprises store all data in one place, they increasingly need to interact with that data in a wide variety of ways.
  8. We are facing an exciting generational change in the Hadoop space. The first wave of Hadoop was about HDFS and MapReduce, where MapReduce had a split brain, so to speak. It was a framework for massive distributed data processing, but it also had all of the Job Management and Task Tracking capabilities built into it. The second wave of Hadoop is upon us, and a component called YARN has emerged that generalizes all of that Cluster Resource Management in a way where MapReduce is NOW just one of many frameworks or applications that can run atop YARN. Simply put, YARN is the distributed operating system for data processing applications. For those curious, YARN stands for “Yet Another Resource Negotiator”. [CLICK] YARN enables applications to run natively IN Hadoop versus ON HDFS or next to Hadoop. [CLICK] Why is that important? Because businesses want the ability to run more applications on their Hadoop data, and do so with predictable performance and quality of service. Mixed workload management enables customers to protect against one application or user hogging cluster resources and starving the other applications running in the Hadoop cluster. [CLICK] Businesses do NOT want to stovepipe clusters based on batch processing versus interactive SQL versus online data serving versus real-time streaming use cases. They’re adopting a big data strategy so they can get ALL of their data in one place and access that data in a wide variety of ways. This second wave of Hadoop represents a major rearchitecture that has been underway for 3 or 4 years. And this slide shows just a sampling of other open source projects that are or will be leveraging YARN in the not so distant future. Apache Tez is a new framework that I’ll cover in a bit. Folks at Yahoo have shared open source code that enables Twitter Storm to run on YARN. Apache Giraph is a graph processing system that is YARN enabled.
Spark is an in-memory data processing system built at Berkeley that was recently contributed to the Apache Software Foundation. OpenMPI is an open source Message Passing Interface system for HPC that works on YARN. These are just a few examples.
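To make the mixed workload point concrete, here is a toy sketch of capacity sharing between workloads on one cluster. It is not YARN’s scheduler or its API; the function `allocate`, the queue names, and the numbers are all assumptions chosen for illustration. Each queue first receives up to its guaranteed share, and only idle capacity is lent to queues that still have demand.

```python
def allocate(total_containers, guaranteed_pct, demand):
    """Toy capacity sharing: guarantees first, then lend out whatever is idle.

    guaranteed_pct maps queue name -> guaranteed percentage of the cluster.
    demand maps queue name -> number of containers the queue is asking for.
    """
    # Step 1: every queue gets what it asks for, up to its guaranteed share.
    alloc = {
        queue: min(demand[queue], total_containers * pct // 100)
        for queue, pct in guaranteed_pct.items()
    }
    # Step 2: capacity nobody claimed is handed to queues with unmet demand.
    spare = total_containers - sum(alloc.values())
    for queue in guaranteed_pct:
        extra = min(spare, demand[queue] - alloc[queue])
        alloc[queue] += extra
        spare -= extra
    return alloc


guaranteed_pct = {"batch": 50, "interactive": 30, "streaming": 20}

# A huge batch job may borrow idle capacity from a quiet interactive queue...
quiet = allocate(100, guaranteed_pct, {"batch": 500, "interactive": 10, "streaming": 20})

# ...but it cannot starve interactive queries once they need their share.
busy = allocate(100, guaranteed_pct, {"batch": 500, "interactive": 40, "streaming": 20})
```

In the first case batch ends up with 70 of the 100 containers; in the second it is held to 50 while interactive receives its full 30.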
  9. As I just mentioned, SQL for Hadoop has been a hot topic for the past 6 months or so. And rightly so. There are easily millions of people with SQL skills that would like to leverage those skills as they look to gain insight and value from data stored in Hadoop. With that as backdrop, at the beginning of the year, the Stinger Initiative was rolled out. Its focus was to rally the Apache Hive community around the goals of making Hive 100X faster, so it can handle those interactive querying use cases, and making Hive more SQL compliant so its BI use cases are richer. Oh, and by the way, this work needs to happen in a way that PRESERVES Hive’s awesome capability of processing ginormous data sets. Well, Eric14 will cover the details of where the Stinger effort stands; it’s made awesome progress. What I wanted to highlight here was that as part of the Stinger Initiative effort, a new data processing framework has appeared to help handle the interactive querying use cases for Hive. This project is called Apache Tez and it helps eliminate needless HDFS writes that have traditionally slowed down Hive. Instead of a complex DAG of MapReduce steps, Tez helps create a Map-Reduce-Reduce paradigm that is much faster. The net-out of this is that Interactive SQL querying use cases can now run natively IN Hadoop since Tez is built on YARN. This helps ensure that Interactive Queries and classic MapReduce processing can coexist nicely within the same cluster with predictable performance and SLAs.
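The “needless HDFS writes” point can be sketched with a toy model. Nothing here uses the Tez or Hive APIs: `FakeHdfs`, the two stage functions, and the sample lines are hypothetical stand-ins. The same two-stage computation runs once as two chained jobs that hand off through storage and once as a single Map-Reduce-Reduce style pipeline that persists only the final result.

```python
from collections import Counter, defaultdict


class FakeHdfs:
    """Stand-in for HDFS that stores datasets and counts how many were written."""

    def __init__(self):
        self.writes = 0
        self.files = {}

    def write(self, name, data):
        self.writes += 1
        self.files[name] = data

    def read(self, name):
        return self.files[name]


def count_words(lines):
    """Map + first reduce: word -> number of occurrences."""
    return sorted(Counter(word for line in lines for word in line.split()).items())


def group_by_count(pairs):
    """Second reduce: occurrence count -> list of words with that count."""
    groups = defaultdict(list)
    for word, n in pairs:
        groups[n].append(word)
    return dict(groups)


def run_as_job_chain(lines, hdfs):
    # Job 1 must persist its output so that job 2 can read it back.
    hdfs.write("job1_out", count_words(lines))
    hdfs.write("job2_out", group_by_count(hdfs.read("job1_out")))
    return hdfs.read("job2_out")


def run_as_single_dag(lines, hdfs):
    # Intermediate results flow straight to the next stage; only the end result is stored.
    hdfs.write("dag_out", group_by_count(count_words(lines)))
    return hdfs.read("dag_out")


lines = ["a b a", "b c a"]
chain_fs, dag_fs = FakeHdfs(), FakeHdfs()
chain_result = run_as_job_chain(lines, chain_fs)
dag_result = run_as_single_dag(lines, dag_fs)
```

Both runs produce the same answer, but the chained version writes to storage twice and the single pipeline writes once; with longer query plans the gap grows with every extra stage.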
  10. So enterprise Hadoop lies at the heart of the next-generation data architecture. Let’s outline what’s required in and around Hadoop in order to make it easy to use and consume by the enterprise. At the center, we start with Apache Hadoop for distributed file storage and data processing (a la HDFS, MapReduce, and YARN). [CLICK] In order to enable Hadoop within mainstream enterprises, we need to address enterprise concerns such as high availability, disaster recovery, snapshots, security, etc. And the community has been hard at work in both the 1.0 and 2.0 lines of Hadoop addressing these needs. There are also new incubator projects such as Apache Knox, which Eric will cover later, for improving user access to Hadoop clusters. [CLICK] And on top of this, we need to provide data services that make it easy to move data in and out of the platform, process and transform the data into useful formats, and enable people and other systems to access the data easily. This is where components like Apache Hive for SQL access, HCatalog for describing and managing your tables within Hadoop, Pig for script-based data processing, HBase for online data serving, Sqoop and Flume for getting data into Hadoop, etc. come in. [CLICK] It’s also important…I would argue equally important…to make the platform easy to operate. Components like Apache Ambari for provisioning, management and monitoring of the cluster, Oozie for job & workflow scheduling, and a new framework called Apache Falcon for Data Lifecycle Management fit here. [CLICK] So all of that: Core and Platform Services, Data Services, and Operational Services all come together into what I think of as “Enterprise Hadoop”. [CLICK] Ensuring that Enterprise Hadoop can be flexibly deployed across operating systems and virtual environments like Linux, Windows, and VMware is important. Targeting Cloud environments like Amazon Web Services, Microsoft Azure, Rackspace OpenCloud, and OpenStack is increasingly important.
So is the ability to provide Enterprise Hadoop pre-configured within a hardware appliance, like Teradata’s Big Analytics Appliance, which helps enterprises deploy Hadoop quickly, easily, and in a familiar way.
  11. With that as backdrop, I’d like to talk about the need for better Data Lifecycle Management capabilities in Hadoop clusters. And to do so, I’d like to welcome Mohit Saxena, the VP and Technology Founder of InMobi, to the stage. For those unfamiliar with InMobi, they are a company focused on mobile advertising and were recently voted one of 50 disruptive companies by MIT Technology Review. InMobi has been using Hadoop for many years and their technologists have been very active code contributors in the Apache Hadoop community. I’ve asked Mohit to join us today to share a little bit about how and why InMobi uses Hadoop and share some thoughts on how his team handles the challenge of managing data at scale and across datacenters. [SHAUN shakes Mohit’s hand and CLICKS to next slide]
  12. [SHAUN] Mohit, we’ve got a high level diagram of your data processing architecture. Why don’t you set some context for InMobi by sharing some of the impressive business metrics and Hadoop cluster metrics behind this picture: [MOHIT] ~1.5 Trillion ads requested per year; 20 Billion messages streamed per year; 2 Billion monetization events; 6 Clusters ranging from 40 to 250 nodes each; 20 Million Hadoop jobs submitted by users; 2 Billion MapReduce slots used in Hadoop. [SHAUN] Pretty impressive solution architecture! One of the common questions I get from enterprise customers is how to deal with Data Lifecycle Management in Hadoop environments. You and your team addressed those needs by creating a framework that you ultimately contributed to the Apache Software Foundation as Apache Falcon. [TRANSITION TO NEXT SLIDE]
  13. [SHAUN] Please share the story behind Falcon for the audience. [MOHIT] Discuss what problems you were looking to address with the technology that ultimately became Falcon: specifically how to handle such things as orchestrating data ingest and data processing pipelines, disaster recovery and data retention scenarios, etc. Also share why you decided to contribute the project to Apache. [SHAUN] Everybody, please join me in thanking Mohit for joining us today and sharing his story. It’s amazing to see how companies like InMobi can help accelerate the process of making Hadoop a more enterprise-viable data platform.
  14. I’ve been in enterprise open source for almost a decade. One thing I’ve learned along the way is that it’s best to think of “Community” in a broad way. In the Hadoop space, there is clearly the open source community. Without the innovative Apache open source technology, none of us would be here today. For really impactful and industry-changing open source technologies, there’s also the end user community. This community spans the tech-savvy early adopter types as well as the more pragmatic and conservative adopter types who want a more “whole solution”. Then the 3rd piece is the broader ecosystem that integrates with, extends, enhances, builds on, etc. One of the reasons I asked Mohit from InMobi to come on stage and share his story is that InMobi is a great example of an End User who is VERY ACTIVE in the open source Community. This room is filled with people across these 3 areas and each of these perspectives is CRITICALLY IMPORTANT if Hadoop is to be all it can be. So my simple ask of you is: GET INVOLVED…in whatever way makes sense for you and your business.
  15. The ecosystem plays a critical role in rounding out solution architectures around Apache Hadoop. This slide outlines 3 major layers of the data stack and conveniently lists the Hadoop Summit platinum sponsors. Starting from the bottom, we have Infrastructure and Systems Management. Above that we have Data Management Systems, Data Movement, and Integration solutions. Then at the top, we have Development Tools, Business Tools, and Applications that ride on top. I’d like to thank Cisco, Microsoft, Kognitio, IBM, Teradata, Datameer, Karmasphere, Platfora, SAS, and Splunk for being platinum sponsors! I also want to thank Yahoo for co-hosting this event with Hortonworks!
  16. Now let’s expand the scope to include ALL of the sponsors! I love this slide because it is very BUSY! The cool thing is that we have almost 70 sponsors that provide really nice coverage across all layers of the data stack. This is a great example that the Hadoop market is maturing quite nicely!
  17. So I’d like to end my session with a quick summary of where the Hadoop market stands today. Hadoop Wave ONE started in 2006 and did a GREAT job at Web-scale Batch-oriented data processing. A vibrant community and strong enterprise interest propelled Hadoop across the Chasm at the end of 2012.
  18. The 2nd wave of Hadoop has started and it will continue to fuel Hadoop on its path through mainstream adoption. Everyone in this room is at the forefront of a movement that will have lasting impact across the industry. As Rob mentioned in his opening remarks, Hadoop has the opportunity to process half the world’s data. There’s still a lot of work to be done. My simple ask of you is: GET INVOLVED…in whatever way makes sense for you and your business. Thank you and have a great conference!