Apache Hadoop
Now, Next, and Beyond

Shaun Connolly
VP Corporate Strategy, Hortonworks

April 19, 2012




© Hortonworks Inc. 2012
Big Data: Transactions + Interactions + Observations
[Figure: data sources plotted by volume (Megabytes, Gigabytes, Terabytes, Petabytes) against increasing data variety and complexity, in four groups: ERP, CRM, Web, and Big Data.]

Data sources shown: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Customer Touches, Support Contacts, Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, Social Interactions & Feeds, Sentiment, User Click Stream, Spatial & GPS Coordinates, External Demographics, Business Data Feeds, HD Video, Audio, Images, Speech to Text, Product/Service Logs, SMS/MMS, Sensors / RFID / Devices

What is Apache Hadoop?


• Collection of Open Source Projects
   – Apache Software Foundation (ASF)
   – Loosely coupled, ship early/often

• Solution for big data
   – Stores petabytes of data reliably
   – Runs highly distributed applications
   – Enables a rational economics model
   – Powers data-driven business

One of the best examples of open source driving innovation and creating a market



Key Hadoop Stack Components
Core Components
• HDFS (Hadoop Distributed File System)
• MapReduce (Distributed Programming Framework)
• HCatalog (Table & Schema Management)
• Pig (Data Flow)
• Hive (SQL-like Access)
• HBase (Columnar NoSQL Store)
• Zookeeper (Cluster Coordination)

Extended Components
• Ambari & Other Monitoring & Management
• Oozie & Other Workflow Scheduling
• Sqoop & Other Ingest, ETL tools
• Mahout & Other Libraries




Hadoop Now, Next, and Beyond
  Apache community, including Hortonworks, investing to improve Hadoop:
  • Make Hadoop an open, extensible, and enterprise-viable platform
  • Enable more applications to run on Apache Hadoop

  • “Hadoop.Now” (Hadoop 1.0), HDP 1: Most stable Hadoop ever
  • “Hadoop.Next” (Hadoop 0.23), HDP 2: Next-gen HDFS & MapReduce
  • “Hadoop.Beyond”: Integrate w/ecosystem




Unifying Classic & Big Data Methods

• Classic Method: Structured & Repeatable Analysis
   – Business determines what questions to ask
   – IT structures the data to answer those questions
   – “Capture only what’s needed”
   – SQL Performance and Structure

• Big Data Method: Multi-structured & Iterative Analysis
   – IT delivers a platform for storing, refining, and analyzing all data sources
   – Business explores data for questions worth answering
   – “Capture in case it’s needed”
   – MapReduce Processing Flexibility



Unified Big Data Architecture
Enable Developers, Data Scientists, & Information Workers




Java, C/C++, Pig, JavaScript, Python, R, SAS, SQL, Excel, BI Tools, Reporting, etc.

Capture, Store, Refine, Discover, Analyze, Report, Retain

• Batch
   – Fast data loading
   – ELT/ETL and refinement
   – Image/video analysis
   – Online retention

• Interactive
   – Path & pattern analysis
   – Graph analysis
   – Text analysis
   – Iterative discovery

• Active
   – Operational analysis
   – Transactional analysis
   – High volume ad-hoc
   – Elastic data marts

Audio, Video & Images; Docs & Text; Machine Logs; Coords & Sensors; Social Content; Web & Mobile; CRM; SCM; ERP


Hortonworks Vision


   We believe that by the end of 2015,
   more than half the world's data will
   be processed by Apache Hadoop.


   Q: How to achieve that vision?
   A: Ecosystem enablement around an enterprise-viable open source data platform

•   2-day event (June 13-14, 2012) in San Jose, CA
•   84 breakout sessions
•   Showcasing real-world examples, developments and
    best practices of Apache Hadoop
•   Plus, Geoffrey Moore to keynote and more to be
    announced
•   Register now at: http://www.hadoopsummit.org

                                                     Page 9
June 13-14, 2012
San Jose, CA

More Related Content

What's hot

Tackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationTackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationDataWorks Summit
 
Impact of in-memory technology and SAP HANA (2012 Update)
Impact of in-memory technology and SAP HANA (2012 Update)Impact of in-memory technology and SAP HANA (2012 Update)
Impact of in-memory technology and SAP HANA (2012 Update)Vitaliy Rudnytskiy
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetupRoby Chen
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Big Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumBig Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumIntelAPAC
 
Exploring Data with Jaspersoft
Exploring Data with JaspersoftExploring Data with Jaspersoft
Exploring Data with JaspersoftMike Boyarski
 
Big Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumBig Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumIntelAPAC
 
Analytics on Hadoop
Analytics on HadoopAnalytics on Hadoop
Analytics on HadoopEMC
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Cloudera, Inc.
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Cana Ko
 
Introducing Jaspersoft 5
Introducing Jaspersoft 5Introducing Jaspersoft 5
Introducing Jaspersoft 5Mike Boyarski
 
Embedded Analytics in your App Webinar
Embedded Analytics in your App WebinarEmbedded Analytics in your App Webinar
Embedded Analytics in your App WebinarMike Boyarski
 
Evaluating jaspersoft community & commercial editions
Evaluating jaspersoft community & commercial editionsEvaluating jaspersoft community & commercial editions
Evaluating jaspersoft community & commercial editionsMike Boyarski
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013Mike Boyarski
 
A unified data modeler in the world of big data
A unified data modeler in the world of big dataA unified data modeler in the world of big data
A unified data modeler in the world of big dataWilliam Luk
 
Sap sap so h 2013
Sap sap so h 2013Sap sap so h 2013
Sap sap so h 2013deepersnet
 
Microsoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft Private Cloud
 
HugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage SystemHugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage Systemqlw5
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarCloudera, Inc.
 

What's hot (20)

Tackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationTackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integration
 
Impact of in-memory technology and SAP HANA (2012 Update)
Impact of in-memory technology and SAP HANA (2012 Update)Impact of in-memory technology and SAP HANA (2012 Update)
Impact of in-memory technology and SAP HANA (2012 Update)
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Big Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick BuddenbaumBig Data launch Singapore Patrick Buddenbaum
Big Data launch Singapore Patrick Buddenbaum
 
Exploring Data with Jaspersoft
Exploring Data with JaspersoftExploring Data with Jaspersoft
Exploring Data with Jaspersoft
 
Big Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick BuddenbaumBig Data launch keynote Singapore Patrick Buddenbaum
Big Data launch keynote Singapore Patrick Buddenbaum
 
Analytics on Hadoop
Analytics on HadoopAnalytics on Hadoop
Analytics on Hadoop
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
 
Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831
 
Introducing Jaspersoft 5
Introducing Jaspersoft 5Introducing Jaspersoft 5
Introducing Jaspersoft 5
 
Embedded Analytics in your App Webinar
Embedded Analytics in your App WebinarEmbedded Analytics in your App Webinar
Embedded Analytics in your App Webinar
 
Evaluating jaspersoft community & commercial editions
Evaluating jaspersoft community & commercial editionsEvaluating jaspersoft community & commercial editions
Evaluating jaspersoft community & commercial editions
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013
 
A unified data modeler in the world of big data
A unified data modeler in the world of big dataA unified data modeler in the world of big data
A unified data modeler in the world of big data
 
Sap sap so h 2013
Sap sap so h 2013Sap sap so h 2013
Sap sap so h 2013
 
Microsoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database Datasheet
 
HugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage SystemHugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage System
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 

Similar to Hadoop - Now, Next and Beyond

Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisOW2
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 
Hw09 Data Processing In The Enterprise
Hw09   Data Processing In The EnterpriseHw09   Data Processing In The Enterprise
Hw09 Data Processing In The EnterpriseCloudera, Inc.
 
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Big Data Spain
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Hortonworks
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesWebinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesCloudera, Inc.
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsDataWorks Summit
 
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012m_hepburn
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformHortonworks
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 

Similar to Hadoop - Now, Next and Beyond (20)

Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 
Zh tw cloud computing era
Zh tw cloud computing eraZh tw cloud computing era
Zh tw cloud computing era
 
Hw09 Data Processing In The Enterprise
Hw09   Data Processing In The EnterpriseHw09   Data Processing In The Enterprise
Hw09 Data Processing In The Enterprise
 
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesWebinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI Tools
 
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 

More from Teradata Aster

Razorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting
Razorfish Multi-Channel Marketing: Better Customer Segmentation and TargetingRazorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting
Razorfish Multi-Channel Marketing: Better Customer Segmentation and TargetingTeradata Aster
 
Big Data Decision-Making
Big Data Decision-MakingBig Data Decision-Making
Big Data Decision-MakingTeradata Aster
 
Using Data to Manage in Today’s Chaotic Environment
Using Data to Manage in Today’s Chaotic EnvironmentUsing Data to Manage in Today’s Chaotic Environment
Using Data to Manage in Today’s Chaotic EnvironmentTeradata Aster
 
Big Analytics 2012 Event Survey Data
Big Analytics 2012 Event Survey DataBig Analytics 2012 Event Survey Data
Big Analytics 2012 Event Survey DataTeradata Aster
 
What Makes A Great Data Scientist?
What Makes A Great Data Scientist?What Makes A Great Data Scientist?
What Makes A Great Data Scientist?Teradata Aster
 
Practical Applications of Visual Analytics
Practical Applications of Visual AnalyticsPractical Applications of Visual Analytics
Practical Applications of Visual AnalyticsTeradata Aster
 
Trust and Influence in the Complex Network of Social Media
Trust and Influence in the Complex Network of Social MediaTrust and Influence in the Complex Network of Social Media
Trust and Influence in the Complex Network of Social MediaTeradata Aster
 
Turning Big Data to Business Advantage
Turning Big Data to Business AdvantageTurning Big Data to Business Advantage
Turning Big Data to Business AdvantageTeradata Aster
 
Big Brands Meet Big Data – The Newest Innovator’s Dilemma
Big Brands Meet Big Data – The Newest Innovator’s DilemmaBig Brands Meet Big Data – The Newest Innovator’s Dilemma
Big Brands Meet Big Data – The Newest Innovator’s DilemmaTeradata Aster
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsTeradata Aster
 
Keynote: Cross Industry Lessons from Moneyball Analytics
Keynote: Cross Industry Lessons from Moneyball AnalyticsKeynote: Cross Industry Lessons from Moneyball Analytics
Keynote: Cross Industry Lessons from Moneyball AnalyticsTeradata Aster
 
Technology Strategies for Big Data Analytics,
Technology Strategies for Big Data Analytics, Technology Strategies for Big Data Analytics,
Technology Strategies for Big Data Analytics, Teradata Aster
 
From Data Science to Business Value - Analytics Applied
From Data Science to Business Value - Analytics AppliedFrom Data Science to Business Value - Analytics Applied
From Data Science to Business Value - Analytics AppliedTeradata Aster
 
Solving the Education Crisis with Big Data
Solving the Education Crisis with Big DataSolving the Education Crisis with Big Data
Solving the Education Crisis with Big DataTeradata Aster
 
Using SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsUsing SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsTeradata Aster
 
SAS aster data big data dc presentation public
SAS aster data big data dc presentation publicSAS aster data big data dc presentation public
SAS aster data big data dc presentation publicTeradata Aster
 
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Teradata Aster
 
20100506 aster data big data summit - microstrategy (shareable)
20100506   aster data big data summit - microstrategy (shareable)20100506   aster data big data summit - microstrategy (shareable)
20100506 aster data big data summit - microstrategy (shareable)Teradata Aster
 

More from Teradata Aster (20)

Razorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting
Razorfish Multi-Channel Marketing: Better Customer Segmentation and TargetingRazorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting
Razorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting
 
Big Data Decision-Making
Big Data Decision-MakingBig Data Decision-Making
Big Data Decision-Making
 
Using Data to Manage in Today’s Chaotic Environment
Using Data to Manage in Today’s Chaotic EnvironmentUsing Data to Manage in Today’s Chaotic Environment
Using Data to Manage in Today’s Chaotic Environment
 
Big Analytics 2012 Event Survey Data
Big Analytics 2012 Event Survey DataBig Analytics 2012 Event Survey Data
Big Analytics 2012 Event Survey Data
 
What Makes A Great Data Scientist?
What Makes A Great Data Scientist?What Makes A Great Data Scientist?
What Makes A Great Data Scientist?
 
Practical Applications of Visual Analytics
Practical Applications of Visual AnalyticsPractical Applications of Visual Analytics
Practical Applications of Visual Analytics
 
Trust and Influence in the Complex Network of Social Media
Trust and Influence in the Complex Network of Social MediaTrust and Influence in the Complex Network of Social Media
Trust and Influence in the Complex Network of Social Media
 
Turning Big Data to Business Advantage
Turning Big Data to Business AdvantageTurning Big Data to Business Advantage
Turning Big Data to Business Advantage
 
Big Brands Meet Big Data – The Newest Innovator’s Dilemma
Big Brands Meet Big Data – The Newest Innovator’s DilemmaBig Brands Meet Big Data – The Newest Innovator’s Dilemma
Big Brands Meet Big Data – The Newest Innovator’s Dilemma
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics Platforms
 
Keynote: Cross Industry Lessons from Moneyball Analytics
Keynote: Cross Industry Lessons from Moneyball AnalyticsKeynote: Cross Industry Lessons from Moneyball Analytics
Keynote: Cross Industry Lessons from Moneyball Analytics
 
Technology Strategies for Big Data Analytics,
Technology Strategies for Big Data Analytics, Technology Strategies for Big Data Analytics,
Technology Strategies for Big Data Analytics,
 
From Data Science to Business Value - Analytics Applied
From Data Science to Business Value - Analytics AppliedFrom Data Science to Business Value - Analytics Applied
From Data Science to Business Value - Analytics Applied
 
Solving the Education Crisis with Big Data
Solving the Education Crisis with Big DataSolving the Education Crisis with Big Data
Solving the Education Crisis with Big Data
 
Using SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsUsing SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced Analytics
 
SAS aster data big data dc presentation public
SAS aster data big data dc presentation publicSAS aster data big data dc presentation public
SAS aster data big data dc presentation public
 
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
Utilizing Aster nCluster to support processing in excess of 100 Billion rows ...
 
comScore
comScorecomScore
comScore
 
20100506 aster data big data summit - microstrategy (shareable)
20100506   aster data big data summit - microstrategy (shareable)20100506   aster data big data summit - microstrategy (shareable)
20100506 aster data big data summit - microstrategy (shareable)
 

Hadoop - Now, Next and Beyond

  • 1. Apache Hadoop Now, Next, and Beyond Shaun Connolly VP Corporate Strategy, Hortonworks April 19, 2012 © Hortonworks Inc. 2012
  • 2. Big Data: Transactions + Interactions + Observations
      [Chart] Vertical axis (data volume): Megabytes, Gigabytes, Terabytes, Petabytes. Horizontal axis: increasing data variety and complexity.
      Region labels: ERP, CRM, Web, BIG DATA.
      Data types plotted: Purchase detail; Purchase record; Payment record; Segmentation; Offer details; Customer Touches; Support Contacts; Web logs; Offer history; A/B testing; Dynamic Pricing; Affiliate Networks; Search Marketing; Behavioral Targeting; Dynamic Funnels; User Generated Content; Mobile Web; Sentiment; User Click Stream; Social Interactions & Feeds; Spatial & GPS Coordinates; Sensors / RFID / Devices; External Demographics; Business Data Feeds; HD Video, Audio, Images; Speech to Text; Product/Service Logs; SMS/MMS.
  • 3. What is Apache Hadoop?
      • Collection of open source projects
          – Apache Software Foundation (ASF)
          – Loosely coupled, ship early/often
      • Solution for big data
          – Stores petabytes of data reliably
          – Runs highly distributed applications
          – Enables a rational economics model
          – Powers data-driven business
      Callout: One of the best examples of open source driving innovation and creating a market.
  • 4. Key Hadoop Stack Components
      Legend: Core Components, Extended Components
      • HDFS (Hadoop Distributed File System)
      • MapReduce (Distributed Programming Framework)
      • HCatalog (Table & Schema Management)
      • Pig (Data Flow)
      • Hive (SQL-like Access)
      • HBase (Columnar NoSQL Store)
      • Zookeeper (Cluster Coordination)
      • Ambari & other monitoring & management
      • Oozie & other workflow scheduling
      • Sqoop & other ingest, ETL tools
      • Mahout & other libraries
  • 5. Hadoop Now, Next, and Beyond
      The Apache community, including Hortonworks, is investing to improve Hadoop:
      • Make Hadoop an open, extensible, and enterprise-viable platform
      • Enable more applications to run on Apache Hadoop
      Roadmap:
          – “Hadoop.Now” (Hadoop 1.0), HDP 1: most stable Hadoop ever
          – “Hadoop.Next” (Hadoop 0.23), HDP 2: next-gen HDFS & MapReduce
          – “Hadoop.Beyond”: integrate w/ ecosystem
  • 6. Unifying Classic & Big Data Methods
      • Classic Method: Structured & Repeatable Analysis
          – Business determines what questions to ask
          – IT structures the data to answer those questions
          – “Capture only what’s needed”
          – SQL: Performance and Structure
      • Big Data Method: Multi-structured & Iterative Analysis
          – IT delivers a platform for storing, refining, and analyzing all data sources
          – Business explores data for questions worth answering
          – “Capture in case it’s needed”
          – MapReduce: Processing Flexibility
  • 7. Unified Big Data Architecture
      Enable Developers, Data Scientists, & Information Workers
      Java, C/C++, Pig, JavaScript, Python, R, SAS, SQL, Excel, BI Tools, Reporting, etc.
      Capture, Store, Refine, Discover, Analyze, Report, Retain
      • Batch
          – Fast data loading
          – ELT/ETL and refinement
          – Image/video analysis
          – Online retention
      • Interactive
          – Path & pattern analysis
          – Graph analysis
          – Text analysis
          – Iterative discovery
      • Active
          – Operational analysis
          – Transactional analysis
          – High volume ad-hoc
          – Elastic data marts
      Data sources: Audio, Video & Images; Docs & Text; Machine Logs; Coords & Sensors; Social Content; Web & Mobile; CRM; SCM; ERP
  • 8. Hortonworks Vision
      We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
      Q: How to achieve that vision???
      A: Ecosystem enablement around enterprise-viable open source data platform
  • 9. 2-day event (June 13-14, 2012) in San Jose, CA
      • 84 breakout sessions
      • Showcasing real-world examples, developments and best practices of Apache Hadoop
      • Plus, Geoffrey Moore to keynote and more to be announced
      • Register now at: http://www.hadoopsummit.org