Hadoop’s Role in the Enterprise Architecture
Shaun Connolly
Hortonworks VP Strategy
@shaunconnolly
What is Big Data?
Transactions
Interactions
Observations
What is Big Data?
Transactions + Interactions + Observations = BIG DATA
[Figure: data volume (Megabytes, Gigabytes, Terabytes, Petabytes) plotted against increasing data variety and complexity]
ERP (Megabytes): Purchase detail, Purchase record, Payment record
CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
WEB (Terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
BIG DATA (Petabytes): User Click Stream, Mobile Web, Sentiment, SMS/MMS, Speech to Text, Social Interactions & Feeds, Spatial & GPS Coordinates, Sensors / RFID / Devices, Business Data Feeds, External Demographics, User Generated Content, HD Video, Audio, Images, Product/Service Logs
Big Data Market Drivers
    Business
1   Enable new business models & drive faster growth (20%+)

2   Find insights for competitive advantage & optimal returns

    Technical
3   Data continues to grow exponentially

4   Data is increasingly everywhere and in many formats

5   Traditional solutions not designed for new requirements

    Financial
6   Cost of data systems, as % of IT spend, continues to grow

7   Cost advantages of commodity hardware & open source
Is This Your Big Data Strategy?
[Illustration labeled "BIG DATA" and "you"]
Next-Generation Data Architecture
[Diagram: data sources on the left, the Enterprise Hadoop Platform in the center, existing business systems on the right]
Data sources: Unstructured Data, Log files, Exhaust Data, Social Media, Sensors and devices, DB data
Enterprise Hadoop Platform
Business Transactions & Interactions: CRM, ERP, Web, Mobile, Point of sale
Classic Data Integration & ETL
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …
1   Capture Big Data
2   Process & Structure
3   Distribute Results
4   Feedback & Retain
Making Hadoop Enterprise Ready
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness: HA (high availability), DR (disaster recovery), Snapshots, Security, …
ENTERPRISE HADOOP PLATFORM
Deployment options: OS / VM, Cloud, Appliance
Existing Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), POS Systems
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
An Emerging Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP), Enterprise Hadoop Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), POS Systems; New Sources (web logs, email, sensors, social media), Mobile Data
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
“Integrating Hadoop with existing IT investments is vitally important.”
Larry Feinsmith
Interoperating With Your Tools
APPLICATIONS: Microsoft Applications
DATA SYSTEMS: Traditional Repos, Enterprise Hadoop Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), POS Systems; New Sources (web logs, email, sensors, social media), Mobile Data
DEV & DATA TOOLS
OPERATIONAL TOOLS: Viewpoint
Big Data Tag Team!


Your Tools & Enterprise Hadoop
Hadoop Common Patterns of Use
Business Cases
“Right-time” Access to Data
  Batch: Refine
  Interactive: Explore
  Online: Enrich
ENTERPRISE HADOOP PLATFORM
Big Data: Transactions, Interactions, Observations
Operational Data Refinery
Patterns: Refine | Explore | Enrich
Transform & refine ALL sources of data. Also known as Data Reservoir or Catch Basin.
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP), Enterprise Hadoop Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
1   Capture
2   Process
3   Distribute & Retain
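The three refinery steps can be sketched in plain Python to show the shape of the data flow. This is a minimal, single-machine illustration under stated assumptions, not Hadoop code and not Hortonworks' implementation: the log format, the `daily_page_hits` table, and the use of an in-memory SQLite database as a stand-in for a traditional repository such as an EDW are all hypothetical. The counting loop in `process` only mimics the map and reduce phases that a Hadoop job would run in parallel across a cluster.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Iterable

# Hypothetical raw log format, assumed for illustration only:
#   "<client> <YYYY-MM-DD> <HTTP method> <path> <status>"
LOG_PATTERN = re.compile(
    r"^(?P<client>\S+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<method>[A-Z]+) "
    r"(?P<path>\S+) (?P<status>\d{3})$"
)


def capture(raw_lines: Iterable[str], reservoir: list[str]) -> list[str]:
    """Step 1 (Capture): append raw lines, unmodified apart from the trailing
    newline, to a reservoir that is kept after processing (the "Retain" part)."""
    reservoir.extend(line.rstrip("\n") for line in raw_lines)
    return reservoir


def process(lines: Iterable[str]) -> dict[tuple[str, str], int]:
    """Step 2 (Process): parse raw lines into structured records and count
    successful (2xx) requests per (date, path).

    Each matching line emits one hit for its key (map-like step) and the
    dictionary sums hits per key (reduce-like step). Malformed lines are
    skipped rather than failing the whole run.
    """
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or not match["status"].startswith("2"):
            continue  # skip malformed lines and non-2xx responses
        counts[(match["date"], match["path"])] += 1
    return dict(counts)


def distribute(counts: dict[tuple[str, str], int], conn: sqlite3.Connection) -> None:
    """Step 3 (Distribute): load the refined, structured result into a
    relational table (hypothetical schema) for downstream BI tools."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_page_hits (day TEXT, path TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_page_hits (day, path, hits) VALUES (?, ?, ?)",
        [(day, path, hits) for (day, path), hits in sorted(counts.items())],
    )
    conn.commit()


if __name__ == "__main__":
    raw = [
        "10.0.0.1 2012-06-13 GET /home 200",
        "10.0.0.2 2012-06-13 GET /home 200",
        "10.0.0.3 2012-06-13 GET /pricing 404",
        "garbled line",
        "10.0.0.1 2012-06-14 GET /pricing 200",
    ]
    reservoir: list[str] = []
    conn = sqlite3.connect(":memory:")
    distribute(process(capture(raw, reservoir)), conn)
    for row in conn.execute("SELECT day, path, hits FROM daily_page_hits ORDER BY day, path"):
        print(row)  # ('2012-06-13', '/home', 2) then ('2012-06-14', '/pricing', 1)
    print("raw lines retained:", len(reservoir))  # raw lines retained: 5
```

Keeping the untouched raw lines in `reservoir` while loading only the aggregated rows into the relational table mirrors the "Distribute & Retain" step: the structured result goes to existing systems, and the original data stays available for later reprocessing.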
Big Data Exploration & Visualization
Patterns: Refine | Explore | Enrich
Leverage a “data lake” to perform iterative investigation for value.
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP), Enterprise Hadoop Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
1   Capture
2   Process
3   Explore & Visualize
Application Enrichment
Patterns: Refine | Explore | Enrich
Create intelligent applications: collect data, create analytical models and deliver them to online apps.
APPLICATIONS: Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP), NoSQL, Enterprise Hadoop Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
1   Capture
2   Process & Compute
3   Deliver Model
Big Data: Optimize Outcomes at Scale
Media: optimize Content
Intelligence: optimize Detection
Finance: optimize Algorithms
Advertising: optimize Performance
Fraud: optimize Prevention
Retail / Wholesale: optimize Inventory turns
Manufacturing: optimize Supply chains
Healthcare: optimize Patient outcomes
Education: optimize Learning outcomes
Government: optimize Citizen services
Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
Market Transitioning into Early Majority
[Chart: technology adoption lifecycle; relative % of customers plotted over time, with "The CHASM" between early adopters and the early majority]
Innovators: technology enthusiasts
Early adopters: visionaries
Early majority: pragmatists
Late majority: conservatives
Laggards: skeptics
Left of the chasm: customers want technology & performance
Right of the chasm: customers want solutions & convenience
Source: Geoffrey Moore, Crossing the Chasm
At Hortonworks, we believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.

Welcome to Hadoop Summit and Enjoy the Conference!

presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Hadoop's Role in Enterprise Architecture

  • 1. Hadoop’s Role in the Enterprise Architecture Shaun Connolly Hortonworks VP Strategy @shaunconnolly
  • 2. What is Big Data?
  • 4. What is Big Data? Transactions + Interactions + Observations = BIG DATA. Chart of data volume (megabytes to petabytes) against increasing data variety and complexity. Megabytes, ERP: purchase detail, purchase record, payment record. Gigabytes, CRM: segmentation, offer details, customer touches, support contacts. Terabytes, WEB: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels. Petabytes, BIG DATA: user click stream, mobile web, sentiment, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, business data feeds, external demographics, user generated content, HD video, audio, images, product/service logs.
  • 5. Big Data Market Drivers. Business: (1) Enable new business models & drive faster growth (20%+); (2) Find insights for competitive advantage & optimal returns. Technical: (3) Data continues to grow exponentially; (4) Data is increasingly everywhere and in many formats; (5) Traditional solutions not designed for new requirements. Financial: (6) Cost of data systems, as % of IT spend, continues to grow; (7) Cost advantages of commodity hardware & open source.
  • 6. Is This Your Big Data Strategy? (Illustration: “you” facing “BIG DATA”.)
  • 7. Next-Generation Data Architecture. Components: Business Transactions & Interactions (CRM, ERP; Web, Mobile; Point of sale); Unstructured Data (log files, exhaust data, social media, sensors, devices, DB data); Enterprise Hadoop Platform; Classic Data Integration & ETL; Business Intelligence & Analytics (dashboards, reports, & visualization, …). Steps: 1 Capture Big Data; 2 Process & Structure; 3 Distribute Results; 4 Feedback & Retain.
  • 8. Making Hadoop Enterprise Ready. Operational Services: manage & operate at scale. Data Services: store, process and access data. Hadoop Core: distributed storage & processing. Platform Services: enterprise readiness (HA, DR, snapshots, security, …). Together these form the Enterprise Hadoop Platform, deployable on OS / VM, cloud, or appliance.
  • 9. Existing Data Architecture. Applications: Business Analytics, Custom Applications, Enterprise Applications. Dev & Data Tools: build & test. Data Systems (traditional repos): RDBMS, EDW, MPP. Operational Tools: manage & monitor. Data Sources: traditional sources (RDBMS, OLTP, OLAP); OLTP and POS systems.
  • 10. An Emerging Data Architecture. Applications: Business Analytics, Custom Applications, Enterprise Applications. Dev & Data Tools: build & test. Data Systems: Enterprise Hadoop Platform alongside traditional repos (RDBMS, EDW, MPP). Operational Tools: manage & monitor. Data Sources: traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensors, social media); mobile data; POS systems.
  • 11. [Integrating Hadoop with existing IT investments is vitally important.] Larry Feinsmith
  • 12. Interoperating With Your Tools. Applications; Dev & Data Tools; Data Systems (Enterprise Hadoop Platform, traditional repos); Operational Tools; Data Sources: traditional sources (RDBMS, OLTP, OLAP), new sources (web logs, email, sensors, social media), mobile data, POS systems. Partner tools shown include Microsoft applications and Viewpoint.
  • 13. Big Data Tag Team! Your Enterprise Tools and Hadoop
  • 14. Hadoop Common Patterns of Use. Business cases: Refine, Explore, Enrich, with “right-time” access to data (batch, interactive, online), running on the Enterprise Hadoop Platform over Big Data: Transactions, Interactions, Observations.
  • 15. Operational Data Refinery (Refine | Explore | Enrich). Transform & refine ALL sources of data; also known as Data Reservoir or Catch Basin. Steps: 1 Capture; 2 Process; 3 Distribute & Retain. Layers: Applications (Business Analytics, Custom Applications, Enterprise Applications); Data Systems (Enterprise Hadoop Platform; traditional repos: RDBMS, EDW, MPP); Data Sources (traditional sources: RDBMS, OLTP, OLAP; new sources: web logs, email, sensor data, social media).
  • 16. Big Data Exploration & Visualization (Refine | Explore | Enrich). Leverage “data lake” to perform iterative investigation for value. Steps: 1 Capture; 2 Process; 3 Explore & Visualize. Layers: Applications (Business Analytics, Custom Applications, Enterprise Applications); Data Systems (Enterprise Hadoop Platform; traditional repos: RDBMS, EDW, MPP); Data Sources (traditional sources: RDBMS, OLTP, OLAP; new sources: web logs, email, sensor data, social media).
  • 17. Application Enrichment (Refine | Explore | Enrich). Create intelligent applications: collect data, create analytical models and deliver to online apps. Steps: 1 Capture; 2 Process & Compute; 3 Deliver Model. Layers: Applications (Custom Applications, Enterprise Applications); Data Systems (Enterprise Hadoop Platform; traditional repos: RDBMS, EDW, MPP, NoSQL); Data Sources (traditional sources: RDBMS, OLTP, OLAP; new sources: web logs, email, sensor data, social media).
  • 18. Big Data: Optimize Outcomes at Scale. Media: optimize content. Intelligence: optimize detection. Finance: optimize algorithms. Advertising: optimize performance. Fraud: optimize prevention. Retail / Wholesale: optimize inventory turns. Manufacturing: optimize supply chains. Healthcare: optimize patient outcomes. Education: optimize learning outcomes. Government: optimize citizen services. Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
  • 19. Market Transitioning into Early Majority. Adoption curve (relative % of customers over time): innovators (technology enthusiasts), early adopters (visionaries), THE CHASM, early majority (pragmatists), late majority (conservatives), laggards (skeptics). Before the chasm, customers want technology & performance; after it, customers want solutions & convenience. Source: Geoffrey Moore, Crossing the Chasm.
  • 20. At Hortonworks, we believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop. Welcome to Hadoop Summit and Enjoy the Conference!

Speaker Notes

  1. Title: Hadoop's Role in the Enterprise Architecture. With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business intelligence. Hadoop is uniquely capable of storing, aggregating, and refining multi-structured data sources into formats that fuel new business insights. Organizations that embrace solution architectures focused on maximizing the value from ALL data will put themselves in a position to drive more business, enhance productivity, or discover new and lucrative business opportunities. Over the coming years, Hadoop could be in a position to process more than half the world's data. There is still much work to be done, however, if Hadoop is to achieve this lofty goal. In this talk Shaun Connolly, VP Corporate Strategy for Hortonworks, will look at Hadoop's role in the enterprise architecture and how it complements existing enterprise systems.
  2. Thank you all for attending Hadoop Summit! I’d like to spend the next 30 minutes focused on Hadoop’s opportunity to power next-generation data architectures. I’ve been involved in open source for many years, having worked at JBoss back in 2004, then at Red Hat through 2008. After that I joined SpringSource and ultimately VMware through 2011. So I’ve seen a lot of open source technologies, waves of excitement, and passionate users. But I’ve not seen anything quite like this Big Data and Hadoop phenomenon.
  3. So our backdrop is BIG DATA. Gartner report, 12 October 2012 (http://www.gartner.com/id=2195915): “Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016.” Big data has become a major driver of IT spending. The benefits to organizations for adding big data to their information management and analytics infrastructure will force a more rapid cycle of replacing existing solutions. IDC study (http://cdn.idc.com/research/Predictions12/Main/downloads/IDCTOP10Predictions2012.pdf): IDC projects that the digital universe will reach 40 zettabytes (ZB) by 2020, a 50-fold growth from the beginning of 2010. According to the study, 2.8 ZB of data will have been created and replicated in 2012. Machine-generated data is a key driver in the growth of the world’s data, which is projected to increase 15x by 2020. So the topic of big data is increasingly important… but like any presentation these days about Big Data, we’ve got to start off with a definition, right? I kinda like to describe Big Data using a simple equation. As I see it, Big Data = Transactions + Interactions + Observations. Meaning, it not only spans your current highly structured transactional data sources, it includes new forms of data that represent interactions (e.g., website interactions, social interactions) and observations (e.g., data coming off of sensors and devices). So, for all the burgeoning data scientists in the audience… there’s your equation!
  4. For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples. ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business. Web logs, user click streams, social interactions & feeds, and user-generated content are classic places to find Interaction data. Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, and RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data. Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e., what you, I, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just the data that flows through an organization. For example, the mortgage-related data you have could benefit from being blended with external data found in Zillow. The government, for example, has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is predictive policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime! Anyhow, this is what Big Data means to me… hopefully it makes sense to you.
  5. The market drivers for big data span Business, Technical, and Financial concerns. From a business perspective, the promise of big data is to find insights for competitive advantage, enable new business models, or optimize existing models. From a technical perspective, as we discussed, volumes of data continue to grow, and data is very multi-structured in nature, which poses a challenge for traditional systems that inherently assume a relational row/column structure. And from a financial perspective, while the cost of data systems continues to grow, the rise of commodity hardware and open source platforms like Hadoop is enabling an economic model that makes it possible to gather large volumes of data in one place and process them in a way that does not break the bank. So, we’ve covered an overview of big data and the market drivers behind why it’s important. Your CIO, like many these days, believes it’s a top 3 initiative and has tasked you with coming up with a strategy.
  6. So how many of you feel like this poor guy getting started with his big data strategy? Well, let’s start off with a look at a next-generation data architecture that leverages new platforms like Hadoop in a way that integrates with your existing systems.
  7. So I’d like to talk about how Hadoop can fit within a broader enterprise data architecture with the goal of maximizing the value from ALL of your data: transactions + interactions + observations. At the highest level, I see three broad areas of data processing: Business Transactions & Interactions; Business Intelligence & Analytics; and the Big Data Refinery. Enterprise IT has been connecting systems via classic ETL processing, as illustrated in Step 1 above, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask, and IT collects and structures the data needed to answer those questions. The “Big Data Refinery”, as highlighted in Step 2, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. The Big Data Refinery provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with this data. A popular example of big data refining is processing Web logs, clickstreams, social interactions, social feeds, and other user-generated data sources into more accurate assessments of customer churn or more effective creation of personalized offers. More interestingly, there are businesses deriving value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to help them better understand how customers navigate the aisles as they find and purchase products. Retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue for the business.
In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value. With that as backdrop, Step 3 takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions & Interactions and Business Intelligence & Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360° view of customers, for example. By directly integrating the Big Data Refinery with existing Business Intelligence & Analytics solutions that contain much of the transactional information for the business, companies can more accurately understand the customer behaviors that lead to the transactions. Moreover, systems focused on Business Transactions & Interactions can also benefit from connecting with the Big Data Refinery. Complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications, with the goal of more accurately targeting customers with the best and most relevant offers, for example. Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in Steps 4 and 5. Retaining the past 10 years of historical “Black Friday” retail data, for example, can benefit the business, especially if it’s blended with other data sources such as 10 years of weather data accessed from a third-party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost-effectively and at scale.
  8. So enterprise Hadoop lies at the heart of the next-generation data architecture. Let’s outline what’s required in and around Hadoop in order to make it easy for the enterprise to use and consume. At the center, we start with Apache Hadoop for distributed file storage and processing (à la MapReduce). In order to enable Hadoop within mainstream enterprises, we need to address enterprise concerns such as high availability, disaster recovery, snapshots, and security. On top of this, we need to provide data services that make it easy to move data in and out of the platform, process and transform the data into useful formats, and enable people and other systems to access the data easily. This is where components like Apache Hive, Pig, HBase, HCatalog, and other tools fit. Making it easy for data workers is important, but it’s also important to make the platform easier to operate. Components like Apache Ambari, which address provisioning, management, and monitoring of the cluster, are important here. So all of that (Core and Platform Services, Data Services, and Operational Services) comes together into a vision of “enterprise Hadoop”. Ensuring that the enterprise Hadoop platform can be flexibly deployed across operating systems and virtual environments like Linux, Windows, and VMware is important. Targeting cloud environments like Amazon Web Services, Microsoft Azure, Rackspace OpenCloud, and OpenStack is increasingly important. Finally, the ability to provide enterprise Hadoop pre-configured within a hardware appliance, like Teradata’s Big Analytics Appliance, helps pull Hadoop into enterprises as well.
  9. While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMS and data warehouses; and a set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), enterprise applications (e.g., SAP), or custom applications (e.g., custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  10. As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and the Hortonworks Data Platform (HDP) in particular, being introduced as a complement to the traditional approaches. It is not replacing the database; rather, it is a complement, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS, and Business Objects; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  11. In October 2010, I attended the Hadoop World event in New York City, where there was a keynote presentation by Larry Feinsmith of JP Morgan Chase. Larry provided great insight into how JP Morgan Chase was using Hadoop. Great creative use cases! But the point that stuck with me long after the event was the importance of figuring out how Hadoop can and should be integrated with existing IT investments. While Larry said he loves the innovation happening in the open source community, he also said that enterprises like JP Morgan Chase will not throw away all of their existing investments! They want ways to get the benefits of new technologies that leverage existing skills and integrate with existing systems.
  12. It is for that reason that we focus on HDP interoperability across all of these categories. Data systems: HDP is endorsed and embedded with SQL Server, Teradata, and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects, and more. Development tools: for .NET developers, Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop; for Java developers, Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
  13. So… if I haven’t made it crystal clear for you yet, maybe this visual will get the point across. Enterprise Hadoop makes a great tag team with your existing tools to enable a next-generation data architecture that positions you to refine and explore vast quantities of multi-structured data and enrich the applications and services that drive your business.
  14. So now that we’ve covered the overall architecture and how Hadoop fits, let’s discuss the patterns of use that we’re seeing for Hadoop. At a high level, we describe the 3 key patterns of use as Refine, Explore, and Enrich. Refine captures the data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a batch platform, it is increasingly being used, and evolving, to serve a wide range of usage patterns that span batch, interactive, and online needs. Let me cover these patterns in a little more detail.
  15. Across all of our user base, we have identified just 3 separate usage patterns. Sometimes more than one is used in concert during a complex project, but the patterns are distinct nonetheless. These are Refine, Explore, and Enrich. The first of these, the Refine case, is probably the most common today. It is about taking very large quantities of data and using Hadoop to distill the information down into a more manageable data set that can then be loaded into a traditional data warehouse for use with existing tools. This is relatively straightforward and allows an organization to harness a much larger data set for its analytics applications while leveraging its existing data warehousing and analytics tools. Using the graphic here, in Step 1 data is pulled from a variety of sources into the Hadoop platform, in Step 2 it is processed, and in Step 3 it is loaded into a data warehouse for analysis by existing BI tools.
  16. A second use case is what we would refer to as Data Exploration; this is the use case most commonly in question when people talk about “Data Science”. In simplest terms, it is about using Hadoop as the primary data store rather than performing the secondary step of moving data into a data warehouse. To support this use case, you’ve seen all the BI tool vendors rally to add support for Hadoop (and most commonly HDP) as a peer to the database, and in so doing allow for rich analytics on extremely large datasets that would be both unwieldy and costly in a traditional data warehouse. Hadoop allows for interaction with a much richer dataset and has spawned a whole new generation of analytics tools that rely on Hadoop (HDP) as the data store. To use the graphic, in Step 1 data is pulled into HDP, it is stored and processed in Step 2, and then it is surfaced directly into the analytics tools for the end user in Step 3.
  17. The final use case is called Application Enrichment. This is about incorporating data stored in HDP to enrich an existing application. This could be an online application in which we want to surface custom information to a user based on their particular profile. For example, if a user has been searching the web for information on home renovations, in the context of your application you may want to use that knowledge to surface a custom offer for a product that you sell related to that category. Large web companies such as Facebook are very sophisticated in their use of this approach. In the diagram, this is about pulling data from disparate sources into HDP in Step 1, storing and processing it in Step 2, and then interacting with it directly from your applications in Step 3, typically in a bi-directional manner (e.g., request data, return data, store response).
  18. When all is said and done, the ultimate goal of big data processing is to optimize outcomes at scale. Geoffrey Moore, author of Crossing the Chasm, gave these good examples across various vertical industries.
  19. And speaking of Geoffrey Moore, let me close out by covering where Hadoop is from a Crossing the Chasm perspective. Based on our engagement with enterprise customers, we believe Hadoop has transitioned into the early majority and is therefore being used by more mainstream enterprises. Horizontal patterns of use emerge in this stage, as do what Geoffrey Moore calls “bowling pins”, or vertical solutions. The net out is that enterprise Hadoop offers exciting promise, but it is still early in its maturity cycle. You can do a lot with the technology, but there’s more to do to harden it for broader mainstream adoption.
  20. And with that, let me close out with the guiding vision we have at Hortonworks.
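
Speaker note 7 closes with the idea of blending 10 years of retained Black Friday data with third-party weather data. At its core that is a join on a shared key followed by an aggregation. The following is a minimal in-memory sketch that assumes two hypothetical datasets keyed by year. The figures are invented for illustration and are not real results. In the refinery, the same join would run as a distributed job over much larger inputs.

```python
from collections import defaultdict

# Hypothetical retained history: (year, Black Friday revenue in $K). Invented figures.
black_friday_sales = [(2009, 410.0), (2010, 455.0), (2011, 430.0), (2012, 505.0)]

# Hypothetical third-party weather feed: (year, did it rain on Black Friday?)
weather_feed = [(2009, False), (2010, True), (2011, False), (2012, True)]

def blend(sales, weather):
    """Join sales to weather on year, then average revenue per weather condition."""
    rained_by_year = dict(weather)
    revenue_by_condition = defaultdict(list)
    for year, revenue in sales:
        if year not in rained_by_year:
            continue  # no weather record for this year; nothing to blend with
        condition = "rain" if rained_by_year[year] else "dry"
        revenue_by_condition[condition].append(revenue)
    return {cond: sum(vals) / len(vals) for cond, vals in revenue_by_condition.items()}

print(blend(black_friday_sales, weather_feed))  # {'dry': 420.0, 'rain': 480.0}
```

Years that have no matching weather record are skipped rather than guessed, which mirrors an inner join.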
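
Speaker note 8 places distributed storage and processing (MapReduce) at the core of enterprise Hadoop. To illustrate the MapReduce programming model, the sketch below simulates its three phases in a single Python process: map, shuffle (group by key), and reduce. It counts successful requests per URL path. It is not Hadoop's Java API. The log format and sample lines are hypothetical assumptions made for the example. On a real cluster, the framework runs many map and reduce tasks in parallel across nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical web-log format: "<client_ip> <http_method> <url_path> <status>"
LOG_LINES = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /products 200",
    "10.0.0.1 GET /products 200",
    "10.0.0.3 GET /home 404",
    "10.0.0.2 GET /checkout 200",
]

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (url_path, 1) for each successful request; emit nothing otherwise."""
    _ip, _method, path, status = line.split()
    if status == "200":
        yield path, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all values for one key."""
    return key, sum(values)

def run_job(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(run_job(LOG_LINES))  # {'/home': 1, '/products': 2, '/checkout': 1}
```

Keeping map and reduce as pure functions of their inputs is what allows a framework to split them across machines. This sketch requires Python 3.9 or later for the built-in generic type hints.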
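
Speaker note 15 describes the Refine pattern: distill a very large raw data set into a much smaller one, then load it into a traditional data warehouse for existing BI tools. The following is a minimal single-process sketch of that flow. It assumes a hypothetical clickstream of (user, event) pairs and uses Python's built-in `sqlite3` only as a stand-in for the warehouse. In practice, the distillation step would run on the Hadoop cluster and the load would target an actual EDW.

```python
import sqlite3
from collections import Counter

# Hypothetical raw clickstream events: (user_id, event_type)
raw_events = [
    ("u1", "view"), ("u1", "view"), ("u1", "add_to_cart"),
    ("u2", "view"), ("u2", "view"), ("u2", "view"),
    ("u3", "view"), ("u3", "add_to_cart"), ("u3", "purchase"),
]

def refine(events):
    """Distill raw events into one summary row per user: (user, views, carts, purchases)."""
    counts = Counter(events)  # keyed by (user_id, event_type)
    users = sorted({user for user, _ in events})
    return [
        (u, counts[(u, "view")], counts[(u, "add_to_cart")], counts[(u, "purchase")])
        for u in users
    ]

def load_into_warehouse(rows):
    """Load refined rows into a relational table (SQLite stands in for the EDW)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_activity "
        "(user_id TEXT PRIMARY KEY, views INTEGER, carts INTEGER, purchases INTEGER)"
    )
    conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_into_warehouse(refine(raw_events))
# Existing SQL-based BI tooling can now query the much smaller summary table,
# e.g. users who added to cart but never purchased:
print(conn.execute(
    "SELECT user_id FROM user_activity WHERE carts > 0 AND purchases = 0"
).fetchall())  # [('u1',)]
```

The raw events never reach the warehouse. Only the per-user summary does, which is the point of the pattern.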
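
Speaker note 17 describes Application Enrichment as a loop: build a model from captured data, deliver it to an online application, and store the application's responses. The following is a minimal sketch of that loop. It assumes a hypothetical search log, a hypothetical offer catalog, and a plain dictionary standing in for the low-latency store (slide 17 lists NoSQL among the data systems) that would serve the model in production. Picking each user's most frequent search category is a deliberately simple placeholder for a real analytical model.

```python
from collections import Counter, defaultdict

# Hypothetical search history captured in the platform: (user_id, search_category)
search_log = [
    ("alice", "home_renovation"), ("alice", "home_renovation"), ("alice", "travel"),
    ("bob", "electronics"),
]

# Hypothetical offer catalog, keyed by category
OFFERS = {"home_renovation": "10% off power tools", "electronics": "Free headphone case"}
DEFAULT_OFFER = "Welcome discount"

def build_affinity_model(log):
    """Batch step (placeholder model): each user's most frequent search category."""
    counts = defaultdict(Counter)
    for user, category in log:
        counts[user][category] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

class OnlineApp:
    """Online step: request the delivered model, return an offer, store the response."""

    def __init__(self, model):
        self.model = model      # stand-in for a low-latency store such as a NoSQL table
        self.responses = []     # responses would flow back into the next batch run

    def get_offer(self, user):
        category = self.model.get(user)  # None for users with no history
        return OFFERS.get(category, DEFAULT_OFFER)

    def record_response(self, user, offer, accepted):
        self.responses.append((user, offer, accepted))

app = OnlineApp(build_affinity_model(search_log))
offer = app.get_offer("alice")
app.record_response("alice", offer, accepted=True)
print(offer)  # 10% off power tools
```

Users without history fall back to a default offer, and the recorded responses are what would close the bi-directional loop that the note mentions.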