Hadoop Patterns of Use
April 2013




© Hortonworks Inc. 2013
Existing Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS systems
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor

Next-Generation Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS systems; New Sources (web logs, email, sensors, social media)
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor

Hadoop Common Patterns of Use

Business Cases
“Right-time” Access to Data: Batch, Interactive, Online
Patterns: Refine, Explore, Enrich
HORTONWORKS DATA PLATFORM
Big Data: Transactions, Interactions, Observations

Operational Data Refinery
Refine / Explore / Enrich

Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin

APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)

Steps:
  1 Capture
  2 Process
  3 Distribute & Retain

Big Data Exploration & Visualization
Refine / Explore / Enrich

Leverage “data lake” to perform iterative investigation for value

APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)

Steps:
  1 Capture
  2 Process
  3 Explore & Visualize

Application Enrichment
Refine / Explore / Enrich

Create intelligent applications
Collect data, create analytical models and deliver to online apps

APPLICATIONS: Custom Applications, Enterprise Applications
DATA SYSTEMS: Traditional repos (RDBMS, EDW, MPP); NOSQL; HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)

Steps:
  1 Capture
  2 Process & Compute
  3 Deliver Model

follow us: @hortonworks


Editor's notes

  1. While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMS and data warehouses; and a set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), Enterprise Applications (e.g. SAP) or Custom Applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  2. As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It does not replace the database but complements it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS, Business Objects, etc.; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  3. So we’ve covered the overall architecture and how Hadoop fits; let’s discuss the patterns of use that we’re seeing for Hadoop. At a high level, we describe the 3 key patterns of use as Refine, Explore, and Enrich. Refine captures the data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a Batch platform, it is increasingly being used and evolving to serve a wide range of usage patterns that span Batch, Interactive, and Online needs. Let me cover these patterns in a little more detail.
  4. Across all of our user base, we have identified just 3 separate usage patterns. Sometimes more than one is used in concert during a complex project, but the patterns are distinct nonetheless. These are Refine, Explore and Enrich. The first of these, the Refine case, is probably the most common today. It is about taking very large quantities of data and using Hadoop to distill the information down into a more manageable data set that can then be loaded into a traditional data warehouse for use with existing tools. This is relatively straightforward and allows an organization to harness a much larger data set for its analytics applications while leveraging its existing data warehousing and analytics tools. Using the graphic here, in step 1 data is pulled from a variety of sources into the Hadoop platform, where it is processed in step 2, and then in step 3 it is loaded into a data warehouse for analysis by existing BI tools.
  5. A second use case is what we would refer to as Data Exploration. This is the use case most commonly in question when people talk about “Data Science”. In simplest terms, it is about using Hadoop as the primary data store rather than performing the secondary step of moving data into a data warehouse. To support this use case, you’ve seen all the BI tool vendors rally to add support for Hadoop, and most commonly HDP, as a peer to the database, and in so doing allow for rich analytics on extremely large datasets that would be both unwieldy and costly in a traditional data warehouse. Hadoop allows for interaction with a much richer dataset and has spawned a whole new generation of analytics tools that rely on Hadoop (HDP) as the data store. To use the graphic, in Step 1 data is pulled into HDP, it is stored and processed in Step 2, and it is then surfaced directly into the analytics tools for the end user in Step 3.
  6. The final use case is called Application Enrichment. This is about incorporating data stored in HDP to enrich an existing application. This could be an online application in which we want to surface custom information to a user based on their particular profile. For example, if a user has been searching the web for information on home renovations, in the context of your application you may want to use that knowledge to surface a custom offer for a product that you sell related to that category. Large web companies such as Facebook and others are very sophisticated in the use of this approach. In the diagram, this is about pulling data from disparate sources into HDP in Step 1, storing and processing it in Step 2, and then interacting with it directly from your applications in Step 3, typically in a bi-directional manner (e.g. request data, return data, store response).
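
The three patterns described in the notes above (Refine, Explore, Enrich) can be illustrated with a small in-memory sketch. This is not Hadoop or HDP code: plain Python collections stand in for the platform, and the event records, the `OFFERS` table and the function names `refine`, `explore` and `enrich` are all hypothetical, chosen only to mirror the home-renovation example in the notes.

```python
from collections import Counter, defaultdict

# Hypothetical raw events captured from "new sources" such as web logs.
RAW_EVENTS = [
    {"user": "u1", "category": "home renovation", "action": "search"},
    {"user": "u1", "category": "home renovation", "action": "view"},
    {"user": "u1", "category": "gardening", "action": "search"},
    {"user": "u2", "category": "gardening", "action": "search"},
    {"user": "u2", "category": None, "action": "view"},  # malformed record
]

# Hypothetical offers an online application could surface.
OFFERS = {"home renovation": "10% off paint", "gardening": "free seed pack"}


def refine(events):
    """Refine: drop malformed records and distill events into
    per-user category counts (a smaller, more manageable data set)."""
    summary = defaultdict(Counter)
    for event in events:
        if event.get("user") and event.get("category"):
            summary[event["user"]][event["category"]] += 1
    return summary


def explore(summary):
    """Explore: an ad-hoc question over the refined data, here the
    overall popularity of each category across all users."""
    totals = Counter()
    for counts in summary.values():
        totals.update(counts)
    return totals.most_common()


def enrich(summary, user, offers):
    """Enrich: at request time, pick an offer for the user's most
    frequent category; return None for unknown users."""
    counts = summary.get(user)
    if not counts:
        return None
    top_category, _ = counts.most_common(1)[0]
    return offers.get(top_category)


summary = refine(RAW_EVENTS)
popularity = explore(summary)
offer_u1 = enrich(summary, "u1", OFFERS)
```

In a real deployment each function would likely map to a different access mode from the slides: `refine` to a batch job, `explore` to interactive queries, and `enrich` to an online lookup served to the application.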