DOWNLOAD the whitepaper here: http://hortonworks.com/wp-content/plugins/download-monitor/download.php?id=71
As an organization laser-focused on developing, distributing and supporting Apache Hadoop for enterprise customers, we have been fortunate to have a unique vantage point.
We’re delighted to share with you these slides and our new whitepaper ‘Apache Hadoop Patterns of Use’. The patterns discussed in the slides and whitepaper are:
Refine: Collect data and apply a known algorithm to it in a trusted operational process.
Explore: Collect data and perform iterative investigation for value.
Enrich: Collect data, analyze and present salient results for online apps.
We hope you enjoy the content.
2. Existing Data Architecture
[Diagram: traditional data sources (RDBMS, OLTP, OLAP, POS systems) feed data systems (RDBMS, EDW and MPP traditional repositories), which in turn serve applications (business analytics, custom applications, enterprise applications), flanked by dev & data tools for build & test and operational tools for manage & monitor.]
3. Next-Generation Data Architecture
[Diagram: the same layered architecture, with an enterprise Hadoop platform added to the data systems alongside the traditional repositories, and new sources (web logs, email, sensors, social media) joining the traditional data sources.]
4. Hadoop Common Patterns of Use
[Diagram: three business cases for "right-time" access to data, Refine (batch), Explore (interactive) and Enrich (online), all built on the Hortonworks Data Platform over big data: transactions, interactions, observations.]
5. Operational Data Refinery
[Diagram: the Refine pattern. Transform and refine all sources of data; also known as a data reservoir or catch basin. Step 1: capture data from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media). Step 2: process it in the Hortonworks Data Platform. Step 3: distribute and retain the results to the RDBMS, EDW and MPP repositories for use by business analytics, custom and enterprise applications.]
6. Big Data Exploration & Visualization
[Diagram: the Explore pattern. Leverage the "data lake" to perform iterative investigation for value. Step 1: capture data from traditional and new sources. Step 2: process it in the Hortonworks Data Platform. Step 3: explore and visualize it directly from business analytics, custom and enterprise applications.]
7. Application Enrichment
[Diagram: the Enrich pattern. Create intelligent applications: collect data, create analytical models and deliver them to online apps. Step 1: capture data from traditional and new sources. Step 2: process and compute over it in the Hortonworks Data Platform, alongside RDBMS, EDW, MPP and NoSQL systems. Step 3: deliver the model to custom and enterprise applications.]
While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMS and data warehouses; and a set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), enterprise applications (e.g. SAP) or custom applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop – and HDP in particular – being introduced as a complement to traditional approaches. It is not replacing the database; as a complement, it must integrate easily with existing tools and approaches. This means it must interoperate with:
Existing applications, such as Tableau, SAS and Business Objects
Existing databases and data warehouses, for loading data to and from the warehouse
Development tools used for building custom applications
Operational tools for managing and monitoring
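As one hedged example of that kind of integration, the sketch below reaches into HDFS from an ordinary Python program over WebHDFS, HDFS's REST interface; the namenode host, port and paths are placeholders, and authentication (e.g. Kerberos) is omitted for brevity.

```python
# Minimal sketch: an existing tool reading HDFS over the WebHDFS REST API.
# Host, port, and paths are placeholder assumptions.
import requests

NAMENODE = "http://hdp-namenode.example.com:50070"

# List a directory in HDFS.
listing = requests.get(NAMENODE + "/webhdfs/v1/data/weblogs",
                       params={"op": "LISTSTATUS"})
for entry in listing.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])

# Read a file from HDFS (WebHDFS redirects the read to a datanode).
content = requests.get(NAMENODE + "/webhdfs/v1/data/weblogs/part-00000",
                       params={"op": "OPEN"})
print(content.text[:200])
```

Anything that can speak HTTP and JSON can integrate this way, which is part of why adding Hadoop does not require replacing the systems already in place.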
Now that we've covered the overall architecture and how Hadoop fits, let's discuss the patterns of use that we're seeing for Hadoop. At a high level, we describe the 3 key patterns as Refine, Explore, and Enrich. Refine captures data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a batch platform, it is increasingly being used, and evolving, to serve a wide range of usage patterns that span batch, interactive, and online needs. Let me cover these patterns in a little more detail.
Across all of our user base, we have identified just 3 separate usage patterns; sometimes more than one is used in concert during a complex project, but the patterns are distinct nonetheless. These are Refine, Explore and Enrich. The first of these, the Refine case, is probably the most common today. It is about taking very large quantities of data and using Hadoop to distill the information down into a more manageable data set that can then be loaded into a traditional data warehouse for use with existing tools. This is relatively straightforward and allows an organization to harness a much larger data set for its analytics applications while leveraging its existing data warehousing and analytics tools. Using the graphic here: in step 1, data is pulled from a variety of sources; in step 2, it is brought into the Hadoop platform and processed; and in step 3, it is loaded into a data warehouse for analysis by existing BI tools.
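As a minimal sketch of the kind of distillation job that might run inside Hadoop for step 2, here is a Hadoop Streaming mapper and reducer in Python; the log format, field positions and script name are assumptions, and in practice this step is often expressed in Pig or Hive instead.

```python
#!/usr/bin/env python
# Sketch of the "Refine" step: distill raw web logs (assumed format:
# "ISO-timestamp URL ..." per line) into daily page-view counts small
# enough to load into an existing data warehouse.
import sys

def mapper():
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 2:
            continue                          # skip malformed lines
        day, url = fields[0][:10], fields[1]  # e.g. "2013-04-23"
        print("%s|%s\t1" % (day, url))        # composite key before the tab

def reducer():
    current_key, count = None, 0
    for line in sys.stdin:
        key, n = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key.replace("|", "\t"), count))
            current_key, count = key, 0
        count += int(n)
    if current_key is not None:
        print("%s\t%d" % (current_key.replace("|", "\t"), count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A job like this would be launched with the hadoop-streaming jar, passing the script as both the mapper and the reducer, and the small summarized output could then be exported into the warehouse with a tool such as Sqoop.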
A second use case is what we would refer to as Data Exploration; this is the use case in question most commonly when people talk about "Data Science". In the simplest terms, it is about using Hadoop as the primary data store rather than performing the secondary step of moving data into a data warehouse. To support this use case, you've seen the BI tool vendors rally to add support for Hadoop, and most commonly HDP, as a peer to the database, allowing rich analytics on extremely large datasets that would be both unwieldy and costly in a traditional data warehouse. Hadoop allows interaction with a much richer dataset and has spawned a whole new generation of analytics tools that rely on Hadoop (HDP) as the data store. To use the graphic: in step 1, data is pulled into HDP; in step 2, it is stored and processed; and in step 3, it is surfaced directly into the analytics tools for the end user.
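As a sketch of what step 3 can look like when tools talk to Hadoop directly, the snippet below runs an exploratory query against data that never left HDP; it assumes a HiveServer2 endpoint and the PyHive client library, and the host name, table and columns are hypothetical.

```python
# Sketch of the "Explore" pattern: query data kept in Hadoop directly,
# without first moving it into a warehouse. Host, table, and columns
# are hypothetical.
from pyhive import hive

conn = hive.connect(host="hdp-master.example.com", port=10000)
cursor = conn.cursor()

# Iteratively refine questions against the full, raw dataset (the "data lake").
cursor.execute("""
    SELECT referrer_domain, COUNT(*) AS visits
    FROM   raw_weblogs
    WHERE  event_date >= '2013-01-01'
    GROUP  BY referrer_domain
    ORDER  BY visits DESC
    LIMIT  20
""")

for referrer, visits in cursor.fetchall():
    print(referrer, visits)   # feed into a notebook, chart, or BI tool
```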
The final use case is called Application Enrichment. This is about incorporating data stored in HDP to enrich an existing application. This could be an online application in which we want to surface custom information to a user based on their particular profile. For example, if a user has been searching the web for information on home renovations, in the context of your application you may want to use that knowledge to surface a custom offer for a related product that you sell. Large web companies such as Facebook are very sophisticated in their use of this approach. In the diagram, this is about pulling data from disparate sources into HDP in step 1, storing and processing it in step 2, and then interacting with it directly from your applications in step 3, typically in a bi-directional manner (e.g. request data, return data, store the response).
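Below is a minimal sketch of that bi-directional interaction, assuming the analytical models built in HDP are published into HBase and read by the online application with the happybase client; the table, column family and key names are hypothetical.

```python
# Sketch of the "Enrich" pattern: an online app reads a precomputed offer
# (a model output published from Hadoop) from a low-latency store, and
# writes the user's response back for the next model-building run.
# Table, column family, and key names are hypothetical.
import happybase

connection = happybase.Connection("hdp-hbase.example.com")
offers = connection.table("user_offers")

def offer_for(user_id):
    # Step 3 (deliver model): fetch the offer precomputed for this user.
    row = offers.row(user_id.encode("utf-8"))
    return row.get(b"model:offer_id", b"default-offer").decode("utf-8")

def record_response(user_id, offer_id, clicked):
    # Feedback loop: store the response so the next batch run can learn from it.
    offers.put(user_id.encode("utf-8"), {
        b"feedback:last_offer": offer_id.encode("utf-8"),
        b"feedback:clicked": b"1" if clicked else b"0",
    })

print(offer_for("user-42"))
record_response("user-42", "home-renovation-promo", clicked=True)
```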