With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business intelligence. Hadoop is uniquely capable of storing, aggregating, and refining multi-structured data sources into formats that fuel new business insights. Organizations that embrace solution architectures focused on maximizing the value from ALL data will put themselves in a position to drive more business, enhance productivity, or discover new and lucrative business opportunities. Over the coming years, Hadoop could be in a position to process more than half the world’s data. There is still much work to be done, however, if Hadoop is to achieve this lofty goal. In this talk Shaun Connolly, VP Corporate Strategy for Hortonworks, will look at Hadoop’s role in the enterprise architecture and how it complements existing enterprise systems.
4. What is Big Data?
BIG DATA = Transactions + Interactions + Observations
[Diagram: data volume and variety grow together. Megabytes of ERP data (purchase records and detail, payment records, offer details and history); gigabytes of CRM data (segmentation, customer touches, support contacts, external demographics, business data feeds); terabytes of WEB data (web logs, user clickstreams, A/B testing, behavioral targeting, dynamic pricing, dynamic funnels, search marketing, affiliate networks); petabytes of BIG DATA (mobile web, sentiment, SMS/MMS, speech to text, social interactions and feeds, spatial and GPS coordinates, sensors/RFID/devices, user-generated content, product/service logs, HD video, audio, and images). Horizontal axis: increasing data variety and complexity.]
5. Big Data Market Drivers
Business
1 Enable new business models & drive faster growth (20%+)
2 Find insights for competitive advantage & optimal returns
Technical
3 Data continues to grow exponentially
4 Data is increasingly everywhere and in many formats
5 Traditional solutions not designed for new requirements
Financial
6 Cost of data systems, as % of IT spend, continues to grow
7 Cost advantages of commodity hardware & open source
7. Next-Generation Data Architecture
[Diagram: unstructured data sources (log files, exhaust data, social media, sensors/devices, DB data) flow into an Enterprise Hadoop Platform; classic data integration & ETL connects it with Business Transactions & Interactions systems (CRM, ERP, web, mobile, point of sale) and with Business Intelligence & Analytics (dashboards, reports, visualization, …).]
1. Capture Big Data  2. Process & Structure  3. Distribute Results  4. Feedback & Retain
8. Making Hadoop Enterprise Ready
[Diagram: the Enterprise Hadoop Platform layers Operational Services (manage & operate at scale) and Data Services (store, process, and access data) on top of the Hadoop Core (distributed storage & processing) and Platform Services for enterprise readiness (HA, DR, snapshots, security, …); it deploys on OS/VM, cloud, or appliance.]
9. Existing Data Architecture
[Diagram: traditional data sources (RDBMS, OLTP, OLAP) from OLTP and POS systems feed a data systems layer of traditional repositories (RDBMS, EDW, MPP), which serves the applications layer (business analytics, custom applications, enterprise applications); dev & data tools support build & test, and operational tools support manage & monitor.]
10. An Emerging Data Architecture
[Diagram: the same stack, with an Enterprise Hadoop Platform added alongside the traditional repositories (RDBMS, EDW, MPP) in the data systems layer, and with new sources (web logs, email, sensors, social media) and mobile data joining the traditional sources (RDBMS, OLTP, OLAP) from OLTP and POS systems.]
12. Interoperating With Your Tools
[Diagram: the emerging architecture with its interoperability points: Microsoft applications at the applications layer, the Enterprise Hadoop Platform beside the traditional repositories, and Teradata Viewpoint among the operational tools; data sources span traditional (RDBMS, OLTP, OLAP), new (web logs, email, sensors, social media), and mobile data from OLTP and POS systems.]
14. Hadoop Common Patterns of Use
[Diagram: business cases demand "right-time" access to data: batch (Refine), interactive (Explore), and online (Enrich), all served by the Enterprise Hadoop Platform sitting atop big data: transactions, interactions, observations.]
15. Operational Data Refinery (Refine | Explore | Enrich)
[Diagram: transform & refine ALL sources of data; also known as a data reservoir or catch basin. 1. Capture data from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media); 2. Process it in the Enterprise Hadoop Platform; 3. Distribute & retain results into data systems (RDBMS, EDW, MPP traditional repositories) and on to applications (business analytics, custom applications, enterprise applications).]
16. Big Data Exploration & Visualization (Refine | Explore | Enrich)
[Diagram: leverage the "data lake" to perform iterative investigation for value. 1. Capture data from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media); 2. Process it in the Enterprise Hadoop Platform alongside RDBMS, EDW, and MPP repositories; 3. Explore & visualize directly from applications (business analytics, custom applications, enterprise applications).]
17. Application Enrichment (Refine | Explore | Enrich)
[Diagram: create intelligent applications: collect data, create analytical models, and deliver them to online apps. 1. Capture data from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media); 2. Process & compute in the Enterprise Hadoop Platform alongside RDBMS, EDW, MPP, and NoSQL systems; 3. Deliver the model to custom and enterprise applications.]
18. Big Data: Optimize Outcomes at Scale
Media: optimize content
Intelligence: optimize detection
Finance: optimize algorithms
Advertising: optimize performance
Fraud: optimize prevention
Retail / wholesale: optimize inventory turns
Manufacturing: optimize supply chains
Healthcare: optimize patient outcomes
Education: optimize learning outcomes
Government: optimize citizen services
Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
19. Market Transitioning into Early Majority
[Diagram: the technology adoption life cycle, plotted as relative % of customers over time: innovators (technology enthusiasts), early adopters (visionaries), the CHASM, early majority (pragmatists), late majority (conservatives), and laggards (skeptics). Early-market customers want technology & performance; mainstream customers want solutions & convenience.]
Source: Geoffrey Moore, Crossing the Chasm.
20. At Hortonworks, we believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
Welcome to Hadoop Summit and enjoy the conference!
Editor's notes
Title: Hadoop's Role in the Enterprise Architecture
Thank you all for attending Hadoop Summit! I'd like to spend the next 30 minutes focused on Hadoop's opportunity to power next-generation data architectures. I've been involved in open source for many years, having worked at JBoss back in 2004, then at Red Hat through 2008. After that I joined SpringSource and ultimately VMware through 2011. So I've seen a lot of open source technologies and waves of excitement and passionate users. But I've not seen anything quite like this Big Data and Hadoop phenomenon.
So our backdrop is BIG DATA.

Gartner report, 12 October 2012 (http://www.gartner.com/id=2195915): "Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016." Big data has become a major driver of IT spending, and the benefits to organizations of adding big data to their information management and analytics infrastructure will force a more rapid cycle of replacing existing solutions.

IDC study (http://cdn.idc.com/research/Predictions12/Main/downloads/IDCTOP10Predictions2012.pdf): IDC projects that the digital universe will reach 40 zettabytes (ZB) by 2020, a 50-fold growth from the beginning of 2010. According to the study, 2.8 ZB of data will have been created and replicated in 2012. Machine-generated data is a key driver in the growth of the world's data, which is projected to increase 15x by 2020.

So the topic of big data is increasingly important. But like any presentation these days about Big Data, we've got to start off with a definition, right? I kinda like to describe Big Data using a simple equation. As I see it, Big Data = Transactions + Interactions + Observations. Meaning, it not only spans your current highly structured transactional data sources; it also includes new forms of data that represent interactions (i.e. website interactions, social interactions, etc.) and observations (i.e. data coming off of sensors, devices, etc.). So, for all the burgeoning data scientists in the audience, there's your equation!
For the visual thinkers out there, let's expand our mathematical model to show some concrete examples.

ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases.

Interactions are about how people and things interact with each other or with your business. Web logs, user clickstreams, social interactions & feeds, and user-generated content are classic places to find Interaction data.

Observational data tends to come from the "Internet of Things". Sensors for heat, motion, and pressure, plus RFID and GPS chips within such things as mobile devices, ATM machines, and even aircraft engines, provide just some examples of "things" that output Observation data.

Most folks would agree that video is "big" data. The analysis of what's happening in that video (i.e. what you, me, and others are doing in it) may not be "big", but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also "big data", so we should not limit our thinking to just data that flows through an organization. The mortgage-related data you may have could benefit from being blended with external data found in Zillow, for example. The government, for its part, has the Open Data Initiative, which means that more and more data is being made publicly available.

One of the use cases I find interesting is predictive policing, where state and local law enforcement applies analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime!

Anyhow, this is what Big Data means to me; hopefully it makes sense to you.
The market drivers for big data span business, technical, and financial concerns.

From a business perspective, the promise of big data is to find insights for competitive advantage, enable new business models, or optimize existing ones. From a technical perspective, as we discussed, volumes of data continue to grow, and data is very multi-structured in nature, which poses a challenge for traditional systems that inherently assume a relational row/column structure. And from a financial perspective, while the cost of data systems continues to grow, the rise of commodity hardware and open source platforms like Hadoop enables an economic model that makes it possible to gather large volumes in one place and process them in a way that does not break the bank.

So, we've covered an overview of big data and the market drivers behind why it's important. Your CIO, like many these days, believes it's a top-3 initiative and has tasked you with coming up with a strategy.
So how many of you feel like this poor guy getting started with his big data strategy? Well, let's start off with a look at a next-generation data architecture that leverages new platforms like Hadoop in a way that integrates with your existing systems.
So I'd like to talk about how Hadoop can fit within a broader enterprise data architecture, with the goal of maximizing the value from ALL of your data: transactions + interactions + observations. At the highest level, I see three broad areas of data processing: Business Transactions & Interactions; Business Intelligence & Analytics; and the Big Data Refinery.

Enterprise IT has been connecting systems via classic ETL processing, as illustrated in Step 1 above, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask, and IT collects and structures the data needed to answer those questions.

The "Big Data Refinery", as highlighted in Step 2, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. The Big Data Refinery provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with it.

A popular example of big data refining is processing web logs, clickstreams, social interactions, social feeds, and other user-generated data sources into more accurate assessments of customer churn or more effective creation of personalized offers. More interestingly, there are businesses deriving value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to help them better understand how customers navigate the aisles as they find and purchase products. Retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue for the business.
In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value.

With that as backdrop, Step 3 takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions & Interactions and Business Intelligence & Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360° view of customers, for example. By directly integrating the Big Data Refinery with existing Business Intelligence & Analytics solutions that contain much of the transactional information for the business, companies can enhance their ability to more accurately understand the customer behaviors that lead to the transactions. Moreover, systems focused on Business Transactions & Interactions can also benefit from connecting with the Big Data Refinery: complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications, with the goal of more accurately targeting customers with the best and most relevant offers, for example.

Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in Steps 4 and 5. Retaining the past 10 years of historical "Black Friday" retail data, for example, can benefit the business, especially if it's blended with other data sources such as 10 years of weather data accessed from a third-party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost-effectively and at scale.
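As a tiny illustration of that blending idea, here is a Python sketch of joining historical retail revenue with third-party weather data by date, the kind of multi-source aggregation a refinery job would run at scale. All records, dates, and field names here are hypothetical:

```python
# Hypothetical historical sales records (what a retailer might retain in the refinery).
sales = [
    {"date": "2011-11-25", "store": "A", "revenue": 120000},
    {"date": "2011-11-25", "store": "B", "revenue": 95000},
    {"date": "2012-11-23", "store": "A", "revenue": 150000},
]

# Hypothetical third-party weather feed, keyed by date.
weather = {"2011-11-25": "snow", "2012-11-23": "clear"}

# Blend: total revenue per weather condition, joining the two sources on date.
revenue_by_weather = {}
for sale in sales:
    condition = weather.get(sale["date"], "unknown")
    revenue_by_weather[condition] = revenue_by_weather.get(condition, 0) + sale["revenue"]
```

At cluster scale this join would be expressed in Pig or Hive rather than a Python loop, but the shape of the computation is the same.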
So enterprise Hadoop lies at the heart of the next-generation data architecture. Let's outline what's required in and around Hadoop in order to make it easy for the enterprise to use and consume.

At the center, we start with Apache Hadoop for distributed file storage and processing (via MapReduce). In order to enable Hadoop within mainstream enterprises, we need to address enterprise concerns such as high availability, disaster recovery, snapshots, security, etc. On top of this, we need to provide data services that make it easy to move data in and out of the platform, process and transform the data into useful formats, and enable people and other systems to access the data easily. This is where components like Apache Hive, Pig, HBase, HCatalog, and other tools fit.

Making it easy for data workers is important, but it's also important to make the platform easier to operate. Components like Apache Ambari, which address provisioning, management, and monitoring of the cluster, are important here.

So all of that, core and platform services, data services, and operational services, comes together into a vision of "enterprise Hadoop". Ensuring that the Enterprise Hadoop Platform can be flexibly deployed across operating systems and virtual environments like Linux, Windows, and VMware is important. Targeting cloud environments like Amazon Web Services, Microsoft Azure, Rackspace OpenCloud, and OpenStack is increasingly important. And the ability to provide enterprise Hadoop pre-configured within a hardware appliance, like Teradata's Big Analytics Appliance, helps pull Hadoop into enterprises as well.
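To make the "distributed storage and processing" core concrete, here is a minimal single-process Python sketch of the map/shuffle/reduce flow that Hadoop MapReduce actually distributes across a cluster. The function names and sample records are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

logs = ["error disk full", "warn disk slow", "error network down"]
counts = reduce_phase(shuffle(map_phase(logs)))
```

The point of Hadoop is that the map and reduce steps run in parallel on the nodes holding the data, so the same program shape scales from three log lines to petabytes.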
While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMSs and data warehouses; and a set of applications that leverage the data stored in those systems. These could be packaged BI applications (Business Objects, Tableau, etc.), enterprise applications (e.g. SAP), or custom applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications.

Your environment is undoubtedly more complicated, but conceptually it is likely similar.
As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets).

Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It is not replacing the database; it is a complement, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with existing applications (such as Tableau, SAS, and Business Objects); existing databases and data warehouses, for loading data to and from the warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
In October 2010, I attended the Hadoop World event in New York City, where Larry Feinsmith of JP Morgan Chase gave a keynote presentation. Larry provided great insight into how JP Morgan Chase was using Hadoop: great, creative use cases! But the point that stuck with me long after the event was the importance of figuring out how Hadoop can and should be integrated with existing IT investments. While Larry said he loves the innovation happening in the open source community, he also said that enterprises like JP Morgan Chase will not throw away all of their existing investments! They want approaches that deliver the benefits of new technologies while leveraging existing skills and integrating with existing systems.
It is for that reason that we focus on HDP interoperability across all of these categories.

Data systems: HDP is endorsed and embedded with SQL Server, Teradata, and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects, and more. Development tools: for .NET developers, Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop; for Java developers, Spring for Apache Hadoop enables quickly and easily building Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
So, if I haven't made it crystal clear for you yet, maybe this visual will get the point across. Enterprise Hadoop makes a great tag team with your existing tools to enable a next-generation data architecture, positioning you to refine and explore vast quantities of multi-structured data and to enrich the applications and services that drive your business.
So now that we've covered the overall architecture and how Hadoop fits, let's discuss the patterns of use that we're seeing for Hadoop. At a high level, we describe the three key patterns as Refine, Explore, and Enrich. Refine captures data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a batch platform, it is increasingly being used and evolving to serve a wide range of usage patterns that span batch, interactive, and online needs. Let me cover these patterns in a little more detail.
Across all of our user base, we have identified just three usage patterns; sometimes more than one is used in concert during a complex project, but the patterns are distinct nonetheless. These are Refine, Explore, and Enrich.

The first of these, the Refine case, is probably the most common today. It is about taking very large quantities of data and using Hadoop to distill the information down into a more manageable data set that can then be loaded into a traditional data warehouse for use with existing tools. This is relatively straightforward and allows an organization to harness a much larger data set for its analytics applications while leveraging its existing data warehousing and analytics tools. Using the graphic here: in step 1 data is pulled from a variety of sources into the Hadoop platform, in step 2 it is processed, and in step 3 it is loaded into a data warehouse for analysis by existing BI tools.
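The Refine pattern can be sketched in miniature. Assuming a hypothetical log format of `timestamp user_id url status`, this Python snippet distills raw web-log lines into a small per-user summary and emits a warehouse-ready CSV, standing in for what a Pig or Hive job would do over terabytes:

```python
import csv
import io

# Hypothetical raw web-log lines: "timestamp user_id url status".
raw_logs = [
    "2012-06-13T10:00:01 u1 /home 200",
    "2012-06-13T10:00:02 u2 /cart 200",
    "2012-06-13T10:00:03 u1 /checkout 500",
    "2012-06-13T10:00:04 u1 /checkout 200",
]

# Step 2, "process": distill per-user hit and server-error counts.
summary = {}
for line in raw_logs:
    _, user, _, status = line.split()
    stats = summary.setdefault(user, {"hits": 0, "errors": 0})
    stats["hits"] += 1
    if status.startswith("5"):
        stats["errors"] += 1

# Step 3, "distribute": write a small CSV a warehouse loader could ingest.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["user_id", "hits", "errors"])
for user, stats in sorted(summary.items()):
    writer.writerow([user, stats["hits"], stats["errors"]])
```

The refined output (a few rows per user) is orders of magnitude smaller than the raw logs, which is exactly why it fits comfortably back into the warehouse.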
The second use case is what we refer to as data exploration; this is the use case most commonly in question when people talk about "data science". In simplest terms, it is about using Hadoop as the primary data store rather than performing the secondary step of moving data into a data warehouse. To support this use case, you've seen all the BI tool vendors rally to add support for Hadoop (most commonly HDP) as a peer to the database, allowing rich analytics on extremely large datasets that would be both unwieldy and costly in a traditional data warehouse. Hadoop allows for interaction with a much richer dataset and has spawned a whole new generation of analytics tools that rely on Hadoop (HDP) as the data store. To use the graphic: in step 1 data is pulled into HDP, in step 2 it is stored and processed, and in step 3 it is surfaced directly into the analytics tools for the end user.
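In the Explore pattern the queries are ad hoc, so the useful primitive is a pivot you can point at any dimension of the data lake. A small Python sketch with made-up event records (the field names and values are hypothetical):

```python
from collections import Counter

# Hypothetical event records held in the "data lake".
events = [
    {"channel": "web", "region": "EU", "action": "view"},
    {"channel": "mobile", "region": "US", "action": "buy"},
    {"channel": "web", "region": "US", "action": "buy"},
    {"channel": "web", "region": "EU", "action": "buy"},
]

def explore(records, dimension):
    # Ad-hoc pivot: count events along whatever dimension is of interest.
    return Counter(r[dimension] for r in records)

# Iterative investigation: pivot by channel, then drill into buyers by region.
by_channel = explore(events, "channel")
by_region = explore([e for e in events if e["action"] == "buy"], "region")
```

In practice this kind of iterative slicing is what BI tools issue against Hadoop (typically as Hive queries); the value is that you never had to decide the dimensions up front, as a warehouse schema would force you to.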
The final use case is called application enrichment. This is about incorporating data stored in HDP to enrich an existing application, for example an online application in which we want to surface custom information to a user based on their particular profile. If a user has been searching the web for information on home renovations, in the context of your application you may want to use that knowledge to surface a custom offer for a related product that you sell. Large web companies such as Facebook are very sophisticated in their use of this approach. In the diagram: data is pulled from disparate sources into HDP in step 1, stored and processed in step 2, and then your applications interact with it directly in step 3, typically in a bi-directional manner (e.g. request data, return data, store response).
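A minimal sketch of that enrichment loop, assuming a hypothetical model of per-category offer weights computed in HDP and a user profile of recent search counts; the online application scores the candidates and surfaces the best offer:

```python
# Hypothetical model "delivered" from the refinery: a weight per category.
offer_weights = {"home_renovation": 0.9, "electronics": 0.4, "sports": 0.2}

def best_offer(profile):
    # Score each candidate category against the user's recent search counts
    # and surface the highest-scoring offer, or None if nothing matches.
    scores = {
        category: weight * profile.get(category, 0)
        for category, weight in offer_weights.items()
    }
    category = max(scores, key=scores.get)
    return category if scores[category] > 0 else None

# A user who has been searching heavily for home-renovation topics.
offer = best_offer({"home_renovation": 5, "sports": 1})
```

The heavy lifting (building `offer_weights` from billions of interactions) happens in Hadoop offline; the online app only does this cheap lookup-and-score at request time, which is what keeps the pattern viable for low-latency serving.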
When all is said and done, the ultimate goal of big data processing is to optimize outcomes at scale. Geoffrey Moore, author of Crossing the Chasm, gave these good examples across various vertical industries.
And speaking of Geoffrey Moore, let me close out by covering where Hadoop is from a crossing-the-chasm perspective. Based on our engagement with enterprise customers, we believe Hadoop has transitioned into the early majority and is therefore being used by more mainstream enterprises. Horizontal patterns of use emerge in this stage, as well as what Geoffrey Moore calls "bowling pins", or vertical solutions. The net of it is that enterprise Hadoop offers exciting promise, but it is still early in its maturity cycle. You can do a lot with the technology, but there's more to do to harden it for broader mainstream adoption.
And with that, let me close out with the guiding vision we have at Hortonworks.