1. What is the point of Hadoop?
Matthew Aslett
Research Director, 451 Research
© 2013 by The 451 Group. All rights reserved
2. Matthew Aslett
• Research Director, Data Management and Analytics
matthew.aslett@451research.com
www.twitter.com/maslett
Responsible for data management
and analytics research agenda
Focus on operational and analytic
databases, including NoSQL,
NewSQL, and Hadoop
With 451 Research since 2007
3. Unique combination of research, analysis & data
Emerging tech market segment focus
Daily qualitative & quantitative insight
Analyst advisory & Go-to-market support
Global events
4. Company Overview
One company with 3 operating divisions
Syndicated research, advisory, professional services, datacenter certification, and events
Global focus
200+ staff
1,300+ client organizations: enterprises, vendors, service providers, and investment firms
Organic growth and growth through acquisition
5. What is the point of Hadoop?
Hadoop’s greatest asset is its
flexibility: it can be used for
multiple roles and use-cases
But that is also a challenge,
and can lead to confusion
and disillusionment
Each user and vendor has
their own perspective on
Hadoop’s role
6. The Blind Men and the Elephant
“It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.”
John Godfrey Saxe (1872)
7. The Blind Men and the Elephant
“After Hadoop finishes
filtering the data, the place
you want to put that data
is in Oracle Database.”
Larry Ellison (2011)
8. Oracle Big Data Appliance
Apache Hadoop
NoSQL Database
Oracle Tools
Oracle Database
Data Integrator for Oracle Database
Data Loader
R distribution
Big data processing/integration
Big data analytics
9. What is the point of Hadoop?
Big data storage
Big data processing/integration
Big data analytics
10. Big Data
“Big data” - the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored because traditional data management technologies could not cope with the three Vs:
Volume
The volume of data is too large for traditional database software tools to cope with
Velocity
The data is being produced at a rate that is beyond the performance limits of traditional systems
Variety
The data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses
11. Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
Inspired by ‘Total Football’
– a new approach to soccer
that emerged in the late 1960s,
in Amsterdam
Total Data is making the most
efficient use of existing and
new data management
resources to deliver value
Not another name for Big Data: if your data is big, the way you
manage it should be total
12. Big Data and Total Data
Big Data:
The growing volume, velocity and variety of data
Big Data Technologies:
New technologies being adopted to store and process that data
Total Data:
The user trends driving the adoption of Big Data Technologies to store and process Big Data, and to manage it alongside existing data management technologies.
[Diagram labels: BIG DATA, BIG DATA TECHNOLOGY, TOTAL DATA, Volume, DQ, Predictive analytics]
13. Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
Totality
The desire to process
and analyze data in
its entirety, rather
than analyzing a
sample of data and
extrapolating the
results.
14. Totality
Big data storage
Big data processing/integration
Prior to adopting Hadoop, the company only had transactional and
summarized non-transactional data stored in its EDW
The vast majority of its log data was discarded as not valuable
enough to be efficiently processed in an enterprise data warehouse
Now using Hadoop to process hundreds of GBs of log data
produced by the millions of searches and transactions performed
on its site each day
Creating data exports to R, and aggregating data to its existing data
warehouse for analysis
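The log-processing step described on this slide follows the map, shuffle and reduce pattern that Hadoop popularized. The sketch below is a single-process illustration of that pattern only, not the company's actual job: the log format, the sample lines and every function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log lines in the form "<timestamp> <event> <detail>".
# The format is an assumption; the slide does not describe the real logs.
LOG_LINES = [
    "2013-02-01T10:00:01 search hotel+chicago",
    "2013-02-01T10:00:02 search flight+nyc",
    "2013-02-01T10:00:03 booking hotel+chicago",
    "2013-02-01T10:00:04 search hotel+chicago",
]


def map_phase(line):
    """Emit (search term, 1) for each search event; other events emit nothing."""
    _timestamp, event, detail = line.split(" ", 2)
    if event == "search":
        yield detail, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def count_searches(lines):
    """Run map, shuffle and reduce over every line, with no sampling."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_searches(LOG_LINES))
```

Every line is read and counted, which is the "totality" point: the result is an exact aggregate small enough to export or load into an existing warehouse, rather than an estimate extrapolated from a sample.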
15. Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
Totality
The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration
The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
17. Exploration
Big data storage
Big data processing/integration
Big data analytics
The company wanted to perform analysis on customer
data in order to create geo-targeted advertising
The required data was already present in its data warehouse
but was modeled in a way that would not allow Orbitz to
efficiently process the query
Extracting the data into Hadoop enabled the company to query
it in a way the data warehouse was never designed for
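Defining the schema in response to the query is often called schema-on-read. The sketch below shows the idea under stated assumptions: the raw records, their field names and both example queries are hypothetical, and plain Python stands in for whatever query layer runs on the cluster.

```python
import json

# Raw events kept exactly as they arrived. Field names are hypothetical.
RAW_EVENTS = [
    '{"customer": "c1", "city": "Chicago", "action": "search", "nights": 2}',
    '{"customer": "c2", "city": "Boston", "action": "booking"}',
    '{"customer": "c3", "city": "Chicago", "action": "booking", "nights": 1}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto the fields one query needs.

    schema maps a field name to (cast function, default value);
    a field missing from a record takes the default.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else default
            for name, (cast, default) in schema.items()
        }


def customers_by_city(raw_lines):
    """A geo-targeting style question: which customers were seen in each city?"""
    schema = {"customer": (str, ""), "city": (str, "unknown")}
    result = {}
    for row in read_with_schema(raw_lines, schema):
        result.setdefault(row["city"], []).append(row["customer"])
    return result


def total_nights(raw_lines):
    """A different question over the same raw data, using a different schema."""
    schema = {"nights": (int, 0)}
    return sum(row["nights"] for row in read_with_schema(raw_lines, schema))


if __name__ == "__main__":
    print(customers_by_city(RAW_EVENTS))
    print(total_nights(RAW_EVENTS))
```

The stored data never changes between the two queries; only the schema applied at read time does, which is what lets a new question be asked without remodeling a warehouse first.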
18. Hadoop adoption process
Big data storage
Google File System: research paper published 2003
Big data processing/integration
Google MapReduce: research paper published 2004
Big data analytics
Google Dremel: research paper published 2010
Google Tenzing: research paper published 2011
[Chart: diffusion of innovation curve. Labels: STORAGE, PROCESSING, ANALYTICS. Segments: INNOVATORS, EARLY ADOPTERS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.
19. Crossing the Chasm
Hadoop as (just) a low cost storage option is not fulfilling its potential
Processing and integration is not the complete picture
Hadoop-based analytics unlocks the value of previously ignored data
Attempting to fast forward to analytics, missing out the
processing/integration stage, creates silos and will result in disillusionment
[Chart: diffusion of innovation curve. Labels: STORAGE, PROCESSING, ANALYTICS. Segments: INNOVATORS, EARLY ADOPTERS, EARLY MAJORITY, LATE MAJORITY, LAGGARDS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.
20. Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
Totality
The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration
The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
Frequency
The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
21. Frequency
Formerly AT&T Advertising solutions and AT&T Interactive
Faced with increasing volume of traffic through
distribution network
Wanted to provide intra-day reporting, but faced days of
report-lag due to loading multiple databases
Moved data processing to Hadoop, enabling the creation
of a single common data layer for all applications
Report-lag reduced to hours, rather than days
New insight enabled by more frequent analysis and being able to
process all the data
22. Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
Totality
The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration
The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
Frequency
The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
Dependency
The reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.
23. SQL meets Hadoop
SQL on Hadoop
• Hive
  • Project Stinger
  • Apache Tez (proposed)
• Impala
  • Cloudera Enterprise RTQ
• Apache Drill
  • (incubating)
• Phoenix project
  • For HBase
• Lingual
  • For Cascading and Hadoop
RDBMS and Hadoop co-processing
• Hadapt Adaptive Analytic Platform
• Teradata Aster SQL-H
• Rainstor Big Data Analytics on Hadoop
• EMC Greenplum HAWQ
• Microsoft PolyBase
• Citus Data CitusDB
• IBM Big SQL
Operational SQL on Hadoop
• Drawn to Scale
  • Spire
• Splice Machine
  • Splice SQL Engine
24. Crossing the Chasm
Project maturity
Vendor ecosystem
Mainstream interest
Geographic adoption
[Chart: diffusion of innovation curve. Labels: STORAGE, PROCESSING, ANALYTICS. Segments: INNOVATORS, EARLY ADOPTERS, EARLY MAJORITY, LATE MAJORITY, LAGGARDS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.
25. Project maturity
[Figure: Feb 2006 to Dec 2012]
26. Vendor ecosystem
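Contributors by lines of code by current employer
HADOOP CORE: 70+ different companies, 200+ individuals
Hortonworks 37%, The rest 29%, Facebook 7%
ALL HADOOP PROJECTS: 120+ different companies, 750+ individuals
Hortonworks 27%, The rest 31%
[Pie charts: the Cloudera and Yahoo! slices in both charts, and the Facebook slice in the second, carry the values 15%, 11%, 12%, 15% and 16%; the extracted text does not show which value belongs to which slice]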
27. Vendor ecosystem
ALL HADOOP PROJECTS: contributors by lines of code by current employer and contributor type
Hadoop vendors: 51%
Users: 38%
Other vendors: 6%
Unknown/individuals: 4%
Academia: 1%
30. Largest employers of Hadoop skills
[Bar chart. Vertical axis: current employer. Horizontal axis: % of total LinkedIn profiles mentioning Hadoop, 0.0 to 3.5]
Employers in the order shown: Yahoo, Microsoft, Google, eBay, Amazon, IBM, LinkedIn, Oracle, EMC, Cisco, Cloudera
Source: LinkedIn: August 2012
31. Largest employers of Hadoop skills
[Bar chart. Vertical axis: current employer. Horizontal axis: % of total LinkedIn profiles mentioning Hadoop, 0.0 to 3.5]
Employers in the order shown: Yahoo, Microsoft, Google, Amazon, IBM, eBay, Oracle, LinkedIn, Tata, HP, Cisco
Source: LinkedIn: February 2013
32. Geographic adoption
Bay area: 28.2%
India: 9.7%
NYC: 4.8%
Seattle: 3.7%
China: 3.6%
DC: 3.5%
LA: 3.0%
UK: 3.0%
LinkedIn search result
December 2011
33. Geographic adoption
Bay area: 24.9%
India: 11.2%
NYC: 4.7%
China: 4.4%
Seattle: 3.9%
UK: 3.4%
DC: 3.1%
LA: 2.8%
LinkedIn search result
August, 2012
34. Geographic adoption
Bay area: 22.9%
India: 13.5%
China: 4.8%
NYC: 4.6%
Seattle: 3.9%
UK: 3.4%
DC: 3.1%
LA: 2.7%
LinkedIn search result
February 2013
35. Geographic adoption
[Stacked bar chart: USA vs rest of world (ROW)]
December 2011: total 9,079 (USA 64.4%, ROW 35.6%)
August 2012: total 22,178 (USA 60.4%, ROW 39.6%)
February 2013: total 38,049 (USA 58.3%, ROW 41.7%)
LinkedIn search result
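The totals and percentage splits on this slide can be turned into approximate counts and growth factors. The sketch below does that arithmetic; it assumes the larger share in each bar is the USA segment, and the function names are made up for the example.

```python
# Totals and shares as read from the slide. Treating the larger share in each
# bar as the USA segment is an assumption, not something the chart text states.
SNAPSHOTS = [
    ("December 2011", 9079, 64.4),
    ("August 2012", 22178, 60.4),
    ("February 2013", 38049, 58.3),
]


def split_counts(total, usa_pct):
    """Return approximate (USA, ROW) counts from a total and the USA share."""
    usa = round(total * usa_pct / 100)
    return usa, total - usa


def growth_factor(earlier_total, later_total):
    """Return how many times larger the later total is than the earlier one."""
    return later_total / earlier_total


if __name__ == "__main__":
    for label, total, usa_pct in SNAPSHOTS:
        usa, row = split_counts(total, usa_pct)
        print(f"{label}: USA ~{usa}, ROW ~{row}")
    first_total, last_total = SNAPSHOTS[0][1], SNAPSHOTS[-1][1]
    print(f"Overall growth: {growth_factor(first_total, last_total):.2f}x")
```

Because the percentages are rounded to one decimal place, the derived counts are approximate.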
36. Conclusions
Hadoop’s greatest asset is its flexibility, but that is also a challenge,
and can lead to confusion and disillusionment among later adopters
Hadoop is enabling greater business intelligence by storing, processing and
analyzing data that was previously ignored due to the limitations of
traditional data management technologies
Storing, processing, and analyzing data is a progression that has enabled
early adopters to understand Hadoop’s role in the wider landscape
Attempting to fast forward to analytics, missing out the
processing/integration stage, creates silos and will result in disillusionment
The Hadoop ecosystem is vibrant, with strength in both depth and breadth
Growing mainstream interest and geographic adoption mean Hadoop is
well-positioned to cross the chasm into mainstream adoption