Ventana Research Presents: Best Practices with Hadoop - Real World Data
1. David Menninger of Ventana Research Presents:
Best Practices with Hadoop - Real World Data
Audio/Telephone: +1 (909) 259-0012
Access Code: 622-064-673
Audio PIN: Shown after joining the Webinar
Hosts: Rich Guth, CMO, Karmasphere
Charles Zedlewski, VP Product, Cloudera
2. Housekeeping
• Ask questions at any time using the Questions panel
• Twitter: #HadoopTrends
• Problems? Use the Chat panel
• Slides and recording will be available
3. Speaker: David Menninger
Vice President, Ventana Research
• Covers analytics, business intelligence and information
management for Ventana Research. David brings over
two decades of experience marketing and bringing to
market leading-edge technologies that help organizations
analyze data to support a range of action-taking and
decision-making processes.
• Prior to joining Ventana Research, David was VP of
Marketing and Product Management at Vertica Systems,
Oracle, Applix, InforSense and IRI Software. He helped
create over half a billion dollars of shareholder value
while serving in these roles.
• Email: david.menninger@ventanaresearch.com
• Twitter: @dmenningervr
4. Who We Are
Mission: To help organizations to profit from all of their data
How We Do It: We deliver relevant products and services.
• A distribution of Apache Hadoop that is tested, certified and supported
• Comprehensive support and professional service offerings
• A suite of management software for Hadoop operations
• Training and certification programs for developers, administrators, managers and data scientists
Credentials: The Apache Hadoop experts.
• Number 1 commercial, open source distribution of Apache Hadoop
• Largest contributor to the open source Hadoop ecosystem
• More than 100 customers across a wide variety of industries
• Strong growth in revenue and new accounts
Technical Team: Unmatched knowledge and experience.
• Founders, committers and contributors to Hadoop
• A wealth of experience in the design and delivery of production software
• Breadth and depth in a team of open source committers and contributors
Leadership: Strong executive team with proven abilities.
• Mike Olson, CEO
• Jeff Hammerbacher, Chief Scientist
• Kirk Dunn, COO
• Amr Awadallah, VP Engineering
• Charles Zedlewski, VP, Product
• Doug Cutting, Chief Architect
• Mary Rorabaugh, CFO
• Omer Trajman, VP, Customer Solutions
©2011 Cloudera, Inc. All Rights Reserved. Confidential.
Reproduction or redistribution without written permission is
prohibited.
5. What we do
[Diagram of Cloudera's offerings; labels as shown on the slide]
• Services: Consulting Services, Cloudera University, Cloudera Partners
• Audiences: Operators, Engineers, Analysts, Business Users
• Cloudera Enterprise: Cloudera Management Suite, Cloudera Support
• Platform: Cloudera's Distribution Including Apache Hadoop (CDH), SCM Express
• Customers: Management Tools, IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Web Application, Adapters
• Big Data: Logs, Files, Web Data, Relational Databases
6. Karmasphere
Opening Up the Data in Hadoop for the Enterprise
© Karmasphere 2011 All rights reserved
7. Karmasphere Big Data Intelligence Product Suite
For Data and Business Analysts
Graphical environment where Big Data on Hadoop – even unstructured data –
can be accessed, discovered, and analyzed via familiar SQL and visualized in
Excel and other visualization tools
FREE for Developers New to Hadoop
Graphical development environment that facilitates learning how to
prototype, develop and test MapReduce jobs for Hadoop
For Developers Going into Production
Graphical development environment for the complete Hadoop application
development lifecycle, adding debugging, packaging and profiling to the
capabilities of Community Edition
8. Hadoop and Information Management
Benchmark Research Project
Preliminary Findings
June 23, 2011
©2011, Ventana Research, Inc.
9. Agenda
Why did we undertake this research?
What did our research examine?
What did we find?
How should you use this information?
Where do you get more information?
10. Ventana Research – Overview
Ventana Research is the leading benchmark research and strategic advisory
services firm. Our unparalleled analytic insights and best practices guidance
are based on our rigorous research-based benchmarking, business,
technology and best practices services.
Unique Combination of Capabilities
• Members (85,000) and Reach to Professionals (3 million)
• Research and Reach across all line-of-business functions
and IT
• Expertise Across Business and Technology
• Understand Business Domain and Processes
• Conduct and Deliver Benchmark Research
• Develop Analytic and Best Practice Assessments
• Formalized Research Coverage of Technology Vendors
• Deliver Research on Technology Impact to Business
13. Research Objectives
Gauge both the adoption rate and intentions to use Hadoop
Determine which elements of the Hadoop ecosystem are the
most popular
• Including which distributions, which components and which third-
party products.
Examine the infrastructures and strategies being used to
deploy Hadoop
Clarify the role of the cloud in enterprise Hadoop deployments
Elucidate the components of the business case for Hadoop
Detail use of Hadoop across industries
Determine the barriers and obstacles to further adoption of
Hadoop
14. Respondent Demographics
Participation by Region
• North America: 69%
• Asia Pacific: 16%
• Europe: 7%
• Middle East: 3%
• Central and South America: 3%
• Africa: 2%
Company Size by Employee Count
• Very Large: 35%
• Large: 27%
• Midsize: 24%
• Small: 14%
Total qualified responses: 163
15. Touching Over Half The Big Data Audience
Hadoop Usage
• Using, planning to use or evaluating Hadoop: 54%
  • Currently in production: 22%
  • Plan to use within 12 months: 12%
  • Plan to use in 12-24 months: 3%
  • Still evaluating: 17%
• No plans to use: 46%
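The 54% headline figure on this slide is the sum of the four segments of respondents who use, plan to use or are still evaluating Hadoop. A minimal sketch of that arithmetic follows, with the segment values copied from the chart; the dictionary and variable names are illustrative only and are not part of the research.

```python
# Hadoop usage segments as read from the chart (percent of respondents).
usage = {
    "Currently in production": 22,
    "Plan to use within 12 months": 12,
    "Plan to use in 12-24 months": 3,
    "Still evaluating": 17,
}
no_plans = 46  # "No plans to use" segment

# Respondents who use, plan to use or are evaluating Hadoop.
engaged = sum(usage.values())
# Sanity check that the segments cover the whole pie chart.
total = engaged + no_plans

print(f"Using, planning or evaluating Hadoop: {engaged}%")
print(f"All segments combined: {total}%")
```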
16. Hadoop Is Generally Additive
Is your Hadoop deployment replacing another technology?
• Yes: 37%
• No: 63%
Hadoop is supplementing other established technologies, with RDBMSs
still the dominant technology being used or planned to be used by more
than nine out of ten organizations.
17. Hadoop Is Additive In More Than One Way
Are there things you're able to do or plan to do with
large-scale data technologies that you couldn't do
before deployment?
• Hadoop: 87%
• Other: 52%
18. Hadoop Is Additive In More Than One Way
What are you able to do or what do you plan to do with
large-scale data technologies that you couldn't do
before deployment?
• Analyze data at a greater level of detail: Hadoop 94%, Non-Hadoop 93%
• Perform types of analytics that couldn't be done on large volumes of data before: Hadoop 88%, Non-Hadoop 71%
• Keep more historical data (post-process): Hadoop 88%, Non-Hadoop 60%
• Capture all of the source data that we are collecting (pre-process): Hadoop 82%, Non-Hadoop 47%
19. What Types of Data?
Hadoop is much more likely to be used for log and event data;
much less likely to be used for transaction data. It’s also more
likely to be used for text and multimedia.
Most Common - Hadoop
• Application logs
• Other types of event data
• Other log files
• Web logs
• Transaction data
• Network monitoring/traffic
Most Common - Others
• Customer/member data
• Transaction data
• Application logs
• Online retail transactions
• Network monitoring/traffic
• Call detail records
What types of large-scale data does your organization analyze?
20. What Types of Data?
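Q28 What types of large-scale data does your organization analyze?
• Customer/member data: Hadoop 59%, Non-Hadoop 68%
• Transactional data from applications (for…: Hadoop 44%, Non-Hadoop 68%
• Application logs: Hadoop 69%, Non-Hadoop 37%
• Other types of event data: Hadoop 64%, Non-Hadoop 23%
• Network monitoring/network traffic: Hadoop 41%, Non-Hadoop 33%
• Online retail transactions: Hadoop 33%, Non-Hadoop 34%
• Other log files: Hadoop 51%, Non-Hadoop 26%
• Call Detail Records: Hadoop 28%, Non-Hadoop 32%
• Web logs: Hadoop 46%, Non-Hadoop 21%
• Text data from social media and online…: Hadoop 36%, Non-Hadoop 15%
• Search logs: Hadoop 36%, Non-Hadoop 11%
• Trade/quote data: Hadoop 18%, Non-Hadoop 15%
• Intelligence/defense data: Hadoop 18%, Non-Hadoop 11%
• Multimedia (audio/video/images): Hadoop 21%, Non-Hadoop 9%
• Weather: Hadoop 8%, Non-Hadoop 3%
• Smartmeter data: Hadoop 3%, Non-Hadoop 6%
• Other (please specify): Hadoop 3%, Non-Hadoop 5%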
21. What Types of Applications?
What types of large-scale data applications is your
organization running?
• Query and reporting: Hadoop 60%, Non-Hadoop 89%
• Consolidation of multiple data sources for analysis: Hadoop 63%, Non-Hadoop 71%
• Custom/production application: Hadoop 65%, Non-Hadoop 68%
• Data preparation: Hadoop 56%, Non-Hadoop 60%
• Advanced analyses: Hadoop 69%, Non-Hadoop 47%
• Analysis or indexing of unstructured data: Hadoop 46%, Non-Hadoop 32%
• Data sandbox/Data experimentation: Hadoop 44%, Non-Hadoop 32%
Hadoop is most often used for advanced analyses and is more likely to be
used to analyze unstructured data and for data sandboxing than other
technologies. It is less likely to be used for query and reporting.
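The comparison on this slide can be checked by computing the percentage-point gap between Hadoop and non-Hadoop respondents for each application type. The sketch below is illustrative only: it assumes the first value of each pair in the chart belongs to Hadoop users, and the variable names are hypothetical.

```python
# (Hadoop %, Non-Hadoop %) per application type, as read from the chart.
# Assumption: the first value in each pair is the Hadoop figure.
applications = {
    "Query and reporting": (60, 89),
    "Consolidation of multiple data sources for analysis": (63, 71),
    "Custom/production application": (65, 68),
    "Data preparation": (56, 60),
    "Advanced analyses": (69, 47),
    "Analysis or indexing of unstructured data": (46, 32),
    "Data sandbox/Data experimentation": (44, 32),
}

# Positive gap: more common among Hadoop users; negative: less common.
gaps = {name: hadoop - other for name, (hadoop, other) in applications.items()}

# Rank from the largest Hadoop lead to the largest Hadoop deficit.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for name, gap in ranked:
    print(f"{name}: {gap:+d} points")
```

Under that assumption, advanced analyses show the largest lead for Hadoop and query and reporting the largest deficit, which matches the commentary on the slide.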
22. Where Sourced?
From which source(s) did you access Hadoop software?
• Apache: 63%
• Cloudera: 55%
• Amazon: 11%
• IBM: 8%
• Yahoo: 8%
• Facebook: 5%
• Other (please specify): 5%
• Don't know: 5%
The Apache Hadoop distribution is the most prevalent, followed closely by
Cloudera. Nearly half the organizations are using more than one
distribution.
23. Which Components?
Which Hadoop-related projects do you use or plan to use?
• Hadoop Distributed File System…: 79%
• MapReduce: 76%
• HBase: 61%
• Hive: 53%
• ZooKeeper: 45%
• Pig: 45%
• Flume: 34%
• Sqoop: 26%
• Oozie: 18%
• Avro: 16%
• Don't know: 11%
24. Hadoop Organizations are More Confident
How confident are you in your organization's ability to
manage large-scale data?
• Hadoop: Very confident 43%, Confident 37%, Somewhat confident 18%, Not very confident 2%
• Non-Hadoop: Very confident 23%, Confident 32%, Somewhat confident 35%, Not very confident 11%
25. Report Higher Levels of Benefits
Q27 What are the primary benefits of using your current technologies for
analyzing large-scale data sets?
• Allow us to retain and analyze more data: Hadoop 79%, Non-Hadoop 71%
• Increase the speed of analysis: Hadoop 85%, Non-Hadoop 63%
• Produce more accurate results: Hadoop 51%, Non-Hadoop 66%
• Reduce or eliminate manual processes: Hadoop 64%, Non-Hadoop 56%
• Cost savings - reduced implementation time/fees: Hadoop 62%, Non-Hadoop 53%
• Reduce the time required for data collection and preparation: Hadoop 67%, Non-Hadoop 49%
• Higher customer retention from better analysis of customer data: Hadoop 54%, Non-Hadoop 54%
• Utilize computing resources more efficiently: Hadoop 72%, Non-Hadoop 46%
• Cost savings - license fees: Hadoop 82%, Non-Hadoop 40%
• Reduce effort/staff required: Hadoop 49%, Non-Hadoop 49%
• We are able to create new products/services: Hadoop 67%, Non-Hadoop 32%
• Improved margins resulting from better algorithms: Hadoop 41%, Non-Hadoop 30%
• Improved clickthrough, cross-selling or upselling: Hadoop 26%, Non-Hadoop 30%
26. Research Can Help Answer Your Questions
Is Hadoop a fad or here to stay?
Which distributions/components are being used?
Apache?
Cloudera?
Other?
Are your peers using Hadoop and for what purpose?
Identify and avoid some of the obstacles to successful
deployments.
27. What Should You Do?
Already using Hadoop?
Compare your usage with others
Are you using all the components you should be?
Have you considered all application areas?
Is your usage tactical (cost saving) or strategic (new
capabilities)?
Not Using or Evaluating Hadoop?
Consider whether you should be
Did your organization need some “proof”?
28. Where to Get More Information
Free webinar and report:
Ventana Research will host a webinar with the final results and analysis.
A report of our findings will be distributed by the sponsors and will be
available on our website:
www.VentanaResearch.com/HIM
Contact us with questions:
Ventana Research
925-474-0060
info@ventanaresearch.com
www.ventanaresearch.com
29. Q&A
Ask questions using the Questions panel
Tweet
• #HadoopTrends
• @dmenningervr
• @Cloudera
• @Karmasphere
Thank you for participating!