1. HADOOP
OVERVIEW
Milind Bhandarkar,
Chief Architect, CTO Office, Greenplum
Will Davis
Senior Manager, Product Marketing, Greenplum
© Copyright 2012 EMC Corporation. All rights reserved. 1
2. Agenda
Hadoop – what’s the big deal?
Evolution of Hadoop from Web 2.0 to
Enterprise adoption
Deployment considerations for Enterprises
– Enterprise storage
– Integration into architecture and analytics
workflow
– Training/support resources
How Greenplum HD is Hadoop built for the
enterprise
4. What is Hadoop?
Framework that allows for distributed
processing of large data sets across
clusters of commodity servers
– Store large amounts of data
– Process the large amounts of data
stored
Inspired by Google’s MapReduce and
Google File System (GFS) papers
Apache Open Source Project
– Initial work done at Yahoo!
– Very active open source community
5. The Hadoop Opportunity
Internet age + exploding data growth
Enterprises increasingly interested in
leveraging new data sources quickly:
– Spot emerging trends
– Identify new opportunities, etc.
Traditional database tools not able to cope
– Weren’t built for big data use cases
– Lack scale, not cost-effective, rigid data structure
Need for a new approach: Hadoop
6. Why Is Hadoop Important?
Handles large amounts of data
Stores data in native format
Delivers linear scalability at low cost
Resilient in case of infrastructure failures
Transparent application scalability
7. Why Is Hadoop Important?
Handles large amounts of data
Stores data in native format
Delivers linear scalability at low cost
Resilient in case of infrastructure failures
Transparent application scalability
Enterprises can gain a competitive
advantage through the adoption of
big data analytics
8. What is Hadoop?
Two Core Components
HDFS: Scalable storage in the Hadoop Distributed File System
MapReduce: Compute via the MapReduce distributed processing platform
• Storage & Compute in 1 Framework
• Open Source Project of the Apache Software
Foundation
• Java-intensive programming required
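To make the compute half of this slide concrete, here is a minimal sketch of the MapReduce programming model as a word count. It is plain in-memory Python, not the Hadoop Java API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across data nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

The application author writes only the map and reduce functions; grouping by key and distributing the work are left to the framework, which is what lets the same code scale as nodes are added.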
9. Hadoop Architecture
1. Data is ingested into the Hadoop Distributed File System (HDFS)
2. Computation occurs inside Hadoop (MapReduce)
3. Results are exported from HDFS for use
[Diagram: Hadoop Data Nodes connected over Ethernet]
10. Hadoop Components
Spring Hadoop: Integrates the Spring and Hadoop frameworks
Mahout: Scalable machine learning libraries
HBase: Database for random, real-time read/write access
Hive: System for SQL-like querying of data on top of HDFS
Pig: Procedural language that abstracts MapReduce
ZooKeeper: Highly reliable distributed coordination
MapReduce: Framework for writing scalable data applications
HDFS: Hadoop Distributed File System
11. Hadoop Use Case Examples
Scale-out content management & repository
Batch processing of heterogeneous data
ETL (Extract/Transform/Load)
Pre-processing and integration with data warehouse
Personalization and data asset management analysis
Trade analytics
Credit scoring
Customer retention
Sentiment analysis (opinion mining)
12. Evolution of Hadoop
From Web 2.0 to
Enterprise
13. Web 2.0 Organizations are
“Data-Driven”
“The future is here, it’s just not evenly distributed yet.”
–WILLIAM GIBSON
14. Technology Adoption Lifecycle
Innovators/Early Adopters | Early Majority | Late Majority | Laggards
15. Evolution of the Hadoop Market
Innovators/Early Adopters | Early Majority | Late Majority | Laggards
Hadoop Early Adopters Hadoop Early Majority
16. Evolution of the Hadoop Market
HADOOP PROFILE (TODAY)
Pioneers and academics
Application Architect
Visionary
Open source / community driven
Build-your-own server, application
& storage infrastructure
Commodity components
Web 2.0
Universities
Life Sciences
Hadoop Early Adopters Hadoop Early Majority
17. Evolution of the Hadoop Market
HADOOP PROFILE (TODAY) | HADOOP PROFILE (FUTURE)
Pioneers and academics | IT Manager & CIO
Application Architect | Data Scientist
Visionary | Line-of-business
Open source / community driven | Commercial distribution
Build-your-own server, application & storage infrastructure | Turnkey solution
Commodity components | End-to-End Data protection
Web 2.0 | Fortune 1000
Universities | Financial Services
Life Sciences | Retail
Hadoop Early Adopters | Hadoop Early Majority
19. Greenplum HD:
Hadoop for the Enterprise
20. Hadoop Challenges in the Enterprise
Hadoop is hard right now!
– Setup & configuration is resource-intensive
– Lack of skills to make Hadoop work
– Poor integration with existing technologies
– Management at Scale is nonexistent
– Backup & disaster recovery missing
21. Greenplum HD: Enterprise-Ready Hadoop
Simple, efficient and scalable
Proven at scale with worldwide
EMC support
Purpose-built Hadoop
infrastructure
Services to address the talent gap
Parallel analytics access with
Greenplum Database
22. Greenplum HD Architecture
Greenplum Chorus
GREENPLUM COMMAND CENTER
Hadoop Tools (Pig, Hive, HBase, Zookeeper, Mahout, etc.)
MapReduce Layer
Pluggable Storage Layer (HDFS API)
Apache HDFS | Isilon OneFS
23. Enterprise Storage for Hadoop
Integrated big data storage and analytic
solution based on Greenplum HD and
Isilon scale-out NAS
Isilon is the first and only enterprise scale-out
NAS storage platform that natively
integrates the Hadoop Distributed File
System (HDFS) protocol
Seamless analytics access with
Greenplum: Hadoop insights plug directly
into Greenplum Database to
augment analytics
[Diagram labels: Compute, Storage]
24. Flexible and Efficient
Independently Scale Compute & Storage
– Add Greenplum HD or Isilon nodes for performance or
capacity
Eliminate 3x copies of data in HDFS
– Isilon enables 80% utilization for greater storage efficiency
Seamless Analytics Access with Greenplum Database
– Hadoop Fused with GPDB for Big Data analytics
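The storage-efficiency claim above can be checked with simple arithmetic. The sketch below compares the raw disk needed under 3x replication with the raw disk needed at the 80% utilization quoted on this slide. The function names and the 100 TB data set are hypothetical, and the model ignores temporary space, compression, and other overheads.

```python
def raw_capacity_replicated(data_tb, replicas=3):
    """Raw disk (TB) needed when every block is stored `replicas` times."""
    return data_tb * replicas


def raw_capacity_at_utilization(data_tb, utilization=0.80):
    """Raw disk (TB) needed when `utilization` of raw capacity holds user data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return data_tb / utilization


if __name__ == "__main__":
    data_tb = 100  # hypothetical data set size, in terabytes
    print("3x replication:", raw_capacity_replicated(data_tb), "TB raw")
    print("80% utilization:", round(raw_capacity_at_utilization(data_tb), 1), "TB raw")
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with three copies (about 33% utilization) versus roughly 125 TB at 80% utilization.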
25. Simplified Deployment
Remove the need for data staging
– Isilon enables data access over
standard protocols (NFS, CIFS, FTP,
HTTP, HDFS)
No single point of failure
– Isilon distributes the NameNode to
provide high availability and load
balancing
Enterprise data services for Hadoop
– Advanced backup and disaster
recovery capabilities
26. Advanced Management
Greenplum Command Center
– Complete platform management and control
Greenplum Package Manager
– Automates install, uninstall, update, and query for analytics
extensions
– Supports package migration during upgrade, segment recovery,
expansion, and standby initialization
27. Proven at Scale with Worldwide Support
Industry’s largest Hadoop
support team
– Industry’s most accomplished
Hadoop talent (from Yahoo!,
LinkedIn, Talend, etc.)
Tested at scale on the
Greenplum Analytics Workbench
– 1,000-node, 24-petabyte cluster
– Multi-million dollar investment by
EMC and partners
– Reduced risk for EMC customers
– Certification of partner products
[Image caption: Bringing Rapid Innovation to Hadoop]
28. Get Started With Hadoop Today
Hadoop Architecture Services
– POC planning and deployment
– Installation and best practices
– Educate the team
Greenplum Analytics Labs
– Leverage the expertise of Greenplum’s
Data Scientists
– Packaged solutions that produce business
value and actionable results
– Accelerate Hadoop capabilities on your
data with your analysts
Establish a strategic vision
– Roadmap for Hadoop and unified analytics
29. Provide Feedback & Win!
125 attendees will receive
$100 iTunes gift cards. To
enter the raffle, simply
complete:
– 5 session surveys
– The conference survey
Download the EMC World
Conference App to learn
more: emcworld.com/app