1. Hadoop In the Enterprise?
Sih Lee & Peter Krey, Innovation & Shared Services
Firmwide Engineering & Architecture
Hadoop World, New York City, October 2nd, 2009
2009 JPMorgan Chase & Co.
All rights reserved.
Confidential and proprietary to JPMorgan Chase & Co.
2. Agenda
JPMorgan Chase + Open Source
Hadoop In The Enterprise?
Active POC Pipeline
Hadoop Positioning
Cost Comparisons
Hadoop Additions & Must Haves
Q&A
3. JPMorgan Chase + Open Source
Established Multi-Year Open Source History
Big Supporter of Industry Standards & Open Source Projects
Numerous Production Open Source Implementations
QPID (AMQP) - Top Level Apache Project (http://qpid.apache.org/)
Tyger - Apache + Tomcat + Spring - Fully Integrated App Server Environment, 30+ OS Components
Compute Backbone (CBB) HPC Grid - 1000's of Linux Based Compute Servers
MuleSoft.org (a.k.a. MuleSource) Enterprise Message Bus
others …
4. Hadoop In The Enterprise – Economics Driven
Many Big Data Lessons Learned From Web 2.0 Community
Potential For Large Capex and Opex "Dislocation"
Reduced Consumption of Enterprise Premium Resources
Grid Computing Economics Brought To Data Intensive Computing
Stagnant Data Innovation
Enabling & Potentially Disruptive Platform
Many Historical Similarities
Java, Linux, Tomcat, Web / Internet, …
Mini's to Client / Server, Client / Server to Web, Solaris to Linux, …
Key Question: What Can Be Built On Top of and Enabled by Hadoop?
5. Hadoop In The Enterprise – Choice Driven
Overuse of Relational Database Containers
Institutional “Muscle Memory” … Not Much Else to Choose From
Increasingly Large Percentage of Static Data Stored In Proprietary Transactional DB's
Over-Normalized Schemas … Do They Still Make Sense With Cheap Compute & Storage?
Enterprise Storage "Prisoners"
Captive To The Economics & Technology of "A Few" Vendors
Developers Need More Choice
Too Much Proprietary, Single-Source Data Infrastructure
Increasing Need For Minimal / No System + Storage Admins
6. Hadoop In The Enterprise – Other Drivers
Growing Developer Interest In "No SQL" Data Technologies
Open Source, Distributed, Non-relational Databases
Growing Influence Of Web 2.0 Technologies & Thinking On Enterprise
Hadoop, Cassandra, HBase, Hive, CouchDB, HadoopDB, …, others
memcached For Caching
FSI Industry Drivers
Increased Regulatory Oversight + Reporting = More Data Needed Over Longer Period Of Time
Growing Need For Less Expensive Data Repository / Store
Increasing Need To Support "One Off" Analysis On Large Data
7. Active POC Pipeline
Growing Stream of Real Projects To Gauge Hadoop "Goodness of Fit"
Broad Spectrum of Use Cases
Driven By Need To Impact / Dislocate OPEX + CAPEX
Evaluated On Metric-Based Performance, Functional, And Economic Measures
8. Hadoop Positioning
[Positioning chart; exact placements are not recoverable from the extracted text.]
Horizontal axis: data volume, GB's -> TB's -> PB's
Vertical axis: Lower-Latency to Higher-Latency
Regions: Index Based Access (Updates / XActns), Index Based Access (Analysis), Semi-Structured Analysis
Systems plotted: InMemory1; SQLDB1, SQLDB2, SQLDB3, SQLDB4; DW1, DW2, DW3, DW4, DW5, DW6, DW7; Map/Reduce + HDFS
9. Comparative Storage Cost Bar Graph Slide
"Normalized" SAN + NAS $ per GB per month versus HDFS $ per GB per month
[Bar graph: bars labeled SAN, NAS, and Hadoop; bar values are not recoverable from the extracted text.]
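The "normalized" $ per GB per month comparison above implies putting different storage platforms on a common footing. A minimal sketch of one way such a figure could be computed is shown here. The function name, its parameters, and every input value are hypothetical placeholders, not figures or methods from this presentation. The `copies` parameter is an assumption that models keeping several full copies of each byte, as HDFS does with its block replication factor (commonly 3 by default).

```python
def cost_per_usable_gb_month(capex_usd, raw_gb, amortization_months,
                             monthly_opex_usd, copies=1):
    """Return dollars per usable GB per month.

    copies is the number of full copies kept of each byte (for HDFS, the
    block replication factor), so usable capacity is raw_gb / copies.
    """
    if raw_gb <= 0 or amortization_months <= 0 or copies < 1:
        raise ValueError("capacity, amortization period, and copies must be positive")
    usable_gb = raw_gb / copies
    monthly_capex = capex_usd / amortization_months  # straight-line amortization
    return (monthly_capex + monthly_opex_usd) / usable_gb


# Placeholder inputs only; these are NOT figures from the presentation.
example = cost_per_usable_gb_month(
    capex_usd=36_000, raw_gb=30_000, amortization_months=36,
    monthly_opex_usd=500, copies=3,
)
print(f"${example:.3f} per usable GB per month")
```

Dividing by usable rather than raw capacity keeps replicated and non-replicated platforms comparable; a real comparison would also need to decide how to treat RAID overhead, power, floor space, and administration cost.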
10. Enterprise Data Warehousing Costs
"Normalized" bar chart utilizing retail $ per TB
[Bar chart: Data Warehouse S/W, $K per TB, by product. Vertical axis runs from $0 to $250 in $50 steps; bar values are not recoverable from the extracted text.]
11. Hadoop Additions & Must Haves
Improved SQL Front-end Tool Interoperability
Better Interop With Skills & Content That Firms Already Have
Improved Security & ACL enforcement … Kerberos integration?
Grow Developer Programming Model Skill Sets
Improve Relational Container Integration & Interop For Data Archival
Management & Monitoring Tools
Improved Developer & Debugging Tools
Reduce Latency Via Integration With Open Source Data Caching
memcached, others
Invitation To FSI or Enterprise Roundtable
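The "Reduce Latency Via Integration With Open Source Data Caching" item above could take the form of a read-through cache placed in front of a higher-latency store. The sketch below illustrates the pattern only: a plain dict stands in for memcached so that it runs without a server, and `ReadThroughCache`, `slow_lookup`, and the key name are hypothetical names introduced for illustration, not part of Hadoop or memcached.

```python
import time


class ReadThroughCache:
    """Read-through cache with per-entry expiry.

    A plain dict stands in for memcached so the sketch runs without a server.
    """

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # slow lookup against the backing store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: fall through to the slow store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(key):
    """Hypothetical stand-in for a high-latency read from a batch-oriented store."""
    return f"value-for-{key}"


cache = ReadThroughCache(slow_lookup, ttl_seconds=30.0)
cache.get("report-2009-10")   # miss: calls slow_lookup
cache.get("report-2009-10")   # hit: served from the cache
```

The expiry time bounds how stale a cached result can become, which matters when the backing data is periodically regenerated by batch jobs.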
12. Q&A
Sih Lee, Head of Innovation & Shared Services
Firmwide Engineering & Architecture
W# 212-622-3038
sih.x.lee@jpmchase.com
Peter Krey, Consultant, Innovation & Shared Services
Firmwide Engineering & Architecture
W# 212-622-2926
peter.j.krey@jpmchase.com