Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight

Agenda
• Data Warehouse Vision & Reality
• What is legacy data, and why an Enterprise Data Hub?
• Offloading legacy data and workloads to Hadoop
• Transform all types of data into self-service analytics
• Live Demonstration
• Customer case study
• Q&A

What is this?
©2014 Cloudera, Syncsort, Tableau Inc. All rights reserved.

The Data Warehouse Vision - 1998
Data Integration & ETL Tools would enable a Single, Consistent Version of the Truth
[Diagram: sources (Real-Time, Mainframe, Oracle, ERP, File, XML) feed a Data Warehouse and three Data Marts through ETL]

Data Warehouse Reality 2014
Data Integration & ETL Tools would enable a Single, Consistent Version of the Truth
[Diagram: the same sources (Real-Time, Mainframe, Oracle, ERP, File, XML), ETL and Data Marts, now annotated with: Dormant Data, Staging / ELT, New Reports, SLAs, New Column, Complete History]

The Data Warehouse Vision vs Reality
• Vision: Fresher data. Reality: Longer ELT batch windows
• Vision: Longer history data. Reality: Shorter data retention
• Vision: Faster analytics. Reality: Slower queries
• Vision: More data sources. Reality: Weeks/months just to add new data fields
• Vision: Lower costs. Reality: Growing costs

Mainframes | A Critical Source of Big Data
• Top 25 World Banks
• 9 of World’s Top Insurers
• 23 of Top 25 US Retailers
• 71% Fortune 500
• 30 Billion Business Transactions / day

Suits & Hoodies – Working Together
Integration Gaps
• Connectivity
• Data conversion (EBCDIC vs ASCII)
Expertise Gaps
• COBOL appeared in 1959, Hadoop in 2005
• Mainframe & Hadoop skills shortage
Security Gaps
• Hosts mission critical sensitive data
• Very difficult to install new software on MF
Costs Gaps
• Mainframe data is (expensive) Big Data
• Even FTP costs CPU cycles (MIPS)
Suits & Hoodies idea: Merv Adrian, Gartner Research.
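
The data conversion gap (EBCDIC vs ASCII) can be illustrated with a minimal Python sketch. It assumes the mainframe text is encoded in IBM code page 037, one common EBCDIC variant (other systems may use a different code page, such as cp500 or cp1140). It also assumes the record holds only character data; packed-decimal or binary fields described by a COBOL copybook would need separate handling. The function name and the sample record are hypothetical.

```python
# Assumption: source bytes use EBCDIC code page 037 (US/Canada).
EBCDIC_CODEC = "cp037"


def ebcdic_to_text(raw: bytes, codec: str = EBCDIC_CODEC) -> str:
    """Decode EBCDIC-encoded character data into a Python string."""
    return raw.decode(codec)


# Hypothetical sample record: the word "HELLO" as EBCDIC bytes.
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])

text = ebcdic_to_text(sample)        # Unicode string
ascii_bytes = text.encode("ascii")   # the same characters as ASCII bytes

print(text)
```

The same letters have entirely different byte values in the two encodings, which is why a plain binary file transfer from the mainframe is not enough on its own.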

Expanding Data Requires A New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
[Diagram: relative size & complexity of Data and Compute, 1980s vs now]

From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✖ Managed
✖ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE) on top of FILESYSTEM (HDFS): STORAGE FOR ANY TYPE OF DATA; UNIFIED, ELASTIC, RESILIENT, SECURE]
Core Apache Hadoop is great, but…
1) Hard to use and manage.
2) Only supports batch processing.
3) Not comprehensively secure.

From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✖ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); FILESYSTEM (HDFS); SYSTEM MANAGEMENT (CLOUDERA MANAGER)]

From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); ANALYTIC SQL (IMPALA); SEARCH ENGINE (SOLR); MACHINE LEARNING (SPARK); STREAM PROCESSING (SPARK STREAMING); 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); FILESYSTEM (HDFS); ONLINE NOSQL (HBASE); SYSTEM MANAGEMENT (CLOUDERA MANAGER)]

From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✔ Secure and Governed
[Diagram: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); HDFS, HBASE; DATA MANAGEMENT (CLOUDERA NAVIGATOR); SYSTEM MANAGEMENT (CLOUDERA MANAGER); SENTRY]

From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✔ Secure and Governed
CLOUDERA’S ENTERPRISE DATA HUB
[Diagram: BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); HDFS, HBASE; DATA MANAGEMENT (CLOUDERA NAVIGATOR); SYSTEM MANAGEMENT (CLOUDERA MANAGER); SENTRY]

Cloudera: Your Trusted Advisor for Big Data
Advance from Strategy to ROI with Best Practices and Peak Performance
• Partners
• Proactive & Predictive Support
• Professional Services
• Training

The Impact of ELT & Dormant Data on the EDW
• ELT drives up to 80% of database capacity
• Dormant (rarely used) data wastes premium storage
• ETL/ELT processes on dormant data waste premium CPU cycles
[Diagram: Hot, Warm and Cold Data; Transformations (ELT) of unused data]
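
One way to make the hot / warm / cold distinction concrete is to label each table by how long ago it was last accessed. The sketch below is only illustrative: the 30-day and 180-day thresholds, the table names and the access dates are hypothetical, and a real assessment would draw on the warehouse's own query and access logs.

```python
from datetime import date

# Hypothetical thresholds (days since last access); tune for your warehouse.
HOT_MAX_DAYS = 30
WARM_MAX_DAYS = 180


def temperature(last_access: date, today: date) -> str:
    """Label data as hot, warm or cold by days since it was last accessed."""
    age_days = (today - last_access).days
    if age_days <= HOT_MAX_DAYS:
        return "hot"
    if age_days <= WARM_MAX_DAYS:
        return "warm"
    return "cold"


# Hypothetical tables with their last-access dates.
tables = {
    "sales_current": date(2014, 5, 28),
    "sales_2013": date(2014, 2, 1),
    "claims_2009": date(2012, 11, 15),
}
today = date(2014, 6, 1)

labels = {name: temperature(seen, today) for name, seen in tables.items()}
# Cold tables are the dormant-data candidates for offload.
dormant = [name for name, label in labels.items() if label == "cold"]

print(labels)
print(dormant)
```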

Where to Start?
How to identify dormant data?
What workloads will deliver the biggest impact?
How will you access & move all your data?
Can you secure the new environment?
How do you optimize it?
How do you manage it?
How do you make it business-class?
What tools do you need?
How will you leverage all your data, including mainframes?

Offload Legacy Data & Workloads to The Enterprise Data Hub
One Framework. Blazing Performance, Iron-Clad Security, Disruptive Economics
Phase I: Identify
• Identify data & workloads most suitable for offload
• Focus on those that will deliver maximum savings & performance
Phase II: Offload
• Access and move virtually any data, e.g. mainframe, to the Enterprise Data Hub with one tool
• Easily replicate existing staging workloads in Hadoop using a graphical user interface
• Deploy on premises and in Cloud
Phase III: Optimize & Secure
• Optimize the new environment
• Manage & secure all your data with business class tools
• Deliver self-service reporting
The Problem: Volume of Data
Businesses are struggling to unlock exploding data

The Problem: Diverse Data
Businesses and their people are struggling to unlock diverse data

The Problem: Old School Software
Traditional technologies are complicated, inflexible and slow moving

The Tableau Revolution
Fast and easy analytics for everyone

Flexible
Transform all types of data into self-service analytics

For Everyone
Ease of use leads to adoption across all departments and use cases

LIVE DEMO

Case Study: Optimize EDW, Leading Financial Org
Mainframe Offload (74-page COBOL copybook)
[Chart: Elapsed Time (minutes): HiveQL 217 min; Syncsort DMX-h 9 min]
Development Effort
• Syncsort DMX-h: 4 hrs.
• Manual Coding: Weeks!
Benefits:
• Cut development time from weeks to hours
• Reduced complexity from 47 HiveQL scripts to 4 DMX-h graphical jobs
• Easily validate COBOL copybooks and find errors
• Mainframe data available to business for analytics
• Staging & ELT moved out of RDBMS, so queries run faster
Final Thoughts
Rusty Sears
Vice President of Enterprise Data Services and Big Data at Regions Financial Corporation
