In spite of recent advances in computing, many core business processes remain batch-oriented and run on mainframes. Annual mainframe costs are counted in six or more figures per year, potentially growing with capacity needs. To tackle the cost challenge, many organizations have considered or attempted multi-year mainframe migration/re-hosting strategies. Traditional approaches to mainframe elimination call for large initial investments and carry significant risks – it is hard to match mainframe performance and reliability. Using Hadoop, Sears/MetaScale developed an innovative alternative that enables batch processing migration to Hadoop without the risks, time, and costs of other methods. This solution has been adopted in multiple businesses with excellent results and associated cost savings as mainframes are physically eliminated or downsized: millions of dollars in savings based on MIPS reductions have been seen – a reduction of 200 MIPS can yield $1 million in annual savings. MetaScale eliminated over 900 MIPS and an entire mainframe system for one Fortune 500 client. This presentation illustrates the reference architecture and approach successfully used by MetaScale to move mainframe processing to the Hadoop platform without altering user-facing business applications.
Move to Hadoop, Go Faster and Save Millions - Mainframe Legacy Modernization
1. 1
Hadoop Summit 2013 - June 26th, 2013
Move to Hadoop, Go Faster and Save Millions
- Mainframe Legacy Modernization
Sunilkumar Kakade – Director IT
Aashish Chandra – DVP, Legacy Modernization
2. 2
Legacy Rides The Elephant
Hadoop is disrupting enterprise IT processing.
3. 3
Recognition - Contributors
• Our Leaders
• Ted Rudman
• Aashish Chandra
• Team
• Simon Thomas
• Sunil Kakade
• Susan Hsu
• Bob Pult
• Kim Havens
• Murali Nandula
• Willa Tao
• Arlene Pynadath
• Nagamani Banda
• Tushar Tanna
• Kesavan Srinivasan
5. 5
Mainframe Migration - Overview
• In spite of recent advances in computing, many core business
processes are batch-oriented running on mainframes.
• Annual mainframe costs are counted in six or more figures per
year, potentially growing with capacity needs. To tackle
the cost challenge, many organizations have considered or
attempted multi-year mainframe migration/re-hosting
strategies.
6. 6
Batch Processing Characteristics
*Ref: IBM Redbook
Characteristics*
• Large amounts of input data are processed and stored (perhaps
terabytes or more).
• Large numbers of records are accessed, and a large volume of
output is produced.
• Immediate response time is usually not a requirement;
however, jobs must complete within a “batch window.”
• Batch jobs are often designed to run concurrently with online
transactions with minimal resource contention.
7. 7
Batch Processing Characteristics
Key infrastructure requirements:
• Sufficient data storage
• Available processor capacity, or cycles
• Job scheduling
• Programming utilities to process basic operations
(Sort/Filter/Split/Copy/Unload, etc.)
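As a rough illustration, the basic utility operations listed above (sort, filter, split) can be sketched in plain Python; the fixed-width record layout and sample data below are hypothetical, not taken from the actual workload:

```python
# Minimal sketch of basic batch utility operations (SORT, FILTER,
# SPLIT) over fixed-width text records. Layout is hypothetical:
# a 3-character key followed by an item name.

def sort_records(records, key_start, key_len):
    """SORT: order records ascending by a fixed-width key field."""
    return sorted(records, key=lambda r: r[key_start:key_start + key_len])

def filter_records(records, predicate):
    """FILTER: keep only records matching a predicate."""
    return [r for r in records if predicate(r)]

def split_records(records, n_parts):
    """SPLIT: distribute records round-robin into n output partitions."""
    parts = [[] for _ in range(n_parts)]
    for i, r in enumerate(records):
        parts[i % n_parts].append(r)
    return parts

records = ["003ACME  ", "001WIDGET", "002GIZMO "]
ordered = sort_records(records, 0, 3)                     # sort on 3-char key
widgets = filter_records(ordered, lambda r: "WIDGET" in r)
halves = split_records(ordered, 2)
```

On Hadoop these same primitives map naturally onto shuffle/sort, map-side filtering, and output partitioning, which is what makes the migration mechanical.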
8. 8
Why Hadoop and Why Now?
THE ADVANTAGES:
• Cost reduction
• Alleviate performance bottlenecks
• ETL too expensive and complex
• Offload mainframe and data warehouse processing to Hadoop
THE CHALLENGE:
• Traditional enterprises' lack of awareness
THE SOLUTION:
• Leverage the growing support system for Hadoop
• Make Hadoop the data hub in the Enterprise
• Use Hadoop for processing batch and analytic jobs
9. 9
The Architecture
• Enterprise solutions using Hadoop must be an eco-system
• Large companies have a complex environment:
• Transactional systems
• Services
• EDW and Data marts
• Reporting tools and needs
• We needed to build an entire solution
11. 11
Hadoop based Ecosystem for Legacy System Modernization
[Architecture diagram comparing the legacy stack and the MetaScale Hadoop-based stack. Both serve sales, customer, price, and product data to enterprise systems through the same interface layer (jQuery/AJAX, Quartz, JAXB, REST API, JDBC/iBATIS).
• Legacy stack: Teradata/DB2, Oracle, UDB, and MySQL databases plus VSAM files; mainframe batch processing in COBOL/JCL; J2EE/WebSphere application tier.
• MetaScale Hadoop stack: MySQL, HBase, Hadoop, and SOLR data stores; batch processing in Hive, Ruby/MapReduce, and Hadoop/Pig; J2EE/JBoss/Spring application tier.]
12. 12
Mainframe Batch Processing Architecture
[Diagram: input data flows from user interfaces, data sources, historical data sources, and external systems into mainframe batch processing, with input data retention; resultant data flows out to the data warehouse and external systems.]
13. 13
MetaScale Batch Processing Architecture With Hadoop
[Diagram: the same flow with the Hadoop ecosystem in place of the mainframe. Input from user interfaces and data sources moves to Hadoop; MapReduce-based batch processing produces resultant data that moves back to non-Hadoop platforms (the data warehouse and external systems).]
14. 14
Typical Batch Processing Units (JCL) on Mainframe
Batch Processing - JOB FLOW
[Diagram: input from user interfaces and data sources feeds mainframe batch processing; resultant data flows to external systems and the data warehouse. The job flow comprises:
• JCL1 - APPLICATION 1: SORT → SPLIT → SORT → COBOL → FILTER → FORMAT
• JCL2 - APPLICATION 1
• JCL3 - APPLICATION 2: COPY → COBOL → FORMAT → LOAD TO DATABASE]
15. 15
Batch Processing Migration With Hadoop
Seamless migration of high MIPS processing jobs with no application alteration
Commodity Hardware Based Software Framework
Batch Processing - JOB FLOW
Invention - Migration methodology for Legacy Applications to Commodity Hardware
[Diagram: the same job flow after migration. Each step of APPLICATION 1 is replaced by an equivalent PIG/MR (Pig/MapReduce) step: PIG/MR → PIG/MR → PIG/MR → PIG/MR → PIG/MR → PIG/MR. JCL2 (APPLICATION 1) and JCL3 (APPLICATION 2: COPY → COBOL → FORMAT → LOAD TO DATABASE) remain on the legacy platform. Input from user interfaces and data sources feeds batch processing; resultant data flows to external systems and the data warehouse.]
16. 16
Mainframe to Hadoop-PIG conversion example
Mainframe JCL
//PZHDC110 EXEC PGM=SORT
//SORTIN DD DSN=PZ.THDC100.PLMP.PRC,
// DISP=(OLD,DELETE,KEEP)
//SORTOUT DD
DSN=PZ.THDC110.PLMP.PRC.SRT,LABEL=EXPDT=99000,
// DISP=(,CATLG,DELETE),
// UNIT=CART,
// VOL=(,RETAIN),
// RECFM=FB,LRECL=40
//SYSIN DD DSN=KMC.PZ.PARMLIB(PZHDC11A),
// DISP=SHR
//SYSOUT DD SYSOUT=V
//SYSUDUMP DD SYSOUT=D
//*__________________________________________________
//* SORT FIELDS=(1,9,CH,A)
- A 500-million-record sort took 45 minutes of clock time
on an A168 mainframe
PIG
a = LOAD 'data' AS (f1:chararray);
b = ORDER a BY f1;
- A 500-million-record sort took
less than 2 minutes
More benchmarking studies in
progress
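The behavior of the JCL step above (SORT FIELDS=(1,9,CH,A): an ascending character sort on columns 1 through 9) and its Pig equivalent can be illustrated locally with a small Python sketch; the sample records below are hypothetical, not from the actual dataset:

```python
# Local sketch of SORT FIELDS=(1,9,CH,A): ascending character sort
# on the key in columns 1-9 (Python slice 0:9) of fixed-width lines.
# The 40-byte-style sample records are hypothetical.

def jcl_style_sort(lines, start=0, length=9):
    """Ascending character sort on a fixed-width key field."""
    return sorted(lines, key=lambda line: line[start:start + length])

records = [
    "000000042ITEM-B                         ",
    "000000007ITEM-A                         ",
    "000000100ITEM-C                         ",
]
ordered = jcl_style_sort(records)
# keys now ascend: 000000007, 000000042, 000000100
```

The Pig version wins at scale not because the sort logic differs, but because Hadoop distributes the sort's shuffle phase across commodity nodes.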
18. 18
Mainframe Migration – Value Proposition
Three migration tracks:
• Optimize – Mainframe Optimization: 5% ~ 10% MIPS reduction; quick wins with low-hanging fruit
• Pig/Hadoop – Mainframe BATCH: ETL modernization; move batch processing to Hadoop
• Rewrites/Convert – Mainframe ONLINE: tool-based conversion; convert COBOL & JCL to Java
From (pain points): high TCO, resource crunch, inert business practices
To (outcomes): cost savings, open source platform, simpler & easier code, business agility, business & IT transformation, modernized systems, IT efficiencies
Key figures:
• Companies can SAVE 60% ~ 80% of their mainframe costs with modernization
• Typically 60% ~ 65% of MIPS in mainframes are used by BATCH processing
• An estimated 45% of FUNCTIONALITY in mainframes is never used
19. 19
Mainframe Migration – Traditional Approach
• Traditional approaches to mainframe elimination call for
large initial investments and carry significant risks – It is
hard to match Mainframe performance and reliability.
• Many organizations still utilize the mainframe for batch
processing applications. Of the several solutions proposed to
move expensive mainframe computing to other distributed or
proprietary platforms, most rely on end-to-end
migration of applications.
20. 20
Mainframe Batch Processing MetaScale Architecture
• Using Hadoop, Sears/MetaScale developed an innovative
alternative that enables batch processing migration to
Hadoop Ecosystem, without the risks, time and costs of
other methods.
• The solution has been adopted in multiple businesses with
excellent results and associated cost savings, as
Mainframes are physically eliminated or downsized:
Millions of dollars in savings based on MIP reductions have
been seen.
21. 21
MetaScale Mainframe Migration Methodology
1. Implement a Hadoop-centric reference architecture
2. Move enterprise batch processing to Hadoop
3. Make Hadoop the single point of truth
4. Massively reduce ETL by transforming within Hadoop
5. Move results and aggregates back to legacy systems for consumption
6. Retain, within Hadoop, source files at the finest granularity for re-use
Key to our Approach:
1) allowing users to continue to use familiar consumption interfaces
2) providing inherent HA
3) enabling businesses to unlock previously unusable data
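Steps 4 and 5 above (transform within Hadoop, then move only results and aggregates back to legacy systems) can be illustrated with a minimal local sketch; the (product, amount) detail records here are hypothetical examples:

```python
from collections import defaultdict

# Sketch of "transform within Hadoop, ship only aggregates back":
# reduce detail rows to a small per-key summary, so that only the
# summary (not the raw detail) returns to legacy systems.

def aggregate_sales(detail_records):
    """Reduce (product, amount) detail rows to per-product totals,
    i.e. the aggregate that legacy systems consume."""
    totals = defaultdict(float)
    for product, amount in detail_records:
        totals[product] += amount
    return dict(totals)

detail = [("widget", 10.0), ("gizmo", 4.5), ("widget", 2.5)]
summary = aggregate_sales(detail)  # only this goes back to legacy
```

Keeping the full-granularity detail inside Hadoop (step 6) is what lets the same source data be re-used for new analyses without another extract.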
22. 22
Mainframe Migration - Benefits
“MetaScale is the market leader in moving mainframe batch processing to Hadoop”
Skills & Resources
• Readily available resources & commodity skills
• Access to latest technologies
• IT operational efficiencies
• Moved 7,000 lines of COBOL code to under 50 lines in Pig
Business Agility
• Ancient systems no longer a bottleneck for business
• Faster time to market
• Mission-critical “Item Master” application in COBOL/JCL being converted by our tool to Java (JOBOL)
Transform I.T.
• Modernized COBOL, JCL, DB2, VSAM, IMS & so on
• Reduced batch processing in COBOL/JCL from over 6 hrs to less than 10 min in Pig Latin on Hadoop
• Simpler and easily maintainable code
• Massively parallel processing
Cost Savings
• Significant reduction in ISV costs & mainframe software license fees
• Open source platform
• Saved ~$2MM annually within 13 weeks through MIPS optimization efforts
• Reduced 1000+ MIPS by moving batch processing to Hadoop
23. 23
Summary
• Hadoop can revolutionize Enterprise workloads and make the business
agile
• Can reduce strain on legacy platforms
• Can reduce cost
• Can bring new business opportunities
• Must be an eco-system
• Must be part of an overall data strategy
• Not to be underestimated
24. 24
The Learning
HADOOP
We can dramatically reduce batch processing times for mainframe and EDW
We can retain and analyze data at a much more granular level, with longer history
Hadoop must be part of an overall solution and eco-system
IMPLEMENTATION
We can reliably meet our production deliverable time-windows by using Hadoop
We can largely eliminate the use of traditional ETL tools
New Tools allow improved user experience on very large data sets
UNIQUE VALUE
We developed tools and skills – the learning curve is not to be underestimated
We developed experience in moving workloads from expensive, proprietary mainframe and EDW
platforms to Hadoop, with spectacular results
Over two years of experience using Hadoop for Enterprise legacy workloads
25. 25
The Horizon – What do we need next?
• Automation tools and techniques that ease the Enterprise integration of
Hadoop
• Educate traditional Enterprise IT organizations about the possibilities and
reasons to deploy Hadoop
• Continue development of a reusable framework for legacy workload
migration
26. 26
Legacy Modernization Service Offerings
• Leveraging our patent-pending and award-winning niche products, we reduce
mainframe MIPS, modernize ETL processing, and transform business and IT
organizations to open source, cloud-based, Big Data, and agile platforms
• MetaScale Legacy Modernization offers the following services –
Legacy Modernization Assessment
Services
Mainframe Migration Services
• MIPS Reduction Services
• Mainframe Application Migration
Legacy Distributed Modernization
• ETL Modernization Services
• Modernize Proprietary Systems and
Databases
Managed Applications Support
Support Transition Services