Bridging SAP and Hadoop for Business Value

WHEN SAP ALONE IS NOT ENOUGH
Wim Stoop | Senior Technical Marketing Manager, Cloudera
Michal Alexa | Service Line Manager, Datavard

# 2
© Cloudera, Inc. All rights reserved.
TODAY’S SPEAKERS
Wim Stoop
Senior TMM
wim@cloudera.com
Michal Alexa
Service Line Manager
michal.alexa@datavard.com

# 3
Why you need to bridge SAP and Hadoop to turn your
data into Business Value

# 4
SAP and Hadoop – bridging two worlds
Hadoop
• Java, Python, Pig Latin
• Massive clusters for big data processing
• Structured & unstructured data
• Apache & open source
• Distributions (e.g. Cloudera)
• Engines (e.g. Spark, Impala)
• Fast-paced evolution since 2006
• Big Data management
SAP
• ABAP
• Client/server
• Classic relational database (RDBMS)
• Proprietary software
• Interfaces and open standards
• Business software
• Steady evolution since 1972
• Data management

# 5
75% of global GDP is generated by
companies running on SAP®

# 6
Data Management Issues
Scalability
• Lifetime sizing of the platform at procurement time is no longer possible
• Hardware requirements limit possible growth
• Scaling up often comes at great cost, and scaling down usually yields no value
Data-Pipelines
• Data transformations are I/O-intensive operations
• They take a lot of time and consume a lot of resources
Granularity and Velocity
• Limitations on the format of data
• Limitations on the granularity of data; often only aggregated and cleaned data is stored
• Raw data is necessary for data science activities
Data-Silos
• Too many places for storing data
• Missing interconnection between company units limits the possibilities for data analysis
Extensibility
• Data analysis requires many programming languages
• Limited application compatibility

# 7
From Data management to Big Data management
Data Management Issues
Data Growth
Data Separation

# 8
Data Management Issues | Business Questions to answer
Data Growth | Cost Reduction
Data Separation | Revenue Increase

# 10
“Only 12-18% of all data in BW is
actually used.”
Forrester research

# 11
“On average, 35% of SAP data is
temporary and could be deleted.”
Based on 300+ Fitness Tests

# 12
Data distribution in SAP BW*
• Cube D data: 3%
• Master data: 5%
• Cube F data: 5%
• Cube E data: 5%
• PSA data: 9%
• Changelog data: 11%
• Other data: 15%
• Temporary data: 15%
• DSO data: 32%
* Based on 300+ DataVard BW FitnessTest™
Housekeeping: 35%
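The figures on this slide can be cross-checked with a few lines of Python. This is only a sketch under two assumptions that the slide does not state: that the percentages pair with the category labels in the order they were extracted from the chart, and that the 35% housekeeping figure groups the PSA, changelog and temporary categories (the name `housekeeping_candidates` is hypothetical).

```python
# Shares in percent, paired with labels in extraction order (an assumption).
distribution = {
    "Cube D data": 3,
    "Master data": 5,
    "Cube F data": 5,
    "Cube E data": 5,
    "PSA data": 9,
    "Changelog data": 11,
    "Other data": 15,
    "Temporary data": 15,
    "DSO data": 32,
}

# Hypothetical grouping: categories assumed to count as housekeeping data.
housekeeping_candidates = ["PSA data", "Changelog data", "Temporary data"]

# The shares should cover the whole system.
total_share = sum(distribution.values())

# Combined share of the assumed housekeeping categories.
housekeeping_share = sum(distribution[name] for name in housekeeping_candidates)

print(f"Total: {total_share}%")
print(f"Housekeeping candidates: {housekeeping_share}%")
```

Under these assumptions the shares add up to the whole system and the three candidate categories together reproduce the housekeeping figure on the slide.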

# 13
DATA GROWTH WITH & WITHOUT DATATIERING
SAP DATA GROWTH (in GB)
Year | Data size without datatiering | Data size after datatiering
2017 | 1290 | 774
2018 | 1710 | 716
2019 | 2250 | 754
2020 | 2925 | 857
2021 | 3803 | 1041
2022 | 4943 | 1309
DATA GROWTH: 25% p.a.
SIZE TODAY: 1.3 TB
SIZE IN 5 YEARS: 4.9 TB
SAVING: 3.6 TB
DATATIERING ROI: 2 YEARS
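The saving figure on this slide can be reproduced from the plotted values. The sketch below is illustrative only: the function names are hypothetical, and it assumes decimal units (1 TB = 1000 GB). It also computes the compound rate implied by the first and last plotted points, which need not equal the rounded headline growth rate.

```python
# Values in GB as plotted on the slide, 2017 to 2022.
years = [2017, 2018, 2019, 2020, 2021, 2022]
without_tiering = [1290, 1710, 2250, 2925, 3803, 4943]
with_tiering = [774, 716, 754, 857, 1041, 1309]


def yearly_savings(before, after):
    """Return the per-year difference between the two series (in GB)."""
    return [b - a for b, a in zip(before, after)]


def implied_annual_growth(series):
    """Compound annual growth rate between the first and last value."""
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1 / periods) - 1


savings = yearly_savings(without_tiering, with_tiering)

# Assumption: decimal units, 1 TB = 1000 GB.
final_saving_tb = savings[-1] / 1000

for year, saving in zip(years, savings):
    print(f"{year}: {saving} GB saved")

print(f"Saving in {years[-1]}: {final_saving_tb:.1f} TB")
print(f"Implied growth without tiering: {implied_annual_growth(without_tiering):.1%} p.a.")
```

The final-year difference between the two series is what the slide reports as the saving.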

# 18
[Chart: Temperature in Bratislava, March 2018. Vertical axis from -10 to 15; horizontal axis from 3/1/2018 to 3/29/2018.]

# 25
Data Management Issues | Business Questions to answer | Big Data Management Solutions
Data Growth | Cost Reduction | Data Tiering
Data Separation | Revenue Increase | Data Integration

# 26
1. Data Tiering use case stream - OUTBOARD
Data Growth | Cost Reduction | Data Tiering
2. Data Integration use case stream - GLUE
Data Separation | Revenue Increase | Data Integration

# 27
Data Growth | Cost Reduction | Data Tiering
Data Separation | Revenue Increase | Data Integration
3. Security Analyses use case stream – Data Science
Data Protection | Cost Prevention | Security Analyses

# 28
4. Data Aging or decommissioning of an old system – Data Fridge scenario
Data Aging | GDPR/Costs | Data Fridge

IDEAL DATA LAKE SETTING

WHICH DO YOU WANT?
Data lake | Data hub

USE DATA TO MAKE THE IMPOSSIBLE POSSIBLE
• GROW BUSINESS
• CONNECT PRODUCTS & SERVICES (IoT)
• PROTECT BUSINESS

MODERN DATA ARCHITECTURE
• ML / AI (DATA SCIENCE)
• ANALYTICS
• DATA ENGINEERING
• CLOUD STORAGE
• ON-PREMISES STORAGE
• MANAGEMENT & SECURITY

CLOUDERA ENTERPRISE DATA PLATFORM
The modern platform for machine learning & analytics optimized for the cloud
WORKLOADS: Data Engineering, Data Science, Analytic Database, Operational Database, 3rd Party Services
COMMON SERVICES: Data Catalog, Security, Governance, Lifecycle Management
CONTROL PLANE
STORAGE: HDFS, Kudu, Amazon S3, Microsoft ADLS

SHARED DATA CONTEXT SERVICES
Built for multi-function analytics anywhere
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises,
cloud object stores, structured, unstructured, and semi-structured. Includes
technical schemas from the Hive metastore, as well as business glossary
definitions, classifications, and usage guidance
• Security: role-based access control applied consistently across the platform
using Apache Sentry. Also includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and other governance
capabilities applied universally across the platform with rich extensibility for
partner integrations
• Lifecycle Management: comprehensive ingest-to-purge management of data
set lifecycle activities
• Control Plane: multi-environment cluster provisioning, deployment,
management, and troubleshooting

HYBRID IS THE NEW NORMAL IN ML & ANALYTICS
CLOUD
• Elastic
• Transient
• IoT
• Dev / Test
• New locations
ON-PREMISES
• Data sovereignty
• Persistent
• Legacy
• Cost
• Performance
+
Choice | Economics | Migration | Governance | Control

EXTENSIVE INTEGRATION WITH PUBLIC CLOUD VENDORS
Workloads: Data Engineering, Data Science, Analytic Database, Operational Database
CLOUDERA ENTERPRISE: Bare Metal, Private Cloud, Infrastructure-as-a-Service
CLOUDERA ALTUS (Platform-as-a-Service): Data Engineering, Data Science, Analytic DB
Status labels shown on the slide: beta, beta soon

ENTERPRISE-PROVEN MACHINE LEARNING AND ANALYTICS
MACHINE LEARNING (customers run on Cloudera)
• Pattern recognition
• Anomaly detection
• Prediction
ANALYTICS (customers run Impala on Cloudera)
• Self-service intelligence
• Real-time analytics
• Secure reporting

DATA-DRIVEN JOURNEY: USE CASES
Journey stage: VISIBILITY
Use cases
• Preventive & Proactive Maintenance
• IoT Hub for Industry 4.0
• Advanced Threat Detection
• Risk Modelling & Analysis
• Marketing Systems Integration
• Customer 360 Insights
• Exploratory Data Science
• Data Warehouse
• Applied Machine Learning
Business areas
• GROW: Sales & Marketing
• CONNECT: Operations & Product
• PROTECT: Security & Compliance
• MODERNIZE: IT, Tech, Data Science & Analytics

DELIVERING BETTER
BROADBAND SERVICE
• Deeper network analysis to better predict
customer internet speeds and identify the
cause of performance issues
• Reduces truck rolls to save millions of
pounds
• Positions BT to take advantage of IoT for
predictive maintenance on fleet service
vehicles
• Increased data velocity by 15X (5X the
data in 1/3 of the time)
DRIVE CUSTOMER
INSIGHTS
VISIBILITY
PRODUCTIVITY
TRANSFORMATION

CAPTURING AND GROWING
MARKET SHARE WITH 10X
MORE ACCURATE FORECASTS
• Saves consumers and businesses up
to 30% on electric bills
• Improves accuracy of predictions,
with error rate below 1%
• Enables creation of micro-targeted
campaigns in hours
CONNECT
PRODUCTS &
SERVICES
VISIBILITY
PRODUCTIVITY
TRANSFORMATION

DELIVERING DEEP INSIGHTS
AND BEST PRACTICES IN BIG
DATA SECURITY & COMPLIANCE
• First PCI Certified Hadoop platform
• Optimizes EDW and improves fraud
detection and prevention
• Secures 10 PB in a PCI-compliant
manner every day
• Security Information Event
Management (SIEM) — monitor
access to sensitive datasets, full audit
trail of user behavior
PROTECT YOUR
BUSINESS
VISIBILITY
PRODUCTIVITY
TRANSFORMATION

PARTNER
ECOSYSTEM
Focus on strategic
partnerships to expand
reach and accelerate
consumption
ISVs & SOLUTIONS
CLOUD & PLATFORM
SYSTEM INTEGRATORS
RESELLERS

# 44
Who is Datavard?
• Focus on SAP and Data Management: Business Transformation, SAP ABAP, and Big Data
• Software products and consulting services
• More than 200 projects p.a.
• Customers of all industries, regions and sizes
• No “me too” topics
• Strong partnership with SAP since 1998
• Privately held since 1998; 245 employees in 2018
• Germany: Heidelberg (HQ), Hamburg | USA: Philadelphia, Washington DC | Switzerland: Regensdorf | Italy: Milan | Central Europe: Bratislava | Singapore
Explore Optimize Transform Innovate
