To disrupt and innovate, you need access to data. All of your data. The challenge for many organisations is that the data they need is locked away in a variety of silos. And there's perhaps no bigger silo than one of the most widely deployed business applications: SAP. Bringing together all your data for analytics and machine learning unlocks new insights and business value. Together, Cloudera and Datavard hold the key to breaking SAP data out of its silo, providing access to untapped opportunities that currently lie hidden.
3. # 3
Why you need to bridge SAP and Hadoop to turn your data into Business Value
4. # 4
SAP and Hadoop – bridging two worlds
Hadoop
Java, Python, PigLatin
Massive clusters for big data processing
Structured & unstructured data
Apache & open source
Distributions (e.g. Cloudera)
Engines (e.g. Spark, Impala)
Fast paced evolution since 2006
Big Data management
SAP
ABAP
Client/Server
Classic relational database (RDBMS)
Proprietary software
Interfaces and open standards
Business Software
Steady evolution since 1972
Data management
5. # 5
75% of global GDP is generated by companies running on SAP®
6. # 6
Data Management Issues
Scalability
Data-Pipelines
Granularity and Velocity
Data-Silos
Extensibility
• Lifetime sizing of the platform at procurement is no longer possible
• Hardware requirements limit possible growth
• Scaling UP often comes at great cost, and scaling DOWN usually brings no value
• Data transformations are I/O-intensive operations
• They take a lot of time and consume a lot of resources
• Limitations on the format of data
• Limitations on the granularity of data; often only aggregated and cleaned data are stored
• Raw data are necessary for data science activities
• Too many places for storing data
• No interconnection between company units limits data analysis possibilities
• Data analysis requires many programming languages
• Limited application compatibility
7. # 7
From Data management to Big Data management
Data Management Issues
Data Growth
Data Separation
8. # 8
From Data management to Big Data management
Data Management Issues: Data Growth, Data Separation
Business Questions to answer: Cost Reduction, Revenue Increase
10. # 10
“Only 12-18% of all data in BW is actually used.”
Forrester research
11. # 11
“On average, 35% of SAP data is temporary and could be deleted.”
Based on 300+ Fitness Tests
12. # 12
Data distribution in SAP BW (based on 300+ DataVard BW FitnessTest™)
DSO data: 32%
Temporary data: 15%
Other data: 15%
Changelog data: 11%
PSA data: 9%
Cube E data: 5%
Cube F data: 5%
Master data: 5%
Cube D data: 3%
Housekeeping: 35%
13. # 13
DATA GROWTH WITH & WITHOUT DATATIERING
SAP DATA GROWTH (in GB)
Year   Data size without datatiering   Data size after datatiering
2017   1290                            774
2018   1710                            716
2019   2250                            754
2020   2925                            857
2021   3803                            1041
2022   4943                            1309
DATA GROWTH: 25% p.a.
SIZE TODAY: 1.3 TB
SIZE IN 5 YEARS: 4.9 TB
SAVING: 3.6 TB
DATATIERING ROI: 2 YEARS
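The saving quoted on this slide can be checked against the chart values. The following is a minimal sketch in Python, assuming the two series are the yearly sizes in GB for 2017 to 2022 as read off the chart; the function and variable names are illustrative and not part of any Datavard or Cloudera tool.

```python
# Yearly SAP data sizes in GB, copied from the slide (2017-2022).
years = [2017, 2018, 2019, 2020, 2021, 2022]
without_tiering = [1290, 1710, 2250, 2925, 3803, 4943]
with_tiering = [774, 716, 754, 857, 1041, 1309]


def savings_by_year(baseline, tiered):
    """Return the per-year difference in GB between the two series."""
    return [b - t for b, t in zip(baseline, tiered)]


savings_gb = savings_by_year(without_tiering, with_tiering)

# Convert the final-year saving to decimal TB, as the slide does.
final_saving_tb = savings_gb[-1] / 1000

for year, saved in zip(years, savings_gb):
    print(f"{year}: {saved} GB saved")
print(f"Saving in {years[-1]}: {final_saving_tb:.1f} TB")
```

The last line reproduces the 3.6 TB figure: 4943 GB minus 1309 GB is 3634 GB.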
25. # 25
From Data management to Big Data management
Data Management Issues: Data Growth, Data Separation
Business Questions to answer: Cost Reduction, Revenue Increase
Big Data Management Solutions: Data Tiering, Data Integration
26. # 26
From Data management to Big Data management
1. Data Tiering use case stream - OUTBOARD
2. Data Integration use case stream - GLUE
Data Growth
Data Separation
Cost Reduction
Revenue Increase
Data Tiering
Data Integration
27. # 27
From Data management to Big Data management
1. Data Tiering use case stream - OUTBOARD
Data Growth Cost Reduction Data Tiering
2. Data Integration use case stream - GLUE
Data Separation Revenue Increase Data Integration
3. Security Analyses use case stream – Data Science
Data Protection Cost Prevention Security Analyses
28. # 28
From Data management to Big Data management
4. Data Aging or decommission of old system – Data Fridge scenario
Data Aging GDPR/Costs Data Fridge
44. # 44
Who is Datavard
Focus on SAP and Data Management: Business Transformation, SAP ABAP, and Big Data
Software products and consulting services
More than 200 projects p.a.
Customers of all industries, regions and sizes
No “me too” topics
Strong partnership with SAP since 1998
Privately held since 1998, 2018: 245 employees
Germany: Heidelberg (HQ), Hamburg | USA: Philadelphia, Washington DC
Switzerland: Regensdorf | Italy: Milan | Central Europe: Bratislava | Singapore
Explore Optimize Transform Innovate
Hello Everyone I’m…Why you need…
Thx Cloudera
Now...How many of you know about SAP?
Left corner SAP
Right corner – I probably don’t need to talk about that
Now for the purposes of my presentation I would consider
SAP as the essential… expensive
And Hadoop a cheap yet powerful…all kind of data
To justify why I’m here today
There are a lot of cool companies…
BUT 75%
Now why would we want to connect…
Because we are in trouble!!!
Lots of trouble
Please I ask you not to read…
3-year-old slide, 1 TB – 5 TB
Jim Rohn said – “You don’t need 5 reasons to fail, one is enough!”
So let me give you the two biggest issues!
Data Growth
1. 2016
2. Expensive systems
Data Separation
The more vicious one, mostly for the 21st century
Units, Systems, security
1. What we are missing are the business questions. Why?
2. Well, if you have issues not related to your business
3. So what are the simplest yet most relevant business questions?...
Two valid business questions – Cost reduction and Revenue Increase
Does that make sense to you? Do you want to solve those two first?
Good! So how do you reduce cost? Or: how much trash is in your system?
Forrester research: only 12-18% of data is used in reporting
Fitness Test – answers how your system is being used
Datavard found that on average 35% of SAP data is temporary and can be deleted
Let me talk money! If you have spent 10 million on your 20 TB SAP HANA system
You are using 1.2 – 1.8 million out of 10
AND! 3.5 million you spent on trash – what an investment
This is actually the average of the biggest BW systems in the world
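The money argument above is simple proportional arithmetic. A minimal sketch in Python, assuming (as the talk implicitly does) that system cost scales linearly with data volume; the function name and its parameters are hypothetical, and the percentages are the Forrester and Fitness Test figures quoted on the slides.

```python
def split_investment(total_cost, used_pct_range=(12, 18), temporary_pct=35):
    """Split a system's cost by share of data, assuming cost scales
    linearly with data volume (an assumption, not a measured fact)."""
    low_pct, high_pct = used_pct_range
    return {
        "used_low": total_cost * low_pct / 100,
        "used_high": total_cost * high_pct / 100,
        "temporary": total_cost * temporary_pct / 100,
    }


# The example from the talk: 10 million spent on a 20 TB SAP HANA system.
breakdown = split_investment(10_000_000)
print(f"Spent on data actually used: "
      f"{breakdown['used_low']:,.0f} to {breakdown['used_high']:,.0f}")
print(f"Spent on temporary data: {breakdown['temporary']:,.0f}")
```

With a total of 10 million this gives 1.2 to 1.8 million for data that is used and 3.5 million for temporary data, the figures mentioned in the notes.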
Data allocation
Now a calculation based on a real case scenario from last year
Left side
Comparison of system growth with and without our Data Tiering solution
Without the Data Tiering solution growth is exponential; with Hadoop and data tiering we are more in the linear world
Saving of ~3M on SAP HANA over a horizon of several years
Guys from Cloudera – can I spin up a Hadoop cluster in Altus for 3 million?
So ROI is 2 years. So does it make sense to you to bridge SAP with Hadoop in order to do this?
1. Revenue increase – or increasing the value of your data, so you increase the value of your business.
2. Fact of life is I cannot quantify in general…
3. And there are a lot of use cases around – but there is a rule of thumb and it’s not coming from the computer world!
4. It is actually leadership 1.0
Do you know what the relation is between the output of a group of people and the total output generated by its individuals?
Anyone a people manager?
Of course the output value of the group is higher.
Diversity by itself is a value!
It’s the same with your data. Combined data have much bigger value than data presented individually.
Let me give you an example
Do you know what this mountain is?
It’s not any particular stock price
It’s the average daily temperature in the month of March in Bratislava – central Europe – stable continental weather – at least it used to be
We have one customer, a premium retail shop. I love retail.
Their SAP system is filled with inventory details… But there used to be nothing about the weather.
Now let’s imagine that you…
Come at the beginning of March for winter clothes – yes, business done, check
Spring clothes are history – winter jacket to T-shirt
You come in the 2nd week, when it is 20 degrees warmer, and they have winter jackets on sale?
You want to buy a T-shirt – either no purchase, or the next shop in the market, OR they tell you to come back in a week.
When it is -6?
So how can you create a strategy when you have no clue what is going to happen?
Without diversity of data you will only be able to count your loss in comparison to normal months
You start with a proper platform!
You add core data and another source of data
You want to know how your customers feel about the change
And I recommend a smart BI solution on top
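To make the "core data plus another source" step concrete, here is a minimal sketch in Python of enriching sales rows with daily temperature. Everything in it is hypothetical: the field names, the dates, and the values are made up for illustration and come neither from the talk nor from any customer system.

```python
from datetime import date

# Hypothetical records: neither the field names nor the values come from the talk.
sales = [
    {"day": date(2018, 3, 5), "item": "winter jacket", "units": 40},
    {"day": date(2018, 3, 12), "item": "winter jacket", "units": 3},
    {"day": date(2018, 3, 12), "item": "t-shirt", "units": 55},
]
temperature_c = {date(2018, 3, 5): -6.0, date(2018, 3, 12): 14.0}


def enrich_with_weather(sales_rows, temps):
    """Attach the day's average temperature to each sales row (None if unknown)."""
    return [{**row, "avg_temp_c": temps.get(row["day"])} for row in sales_rows]


enriched = enrich_with_weather(sales, temperature_c)
for row in enriched:
    print(row["day"], row["item"], row["units"], row["avg_temp_c"])
```

Once the two sources share a key (here the calendar day), a BI tool on top can relate demand to temperature instead of only comparing against normal months.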
Small sanity check
Does that make sense?
1. So how does it fit together?
We have our connection between issue and business question?
So let’s add a solution
Data Tiering from Datavard and Cloudera
And Data Integration
Which brings us to direct streams and a complete answer to why you should bridge SAP with Hadoop!
Now, I’ve said you don’t need 5 reasons to fail, one is enough
Actually you don’t need to be successful in both areas to justify your big-data platform, one is enough! But do try both!
Or do you need more?
You want another? Data Protection
How that relates to Hadoop?
You have a system with 20,000 users writing or reporting on data -> you don’t do security analyses…
You want more?
Who would I be if I did not mention GDPR!
You want another? Data Aging!
Now does that answer the question why you need to bridge SAP and Hadoop? What do you say?
Exactly, it makes complete sense to do it. So HOW?
I’ll answer that in case you are interested in a f2f conversation
If you allow me to spend time with you and get answers to a few core questions, I believe you can greatly benefit
Ovum’s definition of a data lake is a governed repository that becomes the default ingest point for raw data. So here we are: an idyllic lakeside setting.
Data Lakes got a ‘bad rap’ early on because they were just repositories. The tools and technologies for governance and data stewardship were missing or immature.
ADD DATA PRIVACY HIGHLIGHTS
So we introduced Cloudera SDX - or shared data experience – the foundations of Cloudera Enterprise.
SDX makes it possible for companies to run dozens - hundreds - of analytic applications against a common pool of data.
SDX applies a centralized, consistent framework for catalog, security, governance, management, data ingest and more.
It makes it faster, easier, and safer for organizations, teams, people to develop and deploy high-value, multi-function use cases like customer next best offer, clinical prediction, and risk modeling.
SDX cuts through silos to unify data, analytics, management, security, and governance, and empowers self-service.
BUSINESS CATALOG SERVICES (NOT JUST HMS) ALL DATA SETS, SCHEMAS, COLLABORATIVE TAGS, BUSINESS CLASSIFICATIONS, TARGETTED FOR EACH USER
SDX is a set of open platform services built for multi-functional or multi-disciplinary analytics that have been optimized for the cloud. This means that we offer a unified security model that helps protect sensitive data with a consistent set of controls, and that we offer a consistent governance model that enables self-service secure access to all of your relevant data. Not just one type of data, really all of it, increasing your ability to be compliant, particularly in a regulatory environment. Next, easy workload management that increases user productivity and boosts job predictability. Next, flexible data ingest and replication. We have a number of core partners that we work with in this arena that help you aggregate a single copy of all of your data, providing you easier disaster recovery and easing migration of data from one place to another. Last but not least, as I mentioned a moment ago, we offer a shared catalog that helps to define and preserve the structure and the business context of all your data, regardless of where it happens to reside. So, SDX is really a core piece of how we at Cloudera separate ourselves from the competition.
Note: The content of this slide is based on the Success Story and video in 2015. The slide was created in NOV, 2016.
Company Background: With £18 billion (about US$30 billion) in revenue in 2014, BT is one of the largest telecommunications providers in the world. The company serves more than 18 million consumers and nearly three million businesses.
Use Case: For BT, the key to achieving sustainable, profitable growth in today's competitive landscape is its ability to broaden and deepen customer relationships. To support this goal, BT is using a Cloudera enterprise data hub (EDH) to accelerate data velocity and fast-track the delivery of new offerings to its customers. This EDH provides the backbone for an operational data store (ODS) that enables BT to break through data silos to ingest, store, and prepare data for myriad operational and analytical uses. Within one year, BT increased data processing velocity by a factor of 15, achieved an ROI between 200 and 250 percent, and is now positioned to take on new projects faster at a lower cost.
Moving its ETL platform to Cloudera enabled BT to accelerate data velocity, processing five times the data in a third of the time.
Following the success of its ETL initiative, BT is now utilizing Cloudera to help deliver its broadband services. The speed of an individual line is dominated by its length (the distance from network equipment to a customer’s premises), but many other factors can have a significant impact on customer experience. BT uses Cloudera to join network topology (GIS) data with terabytes of DSL performance (time series) and electrical line test data to grade the quality of every line in the network. Using this network analysis, the probability of a successful outcome of an engineer dispatch can be predicted. This reduces wasted engineer visits and truck rolls.
BT’s work with Cloudera is also helping position the company to take advantage of the Internet of Things (IoT). BT is part of the MK:Smart initiative for Milton Keynes (MK), a fast-growing town in Buckinghamshire, England. This initiative includes early IoT solutions such as sensors in car parking spaces that broadcast if the spots are vacant or occupied. Citizens and visitors can then use a smartphone app that guides them to the nearest free parking space based on the sensor data. According to BT, the same data ultimately will be used to better inform multi-million pound infrastructure decisions, such as the location and size of future car parks.
IoT and fleet vehicle analytics are also a growing area for BT. The company offers fleet services as a managed service to other utility companies. One of the competitive features that BT can offer is the ability to instrument those vehicles and collect data from them. Ultimately, the company seeks to apply predictive analytics to faults, so it can identify a failing vehicle early, improve the lifetime of that vehicle, and help reduce its overall carbon footprint.
SOLUTION HIGHLIGHTS
Modern Data Platform: Cloudera Enterprise, Data Hub Edition
Key Components: Apache Hive, Apache Impala, Apache Pig, Apache Sentry, Apache Spark, Cloudera Manager, Cloudera Navigator
Industry Use Case: Telecommunications
IMPROVED SERVICE
PROCESS IMPROVEMENT
IT COST REDUCTION
Read more with the published story: http://www.cloudera.com/customers/bt.html
Note: The content of this slide is based on the Success Story in JUNE, 2017. The slide was created in JUNE, 2017.
Company Background:
Podo is a Spanish utilities company, providing electricity to consumers and businesses across Spain.
Use Case:
Podo is revolutionizing the utilities industry, using a cloud-based machine learning and advanced analytics platform from Cloudera and Google to help accurately predict future consumption patterns and provide consumers with fully customized rates.
Data sources:
Historical customer records
IoT data from lights and connected devices
Third party databases for government statistics and property records
Solution
Modern Data Platform: Cloudera Enterprise
Workloads: Analytic Database, Data Engineering and Data Science
Components: Apache Impala (incubating), Apache Spark, Cloudera Manager
Analytic tools: R, Python, Matlab
Cloud: Google Cloud Platform
Industry Use Case:
Customer 360°
Network optimization
Operational analytics
Data monetization
Read more with the published story: https://www.cloudera.com/more/customers/Podo.html?cq_ck=1497466958591
Note: The content of this slide is based on the PCI Solution brief (http://www.cloudera.com/content/dam/cloudera/Resources/PDF/solution-briefs/MasterCard_PCI-Data-Security_SolutionBrief.pdf) in 2015. The slide was created in DEC, 2016.
Company Background: MasterCard’s principal business is to process payments between the banks of merchants and the card issuing banks or credit unions of the purchasers who use the "MasterCard" brand debit and credit cards to make purchases. MasterCard Worldwide has been a publicly traded company since 2006 and had $9.5B in 2014 annual revenue and has 6,700 employees. Prior to its initial public offering, MasterCard Worldwide was a cooperative owned by the 25,000+ financial institutions that issue its branded cards.
Use Case: MasterCard chose Cloudera Enterprise for fraud detection and to optimize their DW infrastructure and later expanded to form a partnership with MC Advisors, the consulting arm of MasterCard. MasterCard requires that any technology handling its applications or payment card data files must have full PCI certification. Receiving this important certification allows MasterCard the opportunity to integrate Hadoop datasets with other environments that are already PCI-certified.
Solution
Modern Data Platform: Cloudera Enterprise
Industry Use Case: Financial Services
Fraud Prevention
Read more with the published solution brief: http://www.cloudera.com/content/dam/cloudera/Resources/PDF/solution-briefs/MasterCard_PCI-Data-Security_SolutionBrief.pdf
We are an open platform /open ecosystem – runs anywhere (shrunk version of platform) at center
Show ISVs on top
SIs on right
Show /platform cloud on bottom