Spark Usage in
Enterprise Business
Operations
Ken Tsai
VP, Data Management & Platform-as-Services
SAP
@kentsaiSAP
2.17.16: Spark Summit, NYC
© 2016 SAP SE or an SAP affiliate company. All rights reserved. Spark Summit New York, 2.17.16
© 2016 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an
SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE
(or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark
information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or
SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and
services, if any. Nothing herein should be construed as constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or
release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for
any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
SAP – Our Quick Snapshot in the Enterprise Computing World
74% of the world’s
transaction revenue
touches an SAP system.
SAP’s product focus:
Enterprise Applications
Business Networks
Platforms – 15 yrs on IMC
SAP customers represent
87% of Forbes Global
2,000 companies.
SAP touches
$16 trillion of world
consumer purchases.
SAP HANA – An In-Memory Platform to Enable New Business Scenarios
Previously Not Feasible
[Figure: tables touched by a financial posting: COSP, COEP, COBK, BKPF, BSEG, BSIS, BSIK, BSET, LFC1, GLT0]
Inserts / updates per posting:
SAP Finance with aggregates and indices: 10 / 5
SAP Simple Finance: 4 / 0
SAP Simple Finance: no indices, no aggregates, no redundancies
CORE DATA STRUCTURE
REMAINS UNCHANGED
• Soft financial close anytime
• Real-time revenue and cost analysis
• Real-time liquidity forecasts
• Real-time alerts and blocks on suspicious
transactions
Distributed Big Data Is Everywhere
How to better use it in core enterprise business applications?
~79% of Data
Reservoirs/Lakes are still
disconnected from core
business operations
How do I embed big data signal
into my business applications
and enterprise analytics?
53%: Difficulty integrating with CRM and/or other systems
49%: Unable to apply or integrate external data quickly enough to inform real-time decision making
59%: Only a few analysts with specialized training can analyze big data
Harvard Business Review Analytic Services, Global Survey of 251 Respondents, Sept. 2015
Introducing SAP HANA Vora
An in-memory query engine that extends the Apache Spark execution framework
to enrich the interactive analytics experiences on massively distributed computing clusters
• OLAP processing
• In-Memory Computing for high performance
• Connecting to Enterprise Systems
• Unified System Management
[Diagram labels: SAP HANA, ERP data, big data, parallelized queries, Vora]
Key Open Source Contribution to Apache Spark Ecosystem
Spark to HANA Push-downs & Data Hierarchies
// Count of parts and average retail price per hierarchy level, starting at part 1
val hierarchy = sqlContext.sql("""
  SELECT
    LVL, COUNT(*), ROUND(AVG(P_RETAILPRICE), 2)
  FROM (
    SELECT LEVEL(node) AS LVL, P_RETAILPRICE
    FROM
      HIERARCHY(
        USING PART_HIERARCHY AS c
        JOIN PARENT p ON c.P_PARENT = p.P_PARTKEY
        SEARCH BY
          P_PARTKEY ASC
        START WHERE
          P_PARTKEY = 1
        SET node ) AS H0
  ) T1 GROUP BY LVL
""")
// Keep the DataFrame in `hierarchy`; print the rows as a separate step
hierarchy.collect().foreach(println)
[Figure: part hierarchy tree; node labels (P_RETAILPRICE): 901, 903, 913, 912, 904, 911]
+-----+-----+------------------+
|LEVEL|COUNT|AVG(P_RETAILPRICE)|
+-----+-----+------------------+
|    0|    1|               901|
|    1|    2|             903.5|
|    2|    3|               912|
+-----+-----+------------------+
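To make the hierarchy query's result easier to follow, here is a minimal plain-Scala sketch of the same computation: walk a parent/child table from a start node, tag each row with its level, then count and average per level. It does not use Spark or Vora. The `Part` type, the part keys, and the parent links are assumptions for illustration; only the six prices come from the slide.

```scala
// Hypothetical part rows. Keys and parent links are assumed; prices are from the slide.
final case class Part(key: Int, parent: Option[Int], price: Double)

object HierarchyLevels {
  val parts: Seq[Part] = Seq(
    Part(1, None, 901.0),
    Part(2, Some(1), 903.0),
    Part(3, Some(1), 904.0),
    Part(4, Some(2), 913.0),
    Part(5, Some(2), 912.0),
    Part(6, Some(3), 911.0)
  )

  // Breadth-first walk from the start node, tagging each part key with its depth.
  // Assumes the parent links form a tree (no cycles).
  def levels(rows: Seq[Part], startKey: Int): Map[Int, Int] = {
    val children: Map[Int, Seq[Int]] =
      rows.collect { case Part(k, Some(p), _) => p -> k }
        .groupBy(_._1)
        .map { case (p, pairs) => p -> pairs.map(_._2) }

    @annotation.tailrec
    def walk(frontier: Seq[Int], depth: Int, acc: Map[Int, Int]): Map[Int, Int] =
      if (frontier.isEmpty) acc
      else {
        val next = frontier.flatMap(k => children.getOrElse(k, Seq.empty))
        walk(next, depth + 1, acc ++ frontier.map(_ -> depth))
      }

    walk(Seq(startKey), 0, Map.empty)
  }

  // Plain-Scala counterpart of GROUP BY LVL with COUNT(*) and ROUND(AVG(price), 2).
  def summarize(rows: Seq[Part], startKey: Int): Seq[(Int, Int, Double)] = {
    val lvl = levels(rows, startKey)
    rows
      .filter(r => lvl.contains(r.key))
      .groupBy(r => lvl(r.key))
      .toSeq
      .map { case (l, rs) =>
        val avg = rs.map(_.price).sum / rs.size
        (l, rs.size, math.round(avg * 100.0) / 100.0)
      }
      .sortBy(_._1)
  }

  def main(args: Array[String]): Unit =
    summarize(parts, startKey = 1).foreach(println)
}
```

With the assumed parent links, the three levels come out with counts 1, 2, 3 and averages 901.0, 903.5, 912.0, matching the table on the slide.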
// Connection options shared by both HANA views (config is not shown on the slide)
val options = Map("dbschema" -> config.user, "host" -> config.host,
  "instance" -> config.instance)

// HANA Live CustomerBasicData Virtual Data Model (VDM)
val custConf = options + ("path" -> "sap.hba.ecc/CustomerBasicData")
val cust = sqlContext.read.format("com.sap.spark.hana").options(custConf).load()
cust.registerTempTable("customer")

// HANA Live SalesOrderHeader VDM
val sohConf = options + ("path" -> "sap.hba.ecc/SalesOrderHeader")
val soh = sqlContext.read.format("com.sap.spark.hana").options(sohConf).load()
// Register under the table name that the query below refers to
soh.registerTempTable("salesOrder")

// Top 5 Countries by Sales Order Volume
val salesOrder = sqlContext.sql("""
  SELECT Country, COUNT(*) AS Frequency
  FROM salesOrder AS s LEFT OUTER JOIN customer AS c
    ON s.soldToParty = c.Customer
  GROUP BY Country ORDER BY Frequency DESC
""")
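The join and aggregation in the query above can be sketched without a cluster. This is a minimal plain-Scala illustration, not the SAP connector: `Customer`, `SalesOrder`, `TopCountries`, and the sample rows are hypothetical stand-ins for the two HANA views, and it assumes each customer key appears only once.

```scala
// Hypothetical in-memory stand-ins for the two HANA views.
final case class Customer(customer: String, country: String)
final case class SalesOrder(orderId: Int, soldToParty: String)

object TopCountries {
  // LEFT OUTER JOIN orders to customers, then count orders per country.
  // Orders with no matching customer are grouped under None (SQL NULL).
  // Assumes customer keys are unique.
  def countByCountry(
      orders: Seq[SalesOrder],
      customers: Seq[Customer]
  ): Seq[(Option[String], Int)] = {
    val countryOf: Map[String, String] =
      customers.map(c => c.customer -> c.country).toMap
    orders
      .groupBy(o => countryOf.get(o.soldToParty))
      .toSeq
      .map { case (country, os) => (country, os.size) }
      // Highest frequency first; ties broken by country name for stable output.
      .sortBy { case (country, n) => (-n, country.getOrElse("")) }
  }

  val customers: Seq[Customer] = Seq(
    Customer("C1", "US"),
    Customer("C2", "DE"),
    Customer("C3", "US")
  )

  val orders: Seq[SalesOrder] = Seq(
    SalesOrder(1, "C1"),
    SalesOrder(2, "C1"),
    SalesOrder(3, "C2"),
    SalesOrder(4, "C3"),
    SalesOrder(5, "C9") // no matching customer
  )

  def main(args: Array[String]): Unit =
    countByCountry(orders, customers).take(5).foreach(println)
}
```

Note that `take(5)` is what limits the output to five countries here; the SQL on the slide orders by frequency but does not itself limit the row count.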
Airline Use Case – Optimize MRO scheduling with Sensor Data
Challenges
• $10,000 loss for every hour spent
on maintenance, repair, and
overhaul (MRO)
• Predictive MRO generates TB of
sensor data per flight
Solution
• SAP HANA Vora rapidly processes
sensor data in HDFS and
combines it with flight schedule
and staffing data in SAP HANA to
prioritize maintenance jobs and
accelerate MRO
Why SAP HANA Vora
• Optimize MRO operations with
interactive, on-demand drill down
by airport, flight route, etc.
Utility Use Case – CenterPoint Energy
Challenge
• Smart meters generate TBs of
data/month
• Regulatory requirement to retain
data for 10 years
• Current storage solution full by
end-2016
• Need to leverage HDFS as an
additional tier for storage
Solution
• SAP HANA for most recent sensor
signal and operational data,
Dynamic Tiering for 1-2 year old
data, HDFS for historical sensor
data
• SAP HANA Vora accesses and
queries data across all tiers
Why SAP HANA Vora
• SAP HANA Vora provides
enterprise analytics and an OLAP-like
experience across the data
warehouse and HDFS.
Utility Use Case – How It Works
CenterPoint Energy
“Our benchmark tests proved that
SAP HANA paired with SAP
HANA Vora are the right
solutions for us. We expect
immediate cost benefits and to
see competitive differentiation
in the future.”
Gary Hayes,
CIO & SVP at CenterPoint Energy
SAP HANA
MOST RECENT
SENSOR DATA
Dynamic
Tiering
1-2 YR OLD DATA
Parallelized
Queries
HDFS
HISTORICAL SENSOR DATA
Query data within and across tiers
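As a rough illustration of the tiering idea, the sketch below maps a reading's age to a storage tier and lists the tiers a date-range query would have to touch. It is not how SAP HANA Vora is implemented. The `Tier` and `TierRouter` names are hypothetical, and the boundaries (under 1 year in HANA, 1 to under 2 years in Dynamic Tiering, older in HDFS) are an assumed reading of the slide's labels.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical tier model; the names mirror the labels on the slide.
sealed trait Tier
case object Hana extends Tier
case object DynamicTiering extends Tier
case object Hdfs extends Tier

object TierRouter {
  // Tiers ordered from newest data to oldest data.
  val ordered: Vector[Tier] = Vector(Hana, DynamicTiering, Hdfs)

  // Assumed boundaries: < 1 full year -> HANA, < 2 full years -> Dynamic Tiering, else HDFS.
  def tierFor(readingDate: LocalDate, today: LocalDate): Tier = {
    val ageYears = ChronoUnit.YEARS.between(readingDate, today)
    if (ageYears < 1) Hana
    else if (ageYears < 2) DynamicTiering
    else Hdfs
  }

  // Tiers a query over [from, to] must touch, assuming from is not after to.
  def tiersFor(from: LocalDate, to: LocalDate, today: LocalDate): Vector[Tier] = {
    val newest = ordered.indexOf(tierFor(to, today))
    val oldest = ordered.indexOf(tierFor(from, today))
    ordered.slice(newest, oldest + 1)
  }

  def main(args: Array[String]): Unit = {
    val today = LocalDate.of(2016, 2, 17)
    println(tiersFor(LocalDate.of(2013, 1, 1), LocalDate.of(2016, 2, 1), today))
  }
}
```

A query confined to recent readings touches only the first tier, while a multi-year range spans all three, which is the "within and across tiers" case on the slide.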
Financial Services Use Case – Extend Fraud Pattern Detection
Challenges
• 100+ million business transactions
daily, 25% growth YoY
• Limited access to archived data
• Difficult to detect patterns in
historical transactions
Solution
• Current transactions in SAP
HANA, historical transactions in
HDFS clusters
• Real-time detection of
abnormalities
Why SAP HANA Vora
• Real-time, aggregated insights
from current and historical
transactions
2016 and the Road Ahead
Customers in North
America, APJ, and
EMEA
Dev edition
available on AWS
TODAY
General Availability
Vora Modeler to
build and query
OLAP style cubes on
data
COMING
SOON
Planning (HR, Financial)
Extend engine support
for time series
Transaction
management
Analytics on archived
ERP data in Hadoop
FUTURE
Contribute to Spark Ecosystem, Embrace Best of Community Innovation
Contribution to
Open Source:
Hierarchy capabilities
Connection to ERP: predicate
pushdown to HANA
Thank you!
Ken Tsai: ken.tsai@sap.com
@kentsaiSAP
Enter to Win a
GoPro HERO4
Session at
SAP Booth 102
Learn More @
hana.sap.com/vora
Try Dev Edition
bit.ly/1K1qLyo
We’re Hiring: https://spark-summit.org/east-2016/jobs/

 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 

Spark Summit presentation by Ken Tsai

  • 1. Spark Usage in Enterprise Business Operations. Ken Tsai, VP, Data Management & Platform-as-Services, SAP (@kentsaiSAP). 2.17.16: Spark Summit, NYC
  • 2. © 2016 SAP SE or an SAP affiliate company. All rights reserved. Spark Summit New York, 2.17.16 © 2016 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. 
The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
  • 3. SAP – Our Quick Snapshot in the Enterprise Computing World. 74% of the world’s transaction revenue touches an SAP system. SAP’s product focus: enterprise applications, business networks, and platforms (15 years on IMC). SAP customers represent 87% of Forbes Global 2,000 companies. SAP touches $16 trillion of world consumer purchases.
  • 4. SAP HANA – An In-Memory Platform to Enable New Business Scenarios Previously Not Feasible. [Figure: tables written when posting a financial document. SAP Finance with aggregates and indices (COSP, COEP, COBK, BKPF, BSEG, BSIS, BSIK, BSET, LFC1, GLT0): 10 and 5 inserts/updates. SAP Simple Finance: 4 and 0 inserts/updates, with no indices, no aggregates, no redundancies. The core data structure remains unchanged.] • Soft financial close anytime • Real-time revenue and cost analysis • Real-time liquidity forecasts • Real-time alerts and blocks on suspicious transactions
  • 5. Distributed Big Data Is Everywhere: How to better use it in core enterprise business applications? ~79% of data reservoirs/lakes are still disconnected from core business operations. How do I embed big data signals into my business applications and enterprise analytics? • 53%: difficulty integrating with CRM and/or other systems • 49%: unable to apply or integrate external data quickly enough to inform real-time decision making • 59%: only a few analysts with specialized training can analyze big data. Source: Harvard Business Review Analytic Services, Global Survey of 251 Respondents, Sept. 2015
  • 6. Introducing SAP HANA Vora: an in-memory query engine that extends the Apache Spark execution framework to enrich the interactive analytics experience on massively distributed computing clusters. • OLAP processing • In-memory computing for high performance • Connecting to enterprise systems • Unified system management [Diagram: SAP HANA (ERP data) and big data, connected by Vora through parallelized queries]
  • 7. Key Open Source Contribution to Apache Spark Ecosystem: Spark to HANA Push-downs & Data Hierarchies

    Data hierarchies:

      val hierarchy = sqlContext.sql(
        """SELECT LVL, COUNT(*), ROUND(AVG(P_RETAILPRICE), 2)
          |FROM (
          |  SELECT LEVEL(node) AS LVL, P_RETAILPRICE
          |  FROM HIERARCHY(
          |    USING PART_HIERARCHY AS c
          |    JOIN PARENT p ON c.P_PARENT = p.P_PARTKEY
          |    SEARCH BY P_PARTKEY ASC
          |    START WHERE P_PARTKEY = 1
          |    SET node
          |  ) AS H0
          |) T1
          |GROUP BY LVL""".stripMargin)
      hierarchy.collect().foreach(println)

    [Figure: part hierarchy tree with node retail prices 901, 903, 913, 912, 904, 911]

      +-----+-----+------------------+
      |LEVEL|COUNT|AVG(P_RETAILPRICE)|
      +-----+-----+------------------+
      |    0|    1|               901|
      |    1|    2|             903.5|
      |    2|    3|               912|
      +-----+-----+------------------+

    Spark to HANA push-downs:

      // Connection options shared by both HANA reads (config is defined elsewhere)
      val options = Map("dbschema" -> config.user, "host" -> config.host, "instance" -> config.instance)

      // HANA Live CustomerBasicData Virtual Data Model
      val custConf = options + ("path" -> "sap.hba.ecc/CustomerBasicData")
      val cust = sqlContext.read.format("com.sap.spark.hana").options(custConf).load()
      cust.registerTempTable("customer")

      // HANA Live SalesOrderHeader VDM, registered under the name the query below uses
      val sohConf = options + ("path" -> "sap.hba.ecc/SalesOrderHeader")
      val soh = sqlContext.read.format("com.sap.spark.hana").options(sohConf).load()
      soh.registerTempTable("salesOrder")

      // Top 5 Countries by Sales Order Volume
      val salesOrder = sqlContext.sql(
        """SELECT Country, COUNT(*) AS Frequency
          |FROM salesOrder AS s
          |LEFT OUTER JOIN customer AS c ON s.soldToParty = c.Customer
          |GROUP BY Country
          |ORDER BY Frequency DESC""".stripMargin)
  • 8. Airline Use Case – Optimize MRO Scheduling with Sensor Data. Challenges: • $10,000 loss for every hour spent on maintenance, repair, and overhaul (MRO) • Predictive MRO generates terabytes of sensor data per flight. Solution: • SAP HANA Vora rapidly processes sensor data in HDFS and combines it with flight schedule and staffing data in SAP HANA to prioritize maintenance jobs and accelerate MRO. Why SAP HANA Vora: • Optimize MRO operations with interactive, on-demand drill-down by airport, flight route, etc.
  • 9. Utility Use Case – CenterPoint Energy. Challenge: • Smart meters generate terabytes of data per month • Regulatory requirement to retain data for 10 years • Current storage solution full by end of 2016 • Need to leverage HDFS as an additional storage tier. Solution: • SAP HANA for the most recent sensor signal and operational data, Dynamic Tiering for 1–2-year-old data, HDFS for historical sensor data • SAP HANA Vora accesses and queries data across all tiers. Why SAP HANA Vora: • SAP HANA Vora provides an enterprise analytics and OLAP-like experience across the data warehouse and HDFS.
  • 10. Utility Use Case – How It Works (CenterPoint Energy). “Our benchmark tests proved that SAP HANA paired with SAP HANA Vora are the right solutions for us. We expect immediate cost benefits and to see competitive differentiation in the future.” Gary Hayes, CIO & SVP at CenterPoint Energy. [Diagram: three storage tiers joined by parallelized queries: SAP HANA (most recent sensor data), Dynamic Tiering (1–2-year-old data), HDFS (historical sensor data). Query data within and across tiers.]
  • 11. Financial Services Use Case – Extend Fraud Pattern Detection. Challenges: • 100+ million business transactions daily, 25% growth year over year • Limited access to archived data • Difficult to detect patterns in historical transactions. Solution: • Current transactions in SAP HANA, historical transactions in HDFS clusters • Real-time detection of abnormalities. Why SAP HANA Vora: • Real-time, aggregated insights from current and historical transactions
  • 12. 2016 and the Road Ahead. TODAY: • Customers in North America, APJ, and EMEA • Dev edition available on AWS. COMING SOON: • General availability • Vora Modeler to build and query OLAP-style cubes on data. FUTURE: • Planning (HR, Financial) • Extend engine support for time series • Transaction management • Analytics on archived ERP data in Hadoop
  • 13. Contribute to Spark Ecosystem, Embrace Best of Community Innovation. Contribution to open source: hierarchy capabilities. Connection to ERP: predicate pushdown to HANA.
  • 14. Thank you! Ken Tsai: ken.tsai@sap.com, @kentsaiSAP. Enter to win a GoPro HERO4 Session at SAP Booth 102. Learn more at hana.sap.com/vora. Try the Dev Edition: bit.ly/1K1qLyo. We’re hiring: https://spark-summit.org/east-2016/jobs/
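The hierarchy query on slide 7 groups the nodes of a parent-child tree by depth and averages a price per level. As a rough illustration of that computation outside Spark, here is a minimal plain-Scala sketch. The `Part` case class, the `HierarchyLevels` object, the part keys, and the parent links are hypothetical; only the root key 1 and the six retail prices (read from the slide's tree figure) come from the deck. It is not how SAP HANA Vora implements hierarchies.

```scala
import scala.annotation.tailrec

// Hypothetical record: field names loosely mirror the slide's columns.
final case class Part(partKey: Int, parent: Option[Int], retailPrice: Double)

object HierarchyLevels {
  // Illustrative data: prices are from the slide, keys and parent links are assumed.
  val sample: Seq[Part] = Seq(
    Part(1, None, 901.0),
    Part(2, Some(1), 903.0),
    Part(3, Some(1), 904.0),
    Part(4, Some(2), 913.0),
    Part(5, Some(2), 912.0),
    Part(6, Some(3), 911.0)
  )

  // Map each part reachable from the root to its depth (root = 0), breadth-first.
  def levels(parts: Seq[Part], rootKey: Int): Map[Int, Int] = {
    val children: Map[Int, Seq[Int]] =
      parts
        .collect { case Part(k, Some(p), _) => p -> k }
        .groupBy(_._1)
        .map { case (p, pairs) => p -> pairs.map(_._2) }

    @tailrec
    def walk(frontier: Seq[Int], depth: Int, acc: Map[Int, Int]): Map[Int, Int] =
      if (frontier.isEmpty) acc
      else {
        val seen = acc ++ frontier.map(_ -> depth)
        // Skip already-visited keys so malformed (cyclic) input cannot loop forever.
        val next = frontier
          .flatMap(k => children.getOrElse(k, Seq.empty))
          .filterNot(seen.contains)
          .distinct
        walk(next, depth + 1, seen)
      }

    walk(Seq(rootKey), 0, Map.empty)
  }

  // Per level: (level, node count, average retail price rounded to 2 decimals).
  def summarize(parts: Seq[Part], rootKey: Int): Seq[(Int, Int, Double)] = {
    val levelOf = levels(parts, rootKey)
    parts
      .filter(p => levelOf.contains(p.partKey))
      .groupBy(p => levelOf(p.partKey))
      .toSeq
      .sortBy(_._1)
      .map { case (lvl, ps) =>
        val avg = ps.map(_.retailPrice).sum / ps.size
        (lvl, ps.size, math.round(avg * 100) / 100.0)
      }
  }
}
```

With the sample data, `HierarchyLevels.summarize(HierarchyLevels.sample, 1)` yields `(0, 1, 901.0)`, `(1, 2, 903.5)`, `(2, 3, 912.0)`, which lines up with the three rows of the slide's result table.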