Big Data has existed for a while and is far more than just massive volumes of data from novel sources. As customers note, it is about how you manage, analyze, use, and combine the data. SAP makes Big Data real. Cover what Big Data is not (e.g., SoH or BWoH alone) and what would make them Big Data (e.g., sales forecasting on SoH/BWoH, or mashing up corporate data with social media data or with call center notes).
What I say: HANA is tremendously important to SAP's vision. We are no longer just the "ERP company", as you may think. Following the acquisitions of Business Objects, Sybase, and recently SuccessFactors, SAP now leads several important enterprise business categories. Furthermore, by investing in in-house innovation, we have assembled a vertically integrated business data management stack, all the way from data management appliances to applications to on-demand application services, providing increased customer value. And HANA is at the heart of this strategy!
SAP big data platform: open strategy. We are planning for SAP to formally resell a HANA + Hadoop bundle on SAP's price list. Hadoop distribution vendors (Hortonworks, Intel, MapR, Cloudera) are in every account, and we work with them all. Strategic announcement coming.
SAP offers the applications and analytic tools that help you infuse Big Data insights directly into your business processes, equipping your employees, partners, and customers with access to data so they can uncover and monetize insights.
Big Data is a big opportunity for most companies and is something they must embrace to remain competitive.
Hadoop runs on the Hadoop Distributed File System (HDFS), a distributed file system that scales out on commodity servers. Since Hadoop is file-based, developers don't need to create a data model to store or process data, which makes Hadoop ideal for managing semi-structured Web data, which comes in many shapes and sizes. Because it is "schema-less," Hadoop can be used to store and process any kind of data, including structured transactional data and unstructured audio and video data. The biggest advantage of Hadoop, however, is that it is open source, which means that the up-front costs of implementing a system to process large volumes of data are lower than for commercial systems. On the other hand, Hadoop does require companies to purchase and manage dozens, if not hundreds, of servers and to train developers and administrators on this new technology.

Apache Hadoop enables applications to work with thousands of independent computers (nodes), which are collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). The main components used in Hadoop to run a job include:
- Client: submits the MapReduce job.
- Jobtracker: coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
- Tasktrackers: run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.

Hadoop Distributed File System (HDFS):
- HDFS is a file system that sits on top of the native file system.
- Different blocks of a file are stored on different nodes.
- The name node keeps track of which blocks make up a file and where those blocks are located.
- Automatic rebalancing and replication.
- Uniform namespace.

MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
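Before turning to how MapReduce jobs run, the HDFS block model above can be sketched in a few lines. This is a toy simulation, not the real HDFS API: the block size, node names, and round-robin placement are illustrative assumptions standing in for HDFS's much larger blocks and rack-aware placement policy.

```python
# Toy model of HDFS block placement (illustrative only, not the real API):
# a file is split into fixed-size blocks, each block is replicated on
# several data nodes, and the name node keeps the block-to-node map.
BLOCK_SIZE = 4           # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 3          # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4"]

def place_file(name, data, namenode):
    """Split data into blocks and record replica locations on the name node."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, _block in enumerate(blocks):
        # Round-robin placement stands in for HDFS's rack-aware policy;
        # the point is that replicas of one block land on different nodes.
        replicas = [NODES[(idx + r) % len(NODES)] for r in range(REPLICATION)]
        namenode[(name, idx)] = replicas

namenode = {}
place_file("web.log", b"0123456789AB", namenode)
# The 12-byte file becomes three blocks, each replicated on three nodes,
# and only the name node knows which blocks make up the file and where they live.
```

If a data node fails, the name node can re-replicate the affected blocks from the surviving copies, which is what "auto replication" and "auto rebalancing" refer to above.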
Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
- Map: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
- Reduce: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.

MapReduce allows for distributed processing of the map and reduce operations. Since each mapping operation is independent of the others, all maps can be performed in parallel, though in practice parallelism is limited by the number of independent data sources and/or the number of CPUs near each source. Similarly, a set of "reducers" can perform the reduce phase in parallel, provided all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can appear inefficient compared to more sequential algorithms, MapReduce can be applied to far larger datasets than a single commodity server can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data is still available.
SAP HANA:
- In-memory platform
- Store billions of records
- Analyze in real time
- Built-in predictive, text, and spatial algorithms

Hadoop:
- Distributed disk platform
- Store virtually unlimited amounts of unstructured data
- Search in batch
- Non-relational data store
- Specialized skills to implement and code
- Many add-on libraries and packages