Today we measure available data in zettabytes

IN 2011, THE AMOUNT OF DATA SURPASSED 1.8 ZETTABYTES

90% OF THE DATA IN THE WORLD TODAY has been created in the last two years alone

1.8 ZETTABYTES = 57.5 BILLION 32 GB iPads = $34.4 TRILLION

COMBINED GDP OF:
• US
• Japan
• China
• Germany
• France
• UK
• Italy

**IDC Digital Universe Study Extracting Value from Chaos
© 2013 SAP AG. All rights reserved.
Confidential

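The iPad comparison on this slide can be sanity-checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which the slide does not state. Under that assumption, exactly 1.8 ZB fills about 56.25 billion 32 GB devices; the slide's 57.5 billion presumably reflects a volume slightly above 1.8 ZB ("surpassed") or different rounding in the cited study.

```python
# Back-of-the-envelope check, assuming decimal (SI) units: an assumption,
# since the slide does not say which unit convention it uses.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

data_volume_bytes = 18 * ZETTABYTE // 10  # 1.8 ZB, kept as an exact integer
ipad_capacity_bytes = 32 * GIGABYTE       # one 32 GB iPad

ipads_needed = data_volume_bytes // ipad_capacity_bytes
print(f"{ipads_needed / 1e9:.2f} billion iPads")
```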
Where is this data?
Types and Volumes of Data … have grown dramatically

• Structured data grew by more than 40% per year
• Traditional content types, including unstructured data, are growing by up to 80% per year

[Diagram labels: Legacy ERP, Legacy EDW, ERP Systems, CRM Systems, M2M data, Transactions, Sales Order, Planning, Demand, Inventory, Customer, Mobile, Instant Messages, Email, Things]

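To put the two growth rates in perspective, the following sketch computes how long it takes a data volume to double at a constant compound annual rate, and how much it grows over five years. The five-year horizon and the assumption of a constant rate are illustrative choices, not figures from the slide.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def volume_after(years: int, annual_growth_rate: float, start: float = 1.0) -> float:
    """Relative volume after `years` of compound growth from `start`."""
    return start * (1 + annual_growth_rate) ** years


# Rates taken from the slide; the 5-year horizon is an illustrative assumption.
for label, rate in [("structured, 40%/yr", 0.40), ("content incl. unstructured, 80%/yr", 0.80)]:
    print(
        f"{label}: doubles in {doubling_time_years(rate):.2f} years, "
        f"x{volume_after(5, rate):.1f} after 5 years"
    )
```

At 40% per year a volume doubles in roughly two years; at 80% per year it doubles in a little over one.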
What can’t we see?
WHAT CRITICAL “NEW SIGNALS” MIGHT WE BE MISSING?
Is it in our ERP Systems?
Our M2M data?
Social?

Big Data - Definition
“Big Data” refers to the problems of capturing, storing, managing, and analyzing massive amounts of various types of data.

Big Data Challenge: turn raw data into insights that drive business value, and manage that data in a cost-effective manner.

Most commonly this refers to terabytes or petabytes of data, stored in multiple formats, from different internal and external sources, with strict demands for speed and complexity of analysis.

The SAP you need to know
System of Engagement (“Newer SAP”): SAP Cloud, mobile
Maintenance & Operations: 24/7, SLAs, DR & HA, Elasticity
System of Record (“Foundational SAP”): Business Suite (ERP), Business Analytics
Data Logistics/Quality ETL
In Memory Database Platform: In Memory / Columnar / MPP / Federation

In Memory Database Platform
Digging Deeper
In Memory / Columnar / MPP / Federation

[Architecture diagram. Recoverable labels:]
User Interface & Applications: SAP Business Suite (ERP, CRM, SCM, SRM, PLM), SAP BW, BI, Custom Apps, Native Apps, HTTP
HANA Core Engines: OLTP, OLAP, Text, Predictive, Geospatial
Logical Models
Storage labels: memory, HOT, disk, WARM, cached, COLD
Physical Table(s), Virtual Tables
Ingest Engines (Bulk/Streaming/Real-time), Federation
Data Logistics (Data Services, SLT, CEP)
External sources: Other DB, Other ERP, Other Data …

Open Hadoop Strategy

Accelerated BI with SAP BusinessObjects and SAP HANA
One unified and complete BI Suite addressing the full spectrum of BI on SAP HANA

Discovery and Analysis: Discover. Predict. Create.
• Discover areas to optimize your business
• Adapt data to business needs
• Tell your story with beautiful visualizations

Dashboards and Apps: Build Engaging Experiences
• Deliver engaging information to users where they need it
• Track key performance indicators and summary data
• Build custom experiences so users get what they need quickly

Reporting: Share Information
• Securely distribute information across your organization
• Give users the ability to ask and answer their own questions
• Build printable reports for operational efficiency

Data Logistics
[Diagram: data flows from source systems through replication and load tools into SAP HANA, and from SAP HANA out to client tools. Recoverable labels:]

Data Sources: SAP Business Suite, Non SAP Data Sources, Event Streams, M2M

Load paths into SAP HANA:
• SAP LT Replication Server (Trigger Based, Real Time)
• SAP BOBJ Data Services (ETL, Batch)
• Sybase Replication Server (Log Based; ECDA/ODBC)
• SAP Event Stream Processor * (ODBC)

SAP HANA (SAP In-Memory Database): In Memory Models, Column Store

Client tools: SAP BusinessObjects tools, SAP BW, Other query tools, HANA Studio
Client interfaces: DB Connection, BICS, SQL, MDX, ODBC

* SAP HANA Roadmap
** SAP ERP & BW Extractors

SAP Big Data Apps

• Customer Engagement Intelligence
• Predictive Analytics RDS

See overview https://community.wdf.sap.corp/docs/DOC-222087
Delivering On Your Business Imperatives
Data Science Services
Forecasting Sales and Demand
• Forecast demand and manage inventory levels in perishable CPG
• Model variant cannibalization and its impact on manufacturer forecasts
• Forecast utility load demand

Check and Compliance
• Deliver faster response times and higher throughput of compliance checks to enable competitive advantage
• Tackle public fraud, waste, and abuse by analyzing records for tax discovery

Optimization
• Optimize transport and logistics, and recover from unforeseen disruptions
• Optimize depth and timing of retail markdowns to boost sales
• Grow deposits without excessive interest costs

Performance and Insights
• Maximize guest / customer experience
• Assess the impact of promotions, and improve profitability
• Gain directional insight on growing revenues and basket sizes

Contact “DL BigDataSalesSupport” for more information about SAP Data Science Services
HANA + Hadoop
What is Hadoop?
• Open source project inspired by Google/Yahoo
• Used at Yahoo, Facebook, eBay, LinkedIn, startups, and Fortune 500 enterprises to store and process petabytes of data on thousands of servers
• Hadoop components
– Cluster of commodity servers
– Distributed storage layer (Hadoop Distributed File System, or HDFS)
– Distributed processing infrastructure (MapReduce programming model)

Cluster of Commodity Servers
[Diagram: a Hadoop cluster with a NameNode and 10s to 1000s of DataNode(s)]

Hadoop Software Architecture
[Diagram: Hadoop computation engines (Hive, HBase, Mahout, Pig, Sqoop, …) on top of Map-Reduce, on top of data storage (Hadoop Distributed File System)]

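The MapReduce programming model named above can be illustrated with a word count. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce); it does not use Hadoop's API, and the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative choices rather than Hadoop identifiers. On a real cluster the map and reduce calls would run in parallel across DataNodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across many servers without the functions needing to know about each other.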
Apache Hadoop
Software framework for distributed data processing

• Hadoop Distributed File System (HDFS) – reliable data storage on commodity hardware
• HIVE – data warehousing solution on top of Hadoop with direct access to HDFS and HBase
• MapReduce – programming model for parallel data processing and query execution

[Diagram: HDFS. A client works with the Name Node (stores metadata) and the Data Nodes (store actual data in blocks), with replication between Data Nodes]
[Diagram: HDFS input → MapReduce process → HDFS output]

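The division of labor in the HDFS diagram (the Name Node holds metadata, the Data Nodes hold replicated blocks) can be sketched as a toy model. Everything here is an illustrative assumption: the class and method names are hypothetical, the block size is tiny, and replicas are placed round-robin, whereas real HDFS uses a more elaborate placement policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NameNode:
    """Toy name node: stores only metadata (which nodes hold each block)."""

    data_nodes: List[str]
    block_size: int = 4   # bytes; deliberately tiny for the demo (assumption)
    replication: int = 3  # copies per block (assumption)
    block_map: Dict[str, List[List[str]]] = field(default_factory=dict)

    def place_file(self, name: str, size: int) -> List[List[str]]:
        """Split a file into blocks and record where each replica lives."""
        n_blocks = -(-size // self.block_size)  # ceiling division
        placements = []
        for block in range(n_blocks):
            # Round-robin placement: an illustrative stand-in, not HDFS's policy.
            replicas = [
                self.data_nodes[(block + r) % len(self.data_nodes)]
                for r in range(self.replication)
            ]
            placements.append(replicas)
        self.block_map[name] = placements
        return placements

    def readable_after_failure(self, name: str, failed_node: str) -> bool:
        """A file stays readable if every block has a replica on a live node."""
        return all(
            any(node != failed_node for node in replicas)
            for replicas in self.block_map[name]
        )


name_node = NameNode(data_nodes=["dn1", "dn2", "dn3", "dn4"])
placements = name_node.place_file("sales.csv", size=10)
print(placements)
print(name_node.readable_after_failure("sales.csv", failed_node="dn2"))
```

Losing any single Data Node leaves every block with surviving replicas, which is the sense in which storage on commodity hardware can still be reliable.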
Why Hadoop?
Pros
• Free software
• Cheap hardware: commodity servers
• Scalable to thousands of nodes and petabytes of data
• Highly fault-tolerant storage and processing
• Flexible: write Java MapReduce programs to do any kind of processing on any data, with no fixed schema needed
• Open source libraries & tools
Cons
• Specialized skill set needed to administer and develop: Hadoop is not free!
• Requires more development (programming MapReduce & other NoSQL tools) than relational technologies (SQL, stored procedures)
• HIVE/PIG/Impala are neither as performant nor as mature as relational tech
• Batch-oriented jobs, not real-time
• Less mature in enterprise readiness: security, ETL, management, monitoring, etc.
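The "no fixed schema needed" point can be illustrated with a schema-on-read sketch: raw records of mixed shape are stored as they arrive, and structure is imposed only when a particular question is asked. The record layout and field names below (`type`, `customer`, `amount`) are invented for illustration, and the code is plain Python rather than a Hadoop job.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical raw input: mixed record types and one malformed line,
# stored as-is with no schema enforced at write time.
RAW_RECORDS = [
    '{"type": "order", "customer": "C1", "amount": 120.5}',
    '{"type": "tweet", "user": "C1", "text": "love it"}',
    '{"type": "order", "customer": "C2", "amount": 80}',
    "not even json",
]


def read_orders(raw_lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Apply an 'order' schema at read time, skipping anything that does not fit."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines are ignored for this query
        if isinstance(record, dict) and record.get("type") == "order":
            yield record["customer"], float(record["amount"])


totals = Counter()
for customer, amount in read_orders(RAW_RECORDS):
    totals[customer] += amount
print(dict(totals))
```

The flexibility comes at the cost listed under Cons: each new question needs its own parsing code, where a relational system would answer it with a SQL query against a declared schema.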
SAP HANA + Hadoop Provides Real-Time on BIG DATA
Combine INSTANT Results with INFINITE Storage

SAP HANA: Instant Results (1.0sec)
• Modern in-memory platform
• Transact/analyze in real-time
• Native predictive, text, and spatial algorithms

HADOOP: Infinite storage (∞)
• Distributed disk platform
• Store infinite amounts of unstructured data
• No-SQL access

Big data tim

  • 1. Today we measure available data in zettabytes IN 2011, THE AMOUNT OF DATA SURPASSED 1.8 90% OF THE DATA IN THE WORLD TODAY has been created in the last two years alone ZETTABYTES COMBINED GDP OF: 1.8 ZETTABYTES = 57.5 BILLION 32 GB iPads **IDC Digital Universe Study Extracting Value from Chaos © 2013 SAP AG. All rights reserved. = $34.4 • • • • = TRILLION US • France Japan • UK China • Italy Germany 1 Confidential 1
  • 2. Where is this data? Types and Volumes of Data … Traditional content types, Including unstructured data, …have grown dramatically are growing by up to 80% per year CRM Systems M2M data Transactions Sales Order Mobile ERP Systems Instant Messages Transactions Planning Email Things Sales Order Things Demand Legacy EDW © 2013 SAP AG. All rights reserved. 2013 SAP AG. All rights reserved. © Planning Legacy ERP Structured data grew by Inventory more than 40% per year Mobile Customer 2 2
  • 3. What can’t we see? WHAT CRITICAL “NEW SIGNALS” MIGHT WE BE MISSING? Is it in our ERP Systems? Our M2M data? Social? © 2013 SAP AG. All rights reserved. Confidential 3
  • 4. Big Data - Definition “Big Data” refers to the problems of capturing, storing, managing, and analyzing massive amounts of various types of data Big Data Challenge: turn raw data into insights that drive business value and manage in a cost effective manner; Most commonly this refers to terabytes or petabytes of data, stored in multiple formats, from different internal and external sources, with strict demands for speed and complexity of analysis © 2013 SAP AG. All rights reserved. 4
  • 5. The SAP you need to know System of Engagement “Newer SAP” SAP Cloud Maintenance & Operations 24/7, SLA’s, DR & HA, Elasticity mobile System of Record Business Suite (ERP) Business Analytics “Foundational SAP” Data Logistics/Quality ETL In Memory Database Platform In Memory / Columnar/ MPP/ Federation © 2013 SAP AG. All rights reserved. Confidential 5
  • 6. In Memory Database Platform Digging Deeper In Memory / Columnar/ MPP/ Federation SAP Business Suite Text Core PLM OLAP SRM OLTP SCM ERP Apps CRM Custom Predictive BI HANA SAP BW HTTP Native Apps Geospatial Models Engines Logical memory HOT disk WARM cached Bulk/Streaming/Real-time User Interface & Applications COLD Physical Table(s) Virtual Tables Ingest Engines Federation Data Logistics (Data Services , SLT, CEP) COLD 100101 011010 100101 © 2013 SAP AG. All rights reserved. Other DB Other ERP Other Data … Confidential 6
  • 7. Open Hadoop Strategy © 2013 SAP AG. All rights reserved. Confidential 7
  • 8. Accelerated BI with SAP BusinessObjects and SAP HANA One unified and complete BI Suite addressing the full spectrum of BI on SAP HANA Discovery and Analysis Dashboards and Apps Reporting Discover. Predict. Create. Build Engaging Experiences Share Information  Discover areas to optimize your business  Deliver engaging information to users where they need it  Securely distribute information across your organization  Adapt data to business needs  Track key performance indicators and summary data  Give users the ability to ask and answer their own questions  Tell your story with beautiful visualizations  Build custom experiences so users get what they need quickly  Build printable reports for operational efficiency © 2013 SAP AG. All rights reserved. Confidential 8
  • 9. Data Logistics. Diagram labels: SAP Business Suite; SAP LT Replication Server (trigger based, real time); SAP BOBJ Data Services (ETL, batch); Sybase Replication Server (log based); non-SAP data sources; ECDA/ODBC; DB Connection; event streams and M2M; SAP Event Stream Processor *; SAP HANA data sources; SAP In-Memory Database (in-memory models, column store); SAP BusinessObjects tools; SAP BW; other query tools; HANA Studio; access via SQL, MDX, BICS, and ODBC. * SAP HANA Roadmap. ** SAP ERP & BW Extractors.
  • 10. SAP Big Data Apps: Customer Engagement Intelligence; Predictive Analytics RDS. See overview: https://community.wdf.sap.corp/docs/DOC-222087
  • 11. Delivering On Your Business Imperatives: Data Science Services. Forecasting Sales and Demand: forecast demand and manage inventory levels in perishable CPG; model variant cannibalization and its impact on manufacturer forecasts; utility load demand forecasting. Check and Compliance: deliver faster response times and higher throughput of compliance checks to enable competitive advantage; tackle public fraud, waste, and abuse by analyzing records for tax discovery. Optimization, Performance, and Insights: optimize transport and logistics and recover from unforeseen disruptions; maximize guest/customer experience; optimize the depth and timing of retail markdowns to boost sales; assess the impact of promotions and improve profitability; grow deposits without excessive interest costs; gain directional insight on growing revenues and basket sizes. Contact “DL BigDataSalesSupport” for more information about SAP Data Science Services.
  • 13. What is Hadoop? Open source project inspired by Google/Yahoo. Used at Yahoo, Facebook, eBay, LinkedIn, startups, and Fortune 500 enterprises to store and process petabytes of data on thousands of servers. Hadoop components: a cluster of commodity servers; a distributed storage layer (Hadoop Distributed File System, or HDFS); a distributed processing infrastructure (MapReduce programming model). Cluster of commodity servers: NameNode; 10s to 1000s of DataNode(s). Hadoop software architecture: computation engines (Hive, HBase, Mahout, Pig, Sqoop, …); Map-Reduce; data storage (Hadoop Distributed File System).
  • 14. Apache Hadoop: software framework for distributed data processing. Hadoop Distributed File System (HDFS): reliable data storage on commodity hardware; a Name Node stores metadata, Data Nodes store the actual data in blocks with replication, and a client accesses them. Hive: data warehousing solution on top of Hadoop with direct access to HDFS and HBase. MapReduce: programming model for parallel data processing and query execution (HDFS input, MapReduce process, HDFS output).
  • 15. Why Hadoop? Pros: free software; cheap hardware (commodity servers); scalable to thousands of nodes and petabytes of data; highly fault-tolerant storage and processing; flexible (write Java MapReduce programs to do any kind of processing on any data, with no fixed schema needed); open source libraries and tools. Cons: requires a specialized skillset to administer and develop, so Hadoop is not free; requires more development (programming MapReduce and other NoSQL tools) than relational technologies (SQL, stored procedures); Hive/Pig/Impala are neither as performant nor as mature as relational technology; batch-oriented jobs, not real-time; less mature in enterprise readiness (security, ETL, management, monitoring, etc.).
  • 16. SAP HANA + Hadoop Provides Real-Time on BIG DATA. Combine INSTANT results with INFINITE storage. SAP HANA, instant results (1.0 sec): modern in-memory platform; transact/analyze in real time; native predictive, text, and spatial algorithms. Hadoop, infinite storage (∞): distributed disk platform; store infinite amounts of unstructured data; NoSQL access.

Editor's notes

  1. Big Data has existed for a while and is far more than just massive volumes of data from novel sources. As customers note, it is about how you manage, analyze, use, and combine the data. SAP makes Big Data real. What Big Data is not: e.g., SOH, BWoH. What would make them Big Data: e.g., sales forecasting on SOH/BWoH, or mashing up corporate data with social media data or with call center notes.
  2. What I say: HANA is tremendously important to SAP’s vision. We are no longer the “ERP company”, as you may think. Following the acquisitions of Business Objects, Sybase and recently SuccessFactors, SAP now leads several important enterprise business categories. Furthermore, by investing in in-house innovation, we have now assembled a vertically integrated business data management stack, all the way from data management appliances to applications to on-demand application services, providing increased customer value. And HANA is at the heart of this strategy!
  3. SAP big data platform: Open Strategy. We are planning on SAP formally reselling a HANA + Hadoop bundle on SAP’s price list. Hadoop distribution vendors (Hortonworks, Intel, MapR, Cloudera) are in every account and we work with them all. Strategic announcement coming.
  4. SAP offers the applications and analytic tools that help you infuse Big Data insights directly into your business processes and equip your employees, partners, and customers with access to data in order to uncover and monetize insights.
  5. Big Data is a big opportunity for most companies and is something that they must embrace to stay competitive. It has existed for a while and is far more than just massive volumes of data from novel sources. As customers note, it is about how you manage, analyze, use, and combine the data. SAP makes Big Data real.
  6. Hadoop runs on the Hadoop Distributed File System (HDFS), a distributed file system that scales out on commodity servers. Since Hadoop is file-based, developers don’t need to create a data model to store or process data, which makes Hadoop well suited to managing semi-structured Web data, which comes in many shapes and sizes. Because it is “schema-less,” Hadoop can be used to store and process any kind of data, including structured transactional data and unstructured audio and video data. The biggest advantage of Hadoop is that it is open source, which means that the up-front costs of implementing a system to process large volumes of data are lower than for commercial systems. However, Hadoop does require companies to purchase and manage dozens, if not hundreds, of servers and to train developers and administrators to use this new technology.
     Apache Hadoop enables applications to work with thousands of independent computers (nodes), which are collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). The main components used in Hadoop to run a job include:
     • Client: submits the MapReduce job.
     • Jobtracker: coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
     • Tasktrackers: run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
     Hadoop Distributed File System (HDFS):
     • HDFS is a file system that sits on top of the native file system.
     • Different blocks of a file are stored on different nodes.
     • The name node keeps track of which blocks make up a file and where those blocks are located.
     • Automatic rebalancing and automatic replication.
     • Uniform namespace.
     MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
     Map: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
     Reduce: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, which is the answer to the problem it was originally trying to solve.
     MapReduce allows for distributed processing of the map and reduction operations. Since each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the number of independent data sources and/or the number of CPUs near each source. Similarly, a set of 'reducers' can perform the reduction phase, provided all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to algorithms that are more sequential, MapReduce can be applied to significantly larger datasets than a single "commodity" server can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data is still available.
  7. SAP HANA: in-memory platform; store billions of records; analyze in real time; built-in predictive, text, and spatial algorithms. Hadoop: distributed disk platform; store infinite amounts of unstructured data; search in batch; non-relational data store; specialized skills to implement and code; many add-on libraries and packages.
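
The map, group-by-key, and reduce flow described in note 6 can be illustrated with a small word count. This is only a sketch that runs in a single Python process: it does not use Hadoop's API, and the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical names chosen for illustration. On a real cluster each input split would be mapped on a different node and the grouped keys would be sent to separate reducers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key so each key reaches exactly one reducer call."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values that share the same key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially (assumption: no real cluster)."""
    # Each map call is independent of the others, so on a cluster
    # these calls could run in parallel on different nodes.
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data big insight", "data in memory", "big memory"]
    print(run_job(splits))
```

Because `map_words` never looks at any other split, a failed map call could simply be rerun on the same input, which is the rescheduling property note 6 mentions.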