Driving Real Insights Through Data Science

The emergence of data-driven discoveries and applications has brought about major changes across industries. Many organizations are bringing their data together and looking to drive change, but the ability to generate new insights in real time from massive data sets is still far from commonplace.

At this event, data technology experts and data scientists from Pivotal provided the latest business perspective on how data science and engineering can be used to accelerate the generation of new insights.

For information about upcoming Pivotal events, please visit: http://pivotal.io/news-events/#events

Published in: Data & Analytics

  1. The Journey to Becoming a Data-Driven Enterprise (Pivotal Big Data Roadshow 2015)
  2. © Copyright 2015 Pivotal. All rights reserved. Where we’re going today: 3 Great Keynotes • Journey to a Data-Driven Enterprise • Data Science Use Cases • Streaming Data and Predictive Analytics. Stock Inference Demo and Architecture Overview. Intensive hands-on training sessions.
  3. Today’s Agenda: 12:00 PM – 12:45 PM: Check-In & Lunch • 12:45 PM – 1:00 PM: Welcome and Agenda Review • 1:00 PM – 1:20 PM: How Pivotal’s Tools Help Drive Value from Data Science • 1:20 PM – 2:20 PM: Accelerating the Generation of New Insights, R&D Use Case Review and Demo • 2:20 PM – 2:30 PM: Coffee Break • 2:30 PM – 3:00 PM: Manufacturing Use Case Review and Demo • 3:00 PM – 3:15 PM: Closing Remarks
  4. “MASHING BIG DATA WITH BIG MACHINES IS ‘BEAUTIFUL, DESIRABLE, INVESTABLE’; IT COULD TRANSFORM GE'S BUSINESS, AND THE ECONOMY.” Jeff Immelt, CEO, GE
  5. THE POWER OF 1: Driving Outcomes That Matter. One percent improvement equals: • $27B industry value by reducing system inefficiency • $63B industry value by reducing process inefficiency • $66B industry value with efficiency improvements in gas-fired power plant fleets. Industries shown: Rail, Healthcare, Power. Use cases shown: Increasing Freight Utilization, Predictive Maintenance, Predictive Diagnostics. Source: General Electric
  6. DATA-DRIVEN ENTERPRISE JOURNEY: STORE • Structured • Unstructured • High Volume • High Velocity; ANALYZE • Predictive Analytics • Machine Learning • Advanced Data Science • Realtime Analytics; DEVELOP • Advanced Analytic Pipelines • Realtime Analytical Applications • Global Scale Data-Driven Applications • Enterprise, Consumer, and Mobile; INNOVATE • Agile Dev Expertise • DevOps • Microservices • Continuous Delivery • Closed Loop Applications. Enablers: Agile Development, Big Data, Predictive Analytics, Cloud Native Platform
  7. LARGE ENTERPRISE BIG DATA TROUBLE: • 0% of CIOs think their IT infrastructure is fully prepared for big data (3) • 30% of companies have deployed advanced analytics, 11% big data analysis (4) • 44% of new applications failed to meet performance expectations (5) • 90% of companies allocate at least 2X more cloud capacity than needed to ensure performance (6). But… • 80% of CEOs think data mining and analysis are strategically important (1) • 4% of companies use analytics effectively (2). Sources: (1) 2015 PwC CEO Survey; (2) 2013 Bain & Company, The Value of Big Data; (3) 2014 IT Infrastructure Conversation, IBM; (4) Ernst & Young, 2014 Enterprise IT Trends and Investments; (5) 2014 Riverbed Technology, The Transformers; (6) 2014 ElasticHosts CIO Study
  8. THE DATA DIVIDE: BIG DATA CHASM • 70% of data generated by customers • 80% of data stored • 3% prepared for analysis • 0.5% being analyzed • <0.5% being operationalized
  9. SOFTWARE IS EATING THE WORLD: Software Is Eating the World; Data Is Fueling Software
  10. “WE CHOSE PIVOTAL BECAUSE WE BELIEVE IT PROVIDES A 360-DEGREE VIEW OF THE PROCESS. FROM A DATA SCIENCE AND DATA TECHNOLOGY PERSPECTIVE, IT MEANS DELIVERING BEST-IN-CLASS DATA TECHNOLOGIES AND ENABLING THEM ON THEIR PLATFORM.”
  11. ACROSS INDUSTRIES
  12. THE NEW DATA IMPERATIVES: Converged Data & Cloud • Open • Data-Driven Apps
  13. THE BIG DATA PROBLEM: Fragmentation • Constraints • Complexity
  14. GUIDING PRINCIPLES IN THE NEW ERA: OPEN • Remove lock-in • Leverage ecosystem • Co-innovate; AGILE • Shorten innovation cycles • Reduce TCO • Improve TTM; CLOUD-READY • Solve business problems • Avoid lock-in • Appropriate security
  15. JOURNEY TO A DATA-DRIVEN ENTERPRISE: • Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure
  16. DATA-DRIVEN COMPANIES: USE MODERN DATA INFRASTRUCTURE (journey stages: • Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure)
  17. MODERNIZE DATA INFRASTRUCTURE. REQUIREMENTS: • Elastic, scale-out storage and processing • Flexible data types and pipelining • Cloud-friendly and open-source based. BENEFITS: • ETL on demand: low operational cost • Expanded use cases • Higher quality analytics • Lowered storage/processing cost • Less fragmented ecosystem • Reduced vendor lock-in
  18. DATA-DRIVEN COMPANIES: STRATEGICALLY USE ADVANCED ANALYTICS (journey stages: • Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure)
  19. ADVANCED ANALYTICS. REQUIREMENTS: • Machine learning and advanced analytics • SQL-compliant batch and interactive queries • Massive stream processing. BENEFITS: • Leverage existing skills and tools • Rapid time to insights • Internet of Things use cases • Solve business problems • Predictive insights: proactive execution
  20. DATA-DRIVEN COMPANIES: INNOVATE AT SCALE (journey stages: • Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure)
  21. ANALYTIC APPS AND AUTOMATION AT SCALE. REQUIREMENTS: • Low-latency, distributed in-memory transactions • Resilient, scale-out messaging and object storage • Agile analytic app-dev with enterprise PaaS. BENEFITS: • Reduced time to action • Low ‘analytics to app-dev’ integration cost • Reduced time to insights • Flexible ingestion: low operating cost • High performance: low operating cost • Transactional safety: business-critical ops
  22. JOURNEY TO A DATA-DRIVEN ENTERPRISE: • Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure. Pivotal Data Science helps you move from BI to data science; Pivotal Labs helps you move to agile development of apps at scale; Pivotal Data Engineering helps you move from data administration to data engineering.
  23. PIVOTAL BIG DATA SUITE
  24. WORLD’S FIRST OPEN-SOURCED BIG DATA PORTFOLIO: Open sourcing all Pivotal Big Data Suite components, including Pivotal GemFire (Apache Geode), Pivotal HDB (Apache HAWQ), and Pivotal Greenplum Database. Building on the success of the Cloud Foundry Foundation. Built for enterprises.
  25. BUILT FOR ENTERPRISES: • Value-added features: enterprise-grade performance + robustness without lock-in (advanced query optimization in analytics; WAN replication and continuous query in transactional processing) • Flexible deployment models: align to business objectives and needs (balance cost objectives with policy and compliance requirements; leverage Pivotal’s pre-integration + certification on supported configurations) • Enterprise-grade support: one throat to choke for the suite (focus on business problems, not on lifecycle management; expert support on Big Data Suite means reduced business risk)
  26. OPEN: OpenDataPlatform.org • Common core for the Hadoop ecosystem • Rapidly accelerated certifications, ecosystem development and enterprise-grade quality
  27. AGILE: journey stages (• Deploy analytic apps and automate at scale • Perform advanced analytics • Discover insights • Modernize data infrastructure) supported by Spring XD, Spark, Pivotal HD & Open Data Platform, Pivotal Greenplum Database, Pivotal HDB, RabbitMQ, Redis, Pivotal GemFire, Pivotal BDS on PCF, Pivotal Cloud Foundry
  28. CLOUD-READY: Commodity Hardware • Appliance • Hybrid Cloud • Cloud (IaaS, PaaS)
  29. DATA-DRIVEN ENTERPRISE JOURNEY WITH PIVOTAL BIG DATA SUITE: STORE • Structured • Unstructured • High Volume • High Velocity; ANALYZE • Predictive Analytics • Machine Learning • Advanced Data Science • Realtime Analytics; DEVELOP • Advanced Analytic Pipelines • Realtime Analytical Applications • Global Scale Data-Driven Applications • Enterprise, Consumer, IoT, and Mobile; INNOVATE • Agile Dev Expertise • DevOps • Microservices • Continuous Delivery • Closed Loop Applications. Enablers: Agile Development, Big Data, Predictive Analytics, Cloud Native Platform. Products: Spring XD, Spark, Pivotal HD & Open Data Platform, Pivotal Greenplum Database, Pivotal HDB, Pivotal GemFire, RabbitMQ, Spring Cloud, Pivotal BDS on PCF, Pivotal Cloud Foundry. Services: Pivotal Labs • Data Science • Data Engineering
  30. FOR FURTHER INFO, CHECK OUT… • Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data • Pivotal Blog @ http://blog.pivotal.io • Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal • Pivotal Academy @ https://pivotal.biglms.com • Or reach out to your local Pivotal Account Executive.
  31. Pivotal Data Science Overview and Use Cases (Pivotal Big Data Roadshow)
  32. DATA SCIENCE? (word cloud): App Development, Analytics, Business Intelligence, Reporting, Visualization, Dashboards, Insights, Big Data, Machine Learning, Statistics, Mathematics, Time Series, Algorithms, Databases, Software, Modeling, Queries, Real-Time, Sensors, Predictive Models, ETL, Research, Hadoop, Distributed Computing, MapReduce, SQL, In-Memory, OLAP, Text Mining, Unstructured Data, Open Source, Decision Science, Ad Hoc Queries, Hacking, In-Database Analytics, Internet of Things, Data Cleansing, Sentiment
  33. Data Related: • ETL • Unstructured • Data Cleansing • Sensors. Fields of Study & Techniques: • Algorithms • Mathematics • Statistics • Econometrics • Predictive Modeling • Machine Learning • Text Mining • Sentiment • MapReduce. Business Intelligence: • Dashboards • Insights • Visualization • Ad Hoc Queries • Reporting. Implementation: • Software • In-Database Analysis • Distributed Computing • Hadoop • Open Source. Industry Buzzwords: • Big Data • Decision Science • Internet of Things • Real-Time • Hacking • In-Memory
  34. What is Data Science? The use of statistical and machine learning techniques on big, multi-structured data in a distributed computing environment to identify correlations and causal relationships, classify and predict events, identify patterns and anomalies, and infer probabilities, interest, and sentiment. DRIVE AUTOMATED, LOW-LATENCY ACTIONS IN RESPONSE TO EVENTS OF INTEREST
  35. Billions of Data Points: • Gene Sequencing: the cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014 • Smart Grids: reading smart meters every 15 minutes is 3000X more data intensive • Social Media: Facebook uploads 250 million photos each day • Oil Exploration: oil rigs generate 25,000 data points per second • Also shown: Stock Market, Video Surveillance, Medical Imaging, Mobile Sensors
  36. What is Big Data Analytics? As complexity and the value of analytics ($) increase: • Descriptive Analytics: What happened? • Diagnostic Analytics: Why did it happen? • Predictive Analytics: What will happen? • Prescriptive Analytics: How can we make it happen?
37. Data Science Toolkit: PLATFORM • KEY TOOLS • KEY LANGUAGES (SQL)
38. MADlib: Scalable, In-Database ML • Open Source: https://github.com/madlib/madlib • Works on Greenplum DB, HAWQ and PostgreSQL • In active development by Pivotal • Downloads and Docs: http://madlib.net/
39. MADlib Predictive Analytics Library: Functions (as of Aug 2015)
Supervised Learning
• Regression Models: Cox Proportional Hazards Regression, Elastic Net Regularization, Generalized Linear Models, Linear Regression, Logistic Regression, Marginal Effects, Multinomial Regression, Ordinal Regression, Robust Variance, Clustered Variance, Support Vector Machines
• Tree Methods: Decision Tree, Random Forest
• Other Methods: Conditional Random Field, Naïve Bayes
Unsupervised Learning
• Association Rules (Apriori)
• Clustering (K-means)
• Topic Modeling (LDA)
Statistics
• Descriptive: Cardinality Estimators, Correlation, Summary
• Inferential: Hypothesis Tests
• Other Statistics: Probability Functions
Other Modules
• Conjugate Gradient, Linear Solvers, PMML Export, Random Sampling, Term Frequency for Text
Time Series
• ARIMA
Data Types and Transformations
• Array Operations, Dimensionality Reduction (PCA), Encoding Categorical Variables, Matrix Operations, Matrix Factorization (SVD, Low Rank), Norms and Distance Functions, Sparse Vectors
Model Evaluation
• Cross Validation
40. Analytics with Pivotal: a single address for everything analytics. Time-to-Insights: FORECASTING • CLUSTERING • REGRESSION • CLASSIFICATION • OPTIMIZATION
41. Data Science for Building Models: Smart Systems = Sensors + Digital Brain + Actuators. Steps: Problem Formulation • Modeling Step • Data Step • Application Step. [Diagram components: Sensors & Actuators, Data Lake]
42. Data Science Use Cases
43. Financial Services
44. Identifying and Pricing Cross-Sell Opportunities
CUSTOMER: A global financial services provider
BUSINESS PROBLEM: Identify cross-sell opportunities between two business arms of a financial institution.
CHALLENGES:
• Integrating large-scale data originating from multiple data warehouses
• Developing predictive models to identify novel cross-sell opportunities within the financial institution
• Evaluating the identified cross-sell opportunities by their revenue potential
SOLUTION:
• Fast integration of data in Pivotal Greenplum Database
• Predictive models and evaluation of profitability: association rules; logistic regression for each product offered; estimation of revenue opportunity
• On-demand reporting and visualization via custom dashboards connected to in-database models
• Identified multi-million dollar opportunities for the bank
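The slide names association rules as one of the cross-sell models but gives no details. As a minimal sketch, assuming each customer is represented by the set of products they hold (the product names below are hypothetical), pairwise rules can be scored by support (share of customers holding both products) and confidence (share of holders of the first product who also hold the second). This is a simplified, pairs-only version of the Apriori idea, not the model used in the engagement.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(baskets: Iterable[Set[str]], min_support: float = 0.2,
               min_confidence: float = 0.5) -> List[Rule]:
    """Score single-product rules "holds A => holds B" over customer product sets."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(basket)
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once per customer

    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # a rule can hold in either direction
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical product holdings for five customers across two business arms.
customers = [
    {"checking", "mortgage"},
    {"checking", "mortgage", "brokerage"},
    {"checking", "brokerage"},
    {"checking"},
    {"mortgage", "brokerage"},
]
for rule in pair_rules(customers, min_confidence=0.6):
    print(rule)
```

In a database setting the same counts are plain GROUP BY aggregates; MADlib's association-rules module (listed on slide 39) would be the in-database route for full itemsets.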
45. Financial Compliance
BUSINESS PROBLEM:
• Ensure compliance with Dodd-Frank and Basel Committee regulations
• Identify underlying risk and fraud while reducing the load on an overburdened compliance department
[Diagram: Emails, Chats, Trades, Transactions, Policy, Securities, Phone Calls, Watch Lists, … feed a financial compliance Data Lake, followed by data integration, data cleanup, modeling, classification and ranking, and analyst user interfaces; analyst feedback loops back into the analytics]
• Data integration: e.g., link trade information with email and chat communications
• Data cleanup: e.g., identify newsletters and spam emails
• Modeling: predictive modeling to flag messages and trades; graph and cohort analysis
• Analyst feedback: reviewed fraud instances are included in periodic model refreshes
SOLUTION:
• A data lake platform coupled with cutting-edge data science techniques
• A flexible user interface to promote an adaptive, continuously learning compliance framework
46. Telco & Mobile
47. Subscriber Micro-Segmentation
CUSTOMER: A major telco with cable & VOD, internet, and phone business units
BUSINESS PROBLEM: Better understand aggregated subscriber behavior to drive business strategy using newly available data sources
CHALLENGES:
• Large quantities of deep packet inspection data and set-top box data that had not been analyzed before
• Needed to incorporate internet usage and TV consumption information into pre-existing subscriber segments
SOLUTION:
• Generated new subscriber segments that incorporated features based on consumption of TV and internet services across a variety of devices
• Crossed new segments with existing segments to generate new micro-segments for cross-sell/upsell and new product development opportunities
[Figure: Customized Micro-Segments]
48. Newly Identified Behavior-Based Segments. Subscribers: Moderates • OTT & Data Heavyweights • Portable OTT Entertainment Seekers • iPhone Heavy • Android Heavy • iPad Heavy • In-Home OTT Entertainment Seekers • In-Home Native Content Seekers • VOD Heavy • TV Heavy
49. Opportunities for Data-Driven Decisions in Pharma
50. Data-driven drugs: From discovery to delivery
RICH DATA SOURCES: • Molecular data • Cellular drug screens • Animal models • Clinical data including notes, images, markers (e.g., genomics, lab results) • Sensor and assay data • Internal and partner/purchased external data • Contact center data • Patient registries, public and federal data, clinical partnerships
Stages: Drug discovery + development • Clinical Trials • Manufacturing • Marketing • Distribution and surveillance
51. Internet of Things in Manufacturing: a pipeline of sensors and opportunities for optimizing output. Process: Input materials → Mix → Incubate → Filter → Centrifuge → Final Product, with automated raw materials mixing. [Charts from sensors and high-content screens: TEMP vs. TIME; Absorbance vs. Elution volume; Velocity vs. Time]
52. Vaccine Potency Prediction
CUSTOMER: A major pharmaceutical company
BUSINESS PROBLEM: Predict potency and antigen levels of live virus vaccines based on manufacturing sensor data and manual data collected throughout the process.
CHALLENGES:
• The customer's data model was not optimal for running analytical queries
• Manual data quality issues
• Data capture was performed with varying consistency due to the high cost of manual data collection
SOLUTION:
• Introduced a new data model to make data accessible and enable analytics (including LIMS and DeltaV data)
• Built automated outlier detection/correction methods to address manual data-entry quality issues
• Devised imputation methods to deal with data completeness issues
• Built predictive models with high accuracy
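The slide does not say which outlier-detection or imputation methods were used. One common, robust choice, shown here purely as an assumption, is the modified z-score (based on the median and the median absolute deviation, flagging scores above 3.5), after which flagged or missing entries are filled with the median of the remaining clean values. The readings are hypothetical.

```python
from statistics import median
from typing import List, Optional


def flag_outliers(values: List[Optional[float]], threshold: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score (median/MAD based) exceeds the threshold.

    Missing entries (None) are never flagged; they are handled by imputation.
    """
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    flags = []
    for v in values:
        if v is None or mad == 0:
            flags.append(False)
        else:
            flags.append(0.6745 * abs(v - med) / mad > threshold)
    return flags


def clean_and_impute(values: List[Optional[float]], threshold: float = 3.5) -> List[float]:
    """Treat flagged outliers as missing, then fill every gap with the median of the kept values."""
    flags = flag_outliers(values, threshold)
    kept = [v for v, bad in zip(values, flags) if v is not None and not bad]
    fill = median(kept)
    return [fill if (v is None or bad) else v for v, bad in zip(values, flags)]


# Hypothetical manual entries: 71.0 looks like a misplaced decimal point, None is a skipped reading.
readings = [7.1, 7.3, 7.2, 71.0, None, 7.0]
print(flag_outliers(readings))
print(clean_and_impute(readings))
```

Median-based statistics are used here because a single bad entry barely moves them, whereas a mean and standard deviation would be dragged toward the outlier they are meant to detect.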
53. Check out the Pivotal Data Science Blog! http://blog.pivotal.io/data-science-pivotal
54. FOR FURTHER INFO…
• Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data
• Pivotal Blog @ http://blog.pivotal.io
• Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal
• Pivotal Academy @ https://pivotal.biglms.com
• Or reach out to your local Pivotal Account Executive…
55. Pivotal Data Science Labs: Packaged Services
Packages: LAB PRIMER (2-Week Roadmapping) • LAB 600 (6-Week Lab) • LAB 1200 (12-Week Lab) • LAB 100 (Analytics Bundle) • DATA JAM (Internal DS Contest)
Offerings across the packages: • Analytics Roadmap • Prioritized Opportunities • Architectural Recommendations • Hands-on training • Hosted data on Pivotal Data stack • Results review & assessment • On-site MPP analytics training • Analytics toolkit • Quick insight (2 weeks) • Prof. services • Data science model building • Ready-to-deploy model(s)
56. Data Streaming and Predictive Analytics Using Pivotal Big Data Suite
57. The Journey to the Data-Driven Enterprise: Converging Trends. Big Data • IoT, Mobile Apps, Social Media • Data Science and Machine Learning. Innovation: New Data • New Processes • New Insights
58. Migrating from a Reactive, Static and Constrained Model… [Diagram: Ingest, Store (HDFS Data Lake), Analytics] Hard to change • Labor intensive • Inefficient • Coding based • No real-time information • Based on expensive ETL
59. …To Pro-Active, Self-Improving, Machine Learning Systems [Diagram: Multiple Data Sources, Data Stream Pipeline, Real-Time Processing, In-Memory Real-Time Data, Expert System / Machine Learning, Store Everything (HDFS Data Lake)] Continuous Learning • Continuous Improvement • Continuous Adapting
60. “50-80% OF THE TIME ON DATA SCIENCE PROJECTS IS SPENT ON DATA WRANGLING” Source: New York Times research, http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
61. Still… [Diagram: Data Feeds, Stream Processing, Expert Systems, Machine Learning and Historical Data (HDFS Data Lake) lead to Smart Decisions and Business Value]
62. The Data Stream Needs an Agile, Scalable and Fast Solution [Diagram: Ingest → Transform → Sink with SpringXD, GemFire, HAWQ, GPDB, Data Lake]
63. Spring XD Orchestrates and Automates All the Steps of Data Stream Pipelining [Diagram: Ingest → Transform → Sink (SpringXD); Distributed Computing; In-Memory Real-Time Data; Expert System / Machine Learning; HAWQ, GPDB, Data Lake] Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable
64. Spring XD: State-of-the-Art Data Pipeline Automation
• INGEST / SINK: No coding required • Dozens of built-in connectors • Seamless integration with Kafka, Sqoop • Create new connectors easily using Spring
• PROCESS: Call Spark, Reactor or RxJava • Built-in configurable filtering, splitting and transformation • Out-of-box configurable jobs for batch processing
• ANALYZE: Import and invoke PMML jobs easily • Call Python, R, MADlib and other tools • Built-in configurable counters and gauges
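In Spring XD these stages are composed declaratively from built-in modules, with no coding required. To make the ingest → transform → filter → sink flow concrete, here is a minimal sketch in plain Python using chained generators; every function name and the sample records are hypothetical, and this is not Spring XD code.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Ingest: yield raw messages (a stand-in for an input connector)."""
    yield from lines


def parse(messages: Iterable[str]) -> Iterator[Dict]:
    """Transform: decode JSON and silently skip malformed messages."""
    for msg in messages:
        try:
            yield json.loads(msg)
        except json.JSONDecodeError:
            continue


def filter_by(records: Iterable[Dict], predicate: Callable[[Dict], bool]) -> Iterator[Dict]:
    """Filter: keep only records that satisfy the predicate."""
    return (r for r in records if predicate(r))


def enrich(records: Iterable[Dict]) -> Iterator[Dict]:
    """Transform: add a derived field (Celsius to Fahrenheit, as an example)."""
    for r in records:
        yield {**r, "temp_f": r["temp_c"] * 9 / 5 + 32}


def sink(records: Iterable[Dict], store: List[Dict]) -> List[Dict]:
    """Sink: persist records (a list stands in for HDFS, GemFire, HAWQ or GPDB)."""
    for r in records:
        store.append(r)
    return store


raw = ['{"sensor": "a", "temp_c": 20}', "not json",
       '{"sensor": "b", "temp_c": -5}', '{"sensor": "c", "temp_c": 100}']
store = sink(enrich(filter_by(parse(source(raw)), lambda r: r["temp_c"] >= 0)), [])
print(store)
```

Because each stage is a generator, records flow through one at a time, which is the streaming behaviour the pipeline slides describe; a real deployment adds what this sketch lacks: distribution, fault tolerance and back-pressure.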
65. GemFire Provides Scalable, Low-Latency Data Access, Storage and Event Processing [Diagram: Ingest → Transform → Sink (SpringXD); Distributed Computing; GemFire; Expert System / Machine Learning; HAWQ, GPDB, Data Lake] Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable
66. GemFire: In-Memory Real-Time Data • In-Memory Enterprise Data Grid • Horizontally Scalable, Consistent, Highly Available • Event handling • Continuous Queries • Enterprise Data Geo Distribution
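The "Continuous Queries" bullet means that a client registers a standing query and is notified whenever new data matches it, instead of polling. The toy class below is a hypothetical illustration of that idea only: a single-process dictionary with callbacks, not the GemFire API and without its distribution, consistency or availability guarantees.

```python
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Listener = Callable[[str, Any], None]


class Region:
    """Toy in-memory key/value store that pushes matching updates to registered listeners."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._queries: List[Tuple[Predicate, Listener]] = []

    def register_cq(self, predicate: Predicate, listener: Listener) -> None:
        """Register a standing query: the listener fires on every put whose value matches."""
        self._queries.append((predicate, listener))

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(value):
                listener(key, value)  # push the event; the client never polls

    def get(self, key: str) -> Any:
        return self._data.get(key)


region = Region()
alerts: List[str] = []
region.register_cq(lambda v: v["temp"] > 90, lambda key, v: alerts.append(key))
region.put("pump-1", {"temp": 70})  # hypothetical sensor readings
region.put("pump-2", {"temp": 95})
print(alerts)
```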
67. Pivotal Provides SQL-Based Advanced Analytics [Diagram: Ingest → Transform → Sink (SpringXD); Distributed Computing; Expert System / Machine Learning; GemFire; Data Lake; HAWQ, GPDB] Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable
68. HAWQ: Advanced SQL analytics in Hadoop • Massively Parallel Processing (MPP) RDBMS on Hadoop • ANSI SQL on Hadoop • Extremely high performance for analytics (unlike Hive) • Stores all data directly on HDFS • Open-Source. Combining SQL with Hadoop is key for analytics: SQL remains the #1 choice for data science.
69. Developers and Data Scientists Can Focus on the Business Value of Data [Diagram: Ingest → Transform → Sink (SpringXD); GemFire; Data Lake; HAWQ, GPDB] Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable
70. Data Streaming Reference Architecture [Diagram: Data Feeds → Data Stream Pipeline → Distributed Computing (Real-Time Data; Expert Systems & Machine Learning; Advanced Analytics) and HDFS Data Lake, serving Transactional Apps and Analytic Apps]
71. Data Streaming Reference Architecture [Diagram: Data Feeds → Data Stream Pipeline (SpringXD) → GemFire, HAWQ, GPDB and HDFS Data Lake, serving Transactional Apps and Analytic Apps]
72. “SO WE ARE MOVING TO A WORLD WHERE THE MACHINES WE WORK WITH ARE NOT JUST INTELLIGENT; THEY ARE BRILLIANT. THEY ARE SELF-AWARE, THEY ARE PREDICTIVE, REACTIVE AND SOCIAL. IT'S A WORLD WHERE INFORMATION ITSELF BECOMES INTELLIGENT AND COMES TO US AUTOMATICALLY WHEN WE NEED IT WITHOUT HAVING TO LOOK FOR IT.” MARCO ANNUNZIATA, GE
73. Demo Powered by Pivotal Big Data Suite
74. It's all about DATA: Data Sources → Look for patterns → Prediction
75. [Demo diagram: Real data and a Simulator feed /Stocks; SpringXD Transform and Sink steps: (1) Enrich, Filter, Split; (2) Indicators, written to /TechIndicators; (3) Predict with Machine Learning, written to /Predictions; results shown on a Dashboard] Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable • Cloud-Native
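The demo computes technical indicators on a stock stream before prediction, but the slide does not say which indicators. As an assumption, here is the simplest common one, a streaming simple moving average, kept incrementally so each tick costs constant time; the prices are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple


def moving_average(prices: Iterable[float], window: int) -> Iterator[Tuple[float, Optional[float]]]:
    """Yield (price, simple moving average) for every tick.

    The average is None until `window` prices have been seen. A running
    total is updated incrementally instead of re-summing the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque(maxlen=window)
    total = 0.0
    for price in prices:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(price)
        total += price
        yield price, (total / window if len(buf) == window else None)


for tick in moving_average([10, 11, 12, 13], window=3):
    print(tick)
```

Over very long streams the running total can accumulate floating-point drift, so a production version might periodically recompute it from the buffer.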
77. “THE REAL OPPORTUNITY FOR CHANGE... SURPASSING THE MAGNITUDE OF THE CONSUMER INTERNET... IS THE INDUSTRIAL INTERNET, AN OPEN, GLOBAL NETWORK THAT CONNECTS PEOPLE, DATA AND MACHINES.” JEFF IMMELT, CEO, GE
78. FOR FURTHER INFO, CHECK OUT…
• Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data
• Pivotal Blog @ http://blog.pivotal.io
• Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal
• Pivotal Academy @ https://pivotal.biglms.com
• Or reach out to your local Pivotal Account Executive…
79. BUILT FOR THE SPEED OF BUSINESS
80. Accelerating the Generation of New Insights. October 27, 2015. Sarah Aerni, Data Science; Antonio Petrole, Data Engineering
82. Technology to process and store data is needed in all industries
• Gene Sequencing: the cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25,000 data points per second
• Also: Stock Market, Video Surveillance, Medical Imaging, Mobile Sensors
83. What is Big Data Analytics?
• Descriptive Analytics: WHAT HAPPENED?
• Diagnostic Analytics: WHY DID IT HAPPEN?
• Predictive Analytics: WHAT WILL HAPPEN?
• Prescriptive Analytics: HOW CAN WE MAKE IT HAPPEN?
[Chart axes: Complexity vs. Value of Analytics ($)]
84. Opportunities for Data-Driven Decisions in Pharma
85. Data-driven drugs: From discovery to delivery
RICH DATA SOURCES: • Molecular data • Cellular drug screens • Animal models • Clinical data including notes, images, markers (e.g., genomics, lab results) • Sensor and assay data • Internal and partner/purchased external data • Contact center data • Patient registries, public and federal data, clinical partnerships
Stages: Drug discovery + development • Clinical Trials • Manufacturing • Marketing • Distribution and surveillance
86. Internet of Things in Manufacturing: a pipeline of sensors and opportunities for optimizing output. Process: Input materials → Mix → Incubate → Filter → Centrifuge → Final Product, with automated raw materials mixing. [Charts from sensors and high-content screens: TEMP vs. TIME; Absorbance vs. Elution volume; Velocity vs. Time]
87. Vaccine Potency Prediction
CUSTOMER: A major pharmaceutical company
BUSINESS PROBLEM: Predict potency and antigen levels of live virus vaccines based on manufacturing sensor data and manual data collected throughout the process.
CHALLENGES:
• The customer's data model was not optimal for running analytical queries
• Manual data quality issues
• Data capture was performed with varying consistency due to the high cost of manual data collection
SOLUTION:
• Introduced a new data model to make data accessible and enable analytics (including LIMS and DeltaV data)
• Built automated outlier detection/correction methods to address manual data-entry quality issues
• Devised imputation methods to deal with data completeness issues
• Built predictive models with high accuracy
88. Interpreting the utility of a measure obtained during manufacturing based on model outcomes
Sample model insights:
• Some features may reveal tunable parameters to alter potency; others may simply be markers
• Features consistently absent from models may be uninformative for predicting potency
• Opportunities to provide real-time feedback on data-entry errors and predicted potency outcomes
[Scatter plots: Assayed value vs. Potency (correlation = 0.45); Duration of a step vs. Potency (correlation = 0.38)]
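The correlations quoted on this slide are presumably Pearson coefficients between a manufacturing measure and potency (the slide does not say). For reference, a minimal sketch of that calculation on hypothetical data; PostgreSQL-derived databases such as Greenplum also expose it as the built-in `corr(y, x)` aggregate, so it can run in-database.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical batches: duration of a process step (hours) vs. measured potency.
durations = [10.0, 12.5, 11.0, 14.0, 9.5]
potency = [6.1, 6.4, 6.0, 6.9, 6.2]
print(round(pearson(durations, potency), 2))
```

A coefficient around 0.4 signals a real but weak linear relationship, which fits the slide's caution that a feature may be a marker rather than a tunable parameter.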
89. Need for new environments to process big data?
• PETABYTES OF DATA • VARIETY/VELOCITY • RAPIDLY EVOLVING FIELD OF DATA SCIENCE AND TOOLS • MANY EXISTING LIBRARIES, TOOLS AND EXPERTISE
• HDFS STORAGE AND MPP ARCHITECTURES DISTRIBUTE STORAGE AND PREVENT DATA MOVEMENT • DISTRIBUTED COMPUTATION FOR PARALLELIZATION • OPEN-SOURCE LIBRARY FOR MACHINE LEARNING AT SCALE AND FRAMEWORK TO ACCESS COMMON LANGUAGES • SQL ENGINE AND ODBC/JDBC CONNECTIONS TO HADOOP
• FLEXIBLE • SCALABLE • ENABLING • ACCESSIBLE
90. Multiple tools with a single, simple goal: distributed storage with in-place computation. Pivotal Hadoop • Pivotal Greenplum Database • HAWQ
91. Multiple tools with a single, simple goal: distributed storage with in-place computation. Pivotal Greenplum Database: think of it as multiple PostgreSQL servers, a Master plus Segments/Workers. Rows are distributed across segments by a particular field (or randomly). (Also shown: Pivotal Hadoop, HAWQ)
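In Greenplum the distribution field is chosen with a `DISTRIBUTED BY (column)` clause (or `DISTRIBUTED RANDOMLY`) when the table is created. The sketch below simulates the idea in Python, assuming a generic CRC32 hash in place of Greenplum's own hash function; the table and column names are hypothetical. It also shows why this matters: rows sharing a key land on one segment, so each segment can aggregate its own rows with no data movement.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def distribute(rows: Iterable[dict], key: str, n_segments: int) -> Dict[int, List[dict]]:
    """Assign each row to a segment by hashing its distribution key."""
    segments: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash() for strings.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        segments[h % n_segments].append(row)
    return dict(segments)


def count_by_key(segments: Dict[int, List[dict]], key: str) -> Dict[object, int]:
    """Each segment counts its own rows. Every key value lives on exactly one
    segment, so the partial results can be concatenated without a shuffle."""
    result: Dict[object, int] = {}
    for rows in segments.values():
        result.update(Counter(row[key] for row in rows))
    return result


# Hypothetical table of lab results, distributed by patient id across 4 segments.
rows = [{"patient": p, "value": i} for i, p in enumerate(["p1", "p2", "p1", "p3", "p2", "p1"])]
segments = distribute(rows, "patient", n_segments=4)
print(count_by_key(segments, "patient"))
```

Random distribution balances rows evenly but gives up this co-location, so a GROUP BY on that field would then need to redistribute rows between segments.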
92. Identifying duplicates: counting with grouping
Opportunities for performance improvements:
• Sorting and re-sorting is required in many pipelines
• Single-threaded processes create speed bottlenecks and require moving data
Reference: https://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-1-Map_and_Dedup.pdf
Solution: Leverage Pivotal's distributed MPP environment by using common database functions.
93. Identifying duplicates: counting with grouping [Figure: mapped reads aligned to a reference genome]
94. Identifying duplicates: counting with grouping [Figure: mapped reads on a reference genome, annotated with per-locus counts 1, 3, 1, 1, 1, 1, 5, 1, 3] SQL: select locus, count(*) from reads group by locus
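To make the slide's query concrete, the sketch below runs it against an in-memory SQLite database standing in for GPDB/HAWQ: the SQL is the same, but here it runs on one machine instead of across segments. As on the slide, a read's position is reduced to a single hypothetical `locus` integer; real duplicate-marking tools also consider details such as strand and mate position.

```python
import sqlite3
from typing import List, Tuple


def locus_counts(reads: List[Tuple[str, int]], only_duplicates: bool = False) -> List[Tuple[int, int]]:
    """Count reads per locus; optionally keep only loci hit by more than one read."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE reads (read_id TEXT, locus INTEGER)")
        conn.executemany("INSERT INTO reads VALUES (?, ?)", reads)
        sql = "select locus, count(*) from reads group by locus"
        if only_duplicates:
            sql += " having count(*) > 1"  # candidate duplicates
        return conn.execute(sql + " order by locus").fetchall()
    finally:
        conn.close()


# Hypothetical mapped reads: (read id, mapped locus).
reads = [("r1", 100), ("r2", 100), ("r3", 100), ("r4", 250), ("r5", 400), ("r6", 400)]
print(locus_counts(reads))
print(locus_counts(reads, only_duplicates=True))
```

No global sort is needed: each group's count is independent, which is what lets an MPP database compute it in parallel when the table is distributed by locus.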
95. Counting numbers of reads mapped to features [Figure: mapped reads over a reference genome, with per-exon counts 5, 17, 12] SQL: select exon, count(*) from reads JOIN refseq ON (<reads overlap exon>) group by exon
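The slide leaves the join condition as `<reads overlap exon>`. One standard way to write it, assuming closed intervals on the same chromosome, is `read.start <= exon.end AND read.end >= exon.start`. The sketch below uses SQLite as a stand-in; the column names `chrom`, `start_pos` and `end_pos` and all coordinates are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Interval = Tuple[str, str, int, int]  # (name, chrom, start_pos, end_pos), closed interval


def exon_counts(reads: List[Interval], exons: List[Interval]) -> List[Tuple[str, int]]:
    """Count reads overlapping each exon with an interval-overlap join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE reads (read_id TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)")
        conn.execute("CREATE TABLE refseq (exon TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)")
        conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", reads)
        conn.executemany("INSERT INTO refseq VALUES (?, ?, ?, ?)", exons)
        return conn.execute(
            """
            SELECT e.exon, count(*)
            FROM reads r
            JOIN refseq e
              ON r.chrom = e.chrom
             AND r.start_pos <= e.end_pos   -- read starts before the exon ends
             AND r.end_pos >= e.start_pos   -- and ends after the exon starts
            GROUP BY e.exon
            ORDER BY e.exon
            """
        ).fetchall()
    finally:
        conn.close()


exons = [("ex1", "chr1", 100, 200), ("ex2", "chr1", 300, 400), ("ex3", "chr2", 100, 200)]
reads = [("r1", "chr1", 150, 180), ("r2", "chr1", 190, 250), ("r3", "chr1", 250, 310),
         ("r4", "chr1", 500, 550), ("r5", "chr2", 50, 120), ("r6", "chr1", 120, 130)]
print(exon_counts(reads, exons))
```

This inner join omits exons with no reads; a LEFT JOIN with `count(r.read_id)` would report them with a count of zero.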
96. Multiple tools with a single, simple goal: distributed storage with in-place computation. Pivotal Hadoop: think of it as a distributed file system with very large blocks of data • Schema on read allows flexibility for a variety of datasets • Compute using a variety of paradigms (e.g., MapReduce) [Diagram: a Name Node and Data Nodes 1-4, with blocks 1, 2 and 3 each replicated across several data nodes] (Also shown: Pivotal Greenplum Database, HAWQ)
97. Multiple tools with a single, simple goal: distributed storage with in-place computation. HAWQ: think of it as distributed PostgreSQL (GPDB) on Hadoop. Features: • SQL compliant • World-class query optimizer • Interactive query • Horizontal scalability • Robust data management • Common Hadoop formats • Deep analytics (Also shown: Pivotal Hadoop, Pivotal Greenplum Database)
98. Analytics with Pivotal: a single address for everything analytics. Time-to-Insights: FORECASTING • CLUSTERING • REGRESSION • CLASSIFICATION • OPTIMIZATION
99. Data Science Toolkit: PLATFORM • KEY TOOLS • KEY LANGUAGES (SQL)
100. Historically, data was studied in silos. BRCA dataset: Treatments • Protein Assays • Imaging • Variants • Patient History & Follow-Ups • Gene Expression • Copy Number Variation • miRNA
101. Need for a new environment: data movement. [Diagram: data moving between a Genomics Data Center, a Researcher and a Computing Cluster causes unnecessary data movement and network usage] Solution: computation and storage in a single location.
102-108. In-database genome-wide association study
[Diagram: Master Servers and Segment Servers connected by a Network Interconnect, running SQL & R]
COVARIATES (Indiv: covariate 1, covariate 2, …, covariate 10): 1: F, 23, …, 18; 2: M, 39, …, 41; 3: M, 50, …, 23; …; N: F, 19, …, 24
GENOTYPES (Indiv, SNP, Genotype): (1, 1, AA); (2, 1, AT); (3, 1, AA); (1, 2, CC); (2, 2, CG); (3, 2, GG); …; (N, M, TC)
The segment servers process SNP1, SNP2, …, SNPM in parallel, producing a p-value (Pval1 … PvalM) and a log odds ratio (LOR1 … LORM) per SNP.
RESULTS (SNP: P-value): 1: 2.34×10^-21; 2: 0.395; 3: 7.15×10^-17; …; M: 0.000142
• In-database computation of 1 million loci for thousands of individuals in seconds
• Results are easily manipulated and explored
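The slides run a per-SNP model in SQL & R, with covariates, and report a p-value and log odds ratio (LOR) for each SNP. As a deliberately simpler stand-in, the sketch below uses the basic allelic test: Pearson's chi-square on a 2x2 table of allele counts in cases vs. controls, with no covariate adjustment. The SNP ids and counts are hypothetical. Each SNP is tested independently, which is exactly what lets the work spread across segments.

```python
from math import erfc, log, sqrt
from typing import Dict, Tuple

Counts = Tuple[int, int, int, int]  # (case_alt, case_ref, control_alt, control_ref)


def allelic_test(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (log odds ratio, p-value) for one SNP's 2x2 allele table.

    a, b: alternate/reference allele counts in cases
    c, d: alternate/reference allele counts in controls
    The p-value comes from Pearson's chi-square with 1 degree of freedom
    (no continuity correction), using P(X > x) = erfc(sqrt(x / 2)).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive in this simple version")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))
    lor = log((a * d) / (b * c))
    return lor, p_value


def scan(snps: Dict[str, Counts]) -> Dict[str, Tuple[float, float]]:
    """Test every SNP independently: the data-parallel step."""
    return {snp: allelic_test(*counts) for snp, counts in snps.items()}


# Hypothetical allele counts for two SNPs.
results = scan({"snp_a": (50, 50, 50, 50), "snp_b": (60, 40, 40, 60)})
for snp, (lor, p) in results.items():
    print(snp, round(lor, 3), p)
```

Covariates such as sex and age require a per-SNP logistic regression instead, as on the slides; the parallel structure stays the same.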
109. Procedural Languages in Big Data Science
• HAWQ & PL/X can take advantage of “data parallel” tasks by performing analyses in parallel. These embarrassingly parallel tasks need little or no effort to break the problem into parallel tasks and have no dependency (or communication) between tasks.
• Examples of data-parallel problems: counting words in documents; genome-wide association studies; studying network anomalies
• Sample implementations by PDL: digital image processing; Bayesian inference with MCMC; parallel bagged decision trees
[Diagram: Master Servers and Segment Servers on a Network Interconnect running SQL & R; each segment takes documents (Doc1 … DocM), stems them (Stem1 … StemM) and produces word counts (Count1 … CountM)]
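Word counting is the slide's first example of a data-parallel task. In HAWQ/GPDB the per-document function would be a PL/X user-defined function evaluated on every segment. The sketch below only illustrates the shape of the computation: one independent task per document, with no communication between tasks. It uses a thread pool for simplicity, skips the stemming step shown in the diagram, and uses hypothetical documents.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def count_words(doc: str) -> Counter:
    """Per-document task: tokenize and count, using nothing but its own input."""
    return Counter(re.findall(r"[a-z']+", doc.lower()))


def parallel_word_counts(docs: Dict[str, str], workers: int = 4) -> Dict[str, Counter]:
    """Run count_words on every document independently and collect the results by document id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(count_words, docs.values())  # preserves input order
        return dict(zip(docs.keys(), results))


docs = {"doc1": "The cat saw the hat.", "doc2": "A dog"}
print(parallel_word_counts(docs))
```

Because the tasks share nothing, the same function could be handed to a process pool, or to database segments, without any change to its logic.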
110. Finding Causal Variants in Lupus
CUSTOMER: Biotech company
BUSINESS PROBLEM: The customer wants to establish internal data science capabilities: building a culture and acquiring the hardware and people to support it.
CHALLENGES:
• The customer needs to establish a culture around sharing and analyzing datasets for value
• Current in-house technology is unable to support large-scale analysis (e.g., unable to analyze genomics datasets)
• Need to learn new paradigms for analyzing data at scale
SOLUTION:
• Trained customer employees on our solution stack, provided one-on-one consulting and ran a hackathon
• Greatly reduced computation time, enabling implementation and results within a 30-hour period: a module that previously required 30 minutes took only 5 seconds using SQL in-database
• Novel scientific discovery on untouched data: a dataset left untouched for 2 years due to limited resources was mined for statistically significant associations between ~400,000 variants and 1,000 phenotypes
111. Processing images and building integrated models at scale
112-115. Image Computation Framework
• Thousands of images are packed into one Hadoop Sequence File on HDFS.
• MapReduce route: image pre-processing and feature generation produce one feature vector per image (img1 [x1-xM], img2 [x1-xM], …, imgN [x1-xM]).
• In-database route: MapReduce loads raw pixels per image (img1 [rgb1-rgbK], img2 [rgb1-rgbK], …, imgN [rgb1-rgbK]) into HAWQ/GPDB, where PL/X and SQL perform feature generation.
• In HAWQ/GPDB, the features are joined via SQL to additional datasets (Proteomics, Medical History, Variants) to build models at scale.
116. Representing an image in HAWQ: HAWQ enables rapid processing of multiple or extremely large images in parallel without memory limitations. [Figure: a 3×3 source image (Col 0-2, Row 0-2) converted to a structured table with one row per pixel, (col, row, intsy), for pixels (0,0), (0,1), (0,2), (1,0), …, (2,2)]
117. Translating image processing to simple SQL
Function: distribution of pixel intensities
SQL: SELECT intsy, count(*) FROM tbl GROUP BY intsy
Output: (150, 5), (215, 4)
• HAWQ enables rapid processing of multiple or extremely large images in parallel without memory limitations • No data movement required • Simple SQL queries for data exploration
[Figure: the 3×3 source image and its structured (col, row, intsy) table]
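The two slides above flatten an image into one (col, row, intsy) row per pixel and then answer questions with ordinary SQL. Here is a minimal sketch of both steps, with SQLite standing in for HAWQ. The 3×3 pixel layout is hypothetical, chosen only so the histogram reproduces the slide's output counts; `row` is quoted because it is also an SQL keyword.

```python
import sqlite3
from typing import List, Tuple


def image_to_rows(image: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Flatten a 2-D intensity grid into (col, row, intsy) tuples, one per pixel."""
    return [(c, r, v) for r, line in enumerate(image) for c, v in enumerate(line)]


def intensity_histogram(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Load the pixel table and run the slide's GROUP BY query on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE tbl (col INTEGER, "row" INTEGER, intsy INTEGER)')
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", image_to_rows(image))
        return conn.execute(
            "SELECT intsy, count(*) FROM tbl GROUP BY intsy ORDER BY intsy"
        ).fetchall()
    finally:
        conn.close()


# Hypothetical 3x3 image with two intensity levels.
image = [[150, 215, 150],
         [215, 150, 215],
         [150, 215, 150]]
print(intensity_histogram(image))
```

Once pixels are rows, a very large image is just a large table: the database spreads its rows across segments, so no single process has to hold the whole image in memory.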
118. Image Processing Pipeline for Object Counting
• Smoothing: average over a window of pixels
• Thresholding: select pixels under an intensity threshold
• Cleanup: min/max over a window of pixels
• Object Detection: connected components
• Object Counting: select components with a size filter
Results (original image name: # cells): Tma_001.jpg: 359; Tma_002.jpg: 1892; Tma_003.jpg: 871; …
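The pipeline above can be sketched on a small in-memory grid. This is an illustrative single-machine version, not the in-database implementation from the talk. It assumes a 3x3 mean filter for smoothing, dark objects on a light background, and 4-connectivity for components, and it omits the min/max cleanup step for brevity. The test image is hypothetical.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

Grid = List[List[float]]
Pixel = Tuple[int, int]


def smooth(img: Grid) -> Grid:
    """Smoothing: replace each pixel with the mean of its 3x3 neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


def threshold(img: Grid, cutoff: float) -> Set[Pixel]:
    """Thresholding: coordinates of pixels whose intensity is under the cutoff."""
    return {(r, c) for r, line in enumerate(img) for c, v in enumerate(line) if v < cutoff}


def components(pixels: Set[Pixel]) -> List[List[Pixel]]:
    """Object detection: group foreground pixels into 4-connected components (breadth-first search)."""
    seen: Set[Pixel] = set()
    found: List[List[Pixel]] = []
    for start in pixels:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp: List[Pixel] = []
        while queue:
            r, c = queue.popleft()
            comp.append((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in pixels and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        found.append(comp)
    return found


def count_objects(img: Grid, cutoff: float, min_size: int = 1,
                  max_size: Optional[int] = None, smooth_first: bool = True) -> int:
    """Object counting: number of components whose pixel count passes the size filter."""
    work = smooth(img) if smooth_first else img
    return sum(1 for comp in components(threshold(work, cutoff))
               if len(comp) >= min_size and (max_size is None or len(comp) <= max_size))


# Hypothetical 6x6 image: light background, two dark blobs and one stray dark pixel.
img = [[200.0] * 6 for _ in range(6)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (5, 0)]:
    img[r][c] = 10.0
print(count_objects(img, cutoff=100, min_size=2, smooth_first=False))
```

The size filter is what separates cells from specks: here the single stray pixel is dropped by `min_size=2`.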
120. A Drug-Centric Data Lake to Enable Drug Discovery
CUSTOMER: A major pharmaceutical company
BUSINESS PROBLEM: Identifying promising drug targets by leveraging and integrating the vast datasets available will reduce the time and cost of bringing a new product to market.
CHALLENGES:
• Data from drug screens across multiple modalities cannot be easily integrated
• Current environments cannot support the growing data from high-content screens
• Researchers are unable to leverage the entirety of datasets and instead work with aggregates or summaries
SOLUTION:
• Proved that current customer models can be ported, sped up and scaled in the Pivotal environment
• Created richer models integrating multiple types of data (genomics, images, etc.)
• Improved models using the raw, most granular data
• Demonstrated how the availability of tools and data enables scientists to interrogate models and derive deeper, actionable insight
  121. 121. 143© Copyright 2015 Pivotal. All rights reserved. http://blog.pivotal.io/data-science-pivotal Check out the Pivotal Data Science Blog!
  122. 122. 144© Copyright 2015 Pivotal. All rights reserved. FOR FURTHER INFO, CHECKOUT… • Pivotal Blog @ http://blog.pivotal.io • Pivotal Academy @ https://pivotal.biglms.com
  123. 123. 145© Copyright 2015 Pivotal. All rights reserved. 145© Copyright 2013 Pivotal. All rights reserved. Driving Insights from Data Lakes Manufacturing Demo Matthew Ross & Antonio Petrole Pivotal Data Engineering
  124. 124. 146© Copyright 2015 Pivotal. All rights reserved. Internet of What????
  125. 125. 147© Copyright 2015 Pivotal. All rights reserved. Industrial Internet of Things?
  126. 126. 148© Copyright 2015 Pivotal. All rights reserved. IoT Goes Mainstream According to Gartner, Inc. (a technology research and advisory corporation), there will be nearly 26 billion devices on the Internet of Things by 2020.
  127. 127. 149© Copyright 2015 Pivotal. All rights reserved. GE Doubles Down GE invests in IIoT cloud and creates Predix cloud built upon Pivotal Cloud Foundry. GE estimates that connecting these industrial machines to the IoT could boost global GDP by $10 trillion to $15 trillion in 20 years. McKinsey Global Institute research holds that IoT in general could add $6.2 billion to the global economy by 2025.
  128. Converging Trends: The Journey to the Data-Driven Enterprise
Innovation • New Data • New Processes • New Insights
Data Science and Machine Learning • Big Data • Internet of Things
  129. IoT Key Workflows
Data Flow Management
● Data Overload
● Normalization
● Multiple Sources and Destinations
Reliable Infrastructure
● High Availability
● Fault Tolerant
● Scalable
Enterprise Level Tooling
● Workflow Orchestration
● Admin Tooling
● Developer Enablement
  131. Data Flow Management: Data Overload
● The ability to stream and process massive amounts of data
● Must have a platform that can handle that much data without losing or corrupting any of it
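One common way to avoid losing data when producers outpace consumers is backpressure: a bounded buffer that makes the producer wait instead of silently dropping readings. The Python sketch below illustrates that idea with the standard-library `queue` module; it is an assumption-laden illustration, not how any Pivotal product implements ingestion, and the buffer size and sentinel are arbitrary choices.

```python
import queue
import threading

# Bounded buffer: when it is full, put() blocks the producer (backpressure)
# instead of discarding readings.
buffer = queue.Queue(maxsize=10)
SENTINEL = None  # marks the end of the stream

def produce(readings):
    for reading in readings:
        buffer.put(reading)  # waits while the buffer is full
    buffer.put(SENTINEL)

def consume(out):
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        out.append(item)

received = []
consumer = threading.Thread(target=consume, args=(received,))
consumer.start()
produce(range(1000))
consumer.join()
print(len(received))  # 1000: nothing lost despite the 10-slot buffer
```

The trade-off is latency: a slow consumer slows the producer down, which is usually preferable to corrupting or dropping sensor data.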
  132. Data Flow Management: Data Normalization and Cleansing
● Organizing fields to fit into a relational structure
● Adding extra fields or removing unneeded ones
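As a rough illustration of normalization and cleansing, the Python sketch below maps a raw sensor message onto a fixed relational row: it renames source-specific fields, converts a string reading to a number, adds a derived field and drops fields the table does not need. The schema, the alias table and the field names are all hypothetical.

```python
# Hypothetical target schema for a relational "readings" table.
SCHEMA = ("sensor_id", "timestamp", "temperature_c", "temperature_f")

# Hypothetical mapping from source-specific field names to schema columns.
ALIASES = {"id": "sensor_id", "device": "sensor_id", "ts": "timestamp", "temp": "temperature_c"}

def normalize(raw: dict) -> tuple:
    """Rename, cleanse and reorder one raw record into a row matching SCHEMA."""
    renamed = {ALIASES.get(key, key): value for key, value in raw.items()}
    celsius = renamed.get("temperature_c")
    if celsius is not None:
        celsius = float(celsius)                                   # cleanse: "71.5" -> 71.5
        renamed["temperature_c"] = celsius
        renamed["temperature_f"] = round(celsius * 9 / 5 + 32, 2)  # add a derived field
    # Fields outside SCHEMA (e.g. "firmware") are dropped; missing ones become None (NULL).
    return tuple(renamed.get(column) for column in SCHEMA)

print(normalize({"device": "robot-7", "ts": 1430000000, "temp": "71.5", "firmware": "2.1"}))
# ('robot-7', 1430000000, 71.5, 160.7)
```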
  133. Data Flow Management: Multiple Sources and Destinations
● Stream data from multiple sources
● Persist it to multiple destinations
● Process multiple data formats
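The three bullets above amount to fan-in, format conversion and fan-out. The Python sketch below shows one minimal way to combine them: messages from two hypothetical sources (one JSON, one CSV) are merged, parsed into a common record shape, and delivered to every destination. The source names, the CSV column order and the record fields are assumptions for illustration only.

```python
import csv
import io
import json
from itertools import chain

def parse(payload: str, fmt: str) -> dict:
    """Convert one message from a supported format into a common record."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        # Assumed column order for CSV sources: sensor_id,value
        sensor_id, value = next(csv.reader(io.StringIO(payload)))
        return {"sensor_id": sensor_id, "value": float(value)}
    raise ValueError(f"unsupported format: {fmt}")

def route(messages, sinks):
    """Send every parsed record to every destination."""
    for payload, fmt in messages:
        record = parse(payload, fmt)
        for sink in sinks:
            sink(dict(record))  # each destination gets its own copy

# Two hypothetical sources emitting different formats.
gateway_source = [('{"sensor_id": "a", "value": 1.5}', "json")]
legacy_plc_source = [("b,2.0", "csv")]

data_lake, dashboard = [], []
route(chain(gateway_source, legacy_plc_source), [data_lake.append, dashboard.append])
print(data_lake)  # [{'sensor_id': 'a', 'value': 1.5}, {'sensor_id': 'b', 'value': 2.0}]
```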
  135. Reliable Infrastructure: High Availability and Fault Tolerance
● Resources remain available under any conditions
● The system stays up even if some resources go down
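A simple client-side view of "the system stays up even if some resources go down" is failover across replicas. The Python sketch below tries each replica in order and returns the first successful response; the replica functions are hypothetical stand-ins for real service endpoints, and real platforms typically also add health checks, retries with backoff and automatic restarts.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:  # this replica is down; try the next one
            last_error = exc
    raise RuntimeError(f"all {len(replicas)} replicas failed") from last_error

# Hypothetical replicas of a storage service.
def primary(request):
    raise ConnectionError("primary unreachable")

def standby(request):
    return f"stored {request}"

print(call_with_failover([primary, standby], "reading-42"))  # stored reading-42
```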
  136. Reliable Infrastructure: Scalability
● Must be able to handle increased traffic at any time
● Easily process big data workloads
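Horizontal scaling usually means spreading the stream across more workers. One minimal approach, sketched below in Python under the assumption that work is keyed by sensor ID, is hash partitioning: each sensor is deterministically assigned to one worker, so adding workers spreads the load. The sensor names and worker counts are hypothetical.

```python
import zlib

def partition(sensor_id: str, num_workers: int) -> int:
    """Map a sensor to a worker; the same sensor always lands on the same worker."""
    # crc32 is stable across processes, unlike Python's randomized str hash().
    return zlib.crc32(sensor_id.encode("utf-8")) % num_workers

def assign(sensor_ids, num_workers):
    """Group sensors into per-worker buckets."""
    buckets = {worker: [] for worker in range(num_workers)}
    for sensor_id in sensor_ids:
        buckets[partition(sensor_id, num_workers)].append(sensor_id)
    return buckets

sensors = [f"robot-{i}" for i in range(12)]
print({worker: len(ids) for worker, ids in assign(sensors, 3).items()})
```

A design caveat: with plain modulo hashing, changing the number of workers remaps most sensors; consistent hashing is the usual remedy when workers are added or removed frequently.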
  138. Enterprise Level Tooling: Workflows and Tooling
● Manage complex flows of data
● Provide rich user interface applications
  139. Enterprise Level Tooling: Developer Enablement
● Full-featured APIs
● Extensive module customization when needed
  140. Are you making the most of your data?
  141. Bringing it all together
  142. Reporting is nice, but being able to take action is what drives the value of a platform
  143. Predictive Analytics • Proactive Monitoring • Reactive Maintenance
  144. ETL vs Streaming: Batch ETL
● Data is loaded in large batches
● Loading typically happens once a day
● Analysis can only be done once data is transformed and persisted to the data warehouse
  145. ETL vs Streaming: Streaming
● Continuous: data streams are “listening” for data emitted by sensors
● Data can be analyzed in-stream
● Streams can be integrated into data-driven applications
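The batch-versus-streaming contrast can be made concrete with a single statistic. In the Python sketch below, the batch function can only answer once the whole batch is loaded, while the streaming version updates an incremental mean after every reading using constant memory. The readings are made-up values for illustration.

```python
class RunningMean:
    """Streaming style: update the result as each reading arrives, O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        return self.mean

def batch_mean(readings):
    """Batch (ETL) style: the result exists only after the whole batch is loaded."""
    return sum(readings) / len(readings)

readings = [70.0, 72.0, 74.0, 90.0]
stream = RunningMean()
history = [stream.update(r) for r in readings]  # an up-to-date answer after every reading
print(history)               # [70.0, 71.0, 72.0, 76.5]
print(batch_mean(readings))  # 76.5
```

Both approaches reach the same final answer; the difference is that the streaming version had a usable answer at every step, which is what lets data-driven applications react in near real time.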
  146. Reactive Maintenance
● An alert is sent to someone on the factory floor
● The worker must receive the alert and be able to react
● Equipment is down for however long it takes the worker to receive the notification and complete repairs
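Reactive maintenance boils down to a threshold check: nothing happens until a reading has already crossed a limit, and only then is someone notified. The Python sketch below shows that pattern; the temperature limit, line and robot names are hypothetical, and `notify` stands in for whatever paging or messaging channel a plant actually uses.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    line: str
    robot: str
    reading: float

THRESHOLD_C = 85.0  # hypothetical maximum safe motor temperature

def check(line: str, robot: str, reading: float, notify) -> None:
    """Send an alert only after a reading exceeds the threshold."""
    if reading > THRESHOLD_C:
        notify(Alert(line, robot, reading))

sent = []
for reading in [78.0, 83.5, 91.2]:
    check("line-3", "robot-7", reading, sent.append)
print(sent)  # [Alert(line='line-3', robot='robot-7', reading=91.2)]
```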
  147. Proactive Maintenance Workflow
● The manager has a dashboard showing all of the system's gauges
● A historical log of recent alerts is visible on the dashboard
● The manager can dispatch a worker to investigate a specific line or robot
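A dashboard's "historical log of recent alerts" is often just a fixed-size buffer. The Python sketch below keeps the most recent alerts in a `collections.deque` with `maxlen`, so older entries fall off automatically, and includes a hypothetical `dispatch` helper standing in for the manager's action; neither is taken from the demo itself.

```python
from collections import deque

class AlertLog:
    """Keeps only the most recent alerts for a dashboard panel."""

    def __init__(self, capacity: int = 5):
        self._alerts = deque(maxlen=capacity)  # oldest entries are evicted automatically

    def record(self, alert: str) -> None:
        self._alerts.append(alert)

    def recent(self) -> list:
        return list(reversed(self._alerts))  # newest first

def dispatch(worker: str, line: str, robot: str) -> str:
    """Hypothetical dispatch action triggered from the dashboard."""
    return f"{worker} sent to inspect {robot} on {line}"

log = AlertLog(capacity=3)
for i in range(1, 6):
    log.record(f"alert-{i}")
print(log.recent())  # ['alert-5', 'alert-4', 'alert-3']
print(dispatch("Dana", "line-3", "robot-7"))  # Dana sent to inspect robot-7 on line-3
```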
  148. Predictive Analytics Workflow
● Run machine learning and data science models
● Incorporate business intelligence tools
● Persist data to the data lake and run advanced queries
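Predictive maintenance asks "when will this machine need attention?" rather than "is it broken now?". As a deliberately simple stand-in for the machine learning models mentioned above, the Python sketch below fits an ordinary least-squares trend line to recent readings and extrapolates when it will cross a limit. The vibration readings and the 5.0 mm/s limit are hypothetical; production models would be far richer.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hours_until(threshold, hours, readings):
    """Predict hours from the last reading until the trend crosses the threshold."""
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None  # not trending toward the limit
    return (threshold - intercept) / slope - hours[-1]

# Hypothetical hourly vibration readings (mm/s).
hours = [0, 1, 2, 3, 4]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
print(hours_until(5.0, hours, vibration))  # 2.0
```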
  149. Data Streaming Needs an Agile, Scalable and Fast Solution
Data Lake • Data Ingestion • Business Intelligence • Real-Time Analytics • Mobile App
  150. Spring XD Orchestrates and Automates All the Steps of Data Stream Pipelining
Ingest → Transform → Sink (Spring XD) → HDB / PHD Data Lake
Extensible • Open-Source • Fault-Tolerant • Horizontally Scalable
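Spring XD expresses a stream as a pipe-delimited chain of source, processor and sink modules. The plain-Python sketch below only mimics that ingest → transform → sink shape with chained generators; it is not Spring XD's API, and the message format, the Fahrenheit-to-Celsius conversion and the list standing in for HDB or the data lake are all assumptions for illustration.

```python
import json

def ingest(lines):
    """Source: parse raw JSON messages as they arrive."""
    for line in lines:
        yield json.loads(line)

def transform(records):
    """Processor: drop readings without a temperature and convert units."""
    for rec in records:
        if "temp_f" not in rec:
            continue
        yield {"sensor": rec["sensor"], "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 1)}

def sink(records, store):
    """Sink: persist each record (a list stands in for HDB or the data lake)."""
    for rec in records:
        store.append(rec)

raw = ['{"sensor": "r1", "temp_f": 212}', '{"sensor": "r2"}', '{"sensor": "r3", "temp_f": 32}']
lake = []
sink(transform(ingest(raw)), lake)
print(lake)  # [{'sensor': 'r1', 'temp_c': 100.0}, {'sensor': 'r3', 'temp_c': 0.0}]
```

Because each stage is a generator, records flow through one at a time rather than as a batch, which is the same property that makes a streaming pipeline suitable for continuous sensor data.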
  151. Spring XD: State-of-the-Art Data Pipeline Automation
INGEST / SINK
• No coding required
• Dozens of built-in connectors
• Seamless integration with Kafka and Sqoop
• Create new connectors easily using Spring
PROCESS
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-the-box configurable jobs for batch processing
ANALYZE
• Import and invoke PMML jobs easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges
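The "counters and gauges" listed under ANALYZE are lightweight metrics updated as data flows through. The Python sketch below shows the general idea with a named event counter and a "rich" gauge that tracks the latest value plus running min, max and mean. It is similar in spirit to such analytics but is not Spring XD's implementation, and the metric names are hypothetical.

```python
from collections import Counter

class RichGauge:
    """Latest value plus running min, max and mean for one metric."""

    def __init__(self, name: str):
        self.name = name
        self.value = None
        self.count = 0
        self.total = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def set(self, value: float) -> None:
        self.value = value
        self.count += 1
        self.total += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")

counters = Counter()  # named event counters
gauge = RichGauge("robot-7.temperature")
for reading in [70.0, 75.0, 80.0]:
    gauge.set(reading)
    counters["readings"] += 1
print(gauge.value, gauge.min, gauge.max, gauge.mean, counters["readings"])  # 80.0 70.0 80.0 75.0 3
```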
  152. Pivotal HDB: Hadoop Native SQL
• Exceptional Hadoop native SQL performance
• No compatibility risks to SQL developers or SQL BI tools and applications
• Supports query roll-ups, dynamic partitions and joins
• Massive MPP scalability to petabytes
• On premises or in the cloud
• Scale your cluster out, not up
• World-class parallel loading and unloading
• Fast performance for complex and advanced data analytics
• Integrated with MADlib for advanced machine learning
• Powerful cost-based query optimizer
  153. BUILT FOR THE SPEED OF BUSINESS