Before we dive into Hadoop and its role within the modern data architecture, let’s set the context for why Hadoop has become important.
Existing approaches for data management have become both technically and commercially impractical.
Technically – these systems were never designed to store or process vast quantities of data.
Commercially – the licensing structures of the traditional approach are no longer feasible.
These two challenges, combined with the rate at which data is being produced, precipitated the need for a new approach to data systems. If we fast-forward another 3 to 5 years, more than half of the data under management within the enterprise will come from these new data sources.
Enter Hadoop.
Faced with this challenge, the team at Yahoo! conceived and created Apache Hadoop. Convinced that contributing the platform to an open community would speed innovation, they open sourced the technology under the governance of the Apache Software Foundation (ASF). This introduced two distinct, significant advantages.
Not only could they manage new data types at scale, but they now had a commercially feasible approach.
However, there were still significant challenges. The first generation of Hadoop:
- was designed and optimized for batch-only workloads,
- required dedicated clusters for each application, and
- didn’t integrate easily with many of the existing technologies present in the data center.
Also, like any emerging technology, Hadoop still had to reach the level of readiness the enterprise requires.
After running Hadoop at scale at Yahoo!, the team spun out to form Hortonworks with the intent to address these challenges and make Hadoop enterprise-ready.
The modern data architecture simply does not work unless it integrates with the systems and tools you already deploy. HDP enables your existing data platforms to expand the data you have under management through integration. The goal of HDP is to augment, not replace, these existing systems, as we very clearly understand that you need to reuse existing skills.
Further, through our work within the Hadoop community to deliver YARN, we have opened up Hadoop and unlocked innovation for the community of data center ISVs, who can now extend their applications to run natively IN Hadoop as just another workload operating on a single data lake. They can function as first-class citizens alongside any other workload in Hadoop.
In 2011, Hortonworks was founded by the 24 original Hadoop architects and engineers from Yahoo!
This original team had been working on a technology called YARN (Yet Another Resource Negotiator) that enables multiple applications to access all your enterprise data through an efficient, centralized platform. It is the data operating system for Hadoop, providing the versatility to handle any application and dataset, no matter the size or type.
Moreover, YARN provided the centralized architecture around which the critical enterprise services of Security, Operations, and Governance could be centrally addressed and integrated with existing enterprise policies.
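To make YARN’s role as a centralized platform concrete, here is a minimal sketch (not from the original deck) of listing every application sharing one cluster through the standard YarnClient API. The ResourceManager address is a hypothetical placeholder; in practice it would come from yarn-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

import java.util.List;

// Minimal sketch: list every application sharing a single YARN cluster.
// Assumes the Hadoop client jars on the classpath and a reachable ResourceManager.
public class ListYarnApps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        // Hypothetical ResourceManager address; normally read from yarn-site.xml.
        conf.set("yarn.resourcemanager.address", "rm.example.com:8032");

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        try {
            // Every workload registers with the same ResourceManager,
            // because YARN is the single resource negotiator for the cluster.
            List<ApplicationReport> apps = yarnClient.getApplications();
            for (ApplicationReport app : apps) {
                System.out.printf("%s | %s | %s%n",
                        app.getApplicationId(),
                        app.getApplicationType(),
                        app.getYarnApplicationState());
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```

Because every workload, whether batch, interactive, or streaming, registers with the same ResourceManager, operations teams get one place to observe and govern the shared cluster.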
This work allowed a new approach to data to emerge: the modern data architecture. At the heart of this approach is Hadoop’s capability to unify data and processing in a single, efficient data platform.
Let’s start with cost optimization… there are three primary drivers.
First, it’s about storage optimization: archive your data off the EDW into Hadoop to drive down costs (a minimal sketch of this follows these three drivers).
Second, optimize data processing: typically, a large portion of EDW usage goes to low-value transformation workloads. Many of these can be transitioned from the EDW into Hadoop, freeing up significant EDW resources.
And finally, Hadoop can be used to capture new types of data that can then be refined and used within the context of your EDW analysis, introducing wholly new analysis and insight.
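As a minimal illustration of the first driver, landing an exported EDW extract in HDFS can be as simple as a copy through Hadoop’s standard FileSystem API. This is a sketch under assumptions, not any customer’s actual pipeline; the namenode URI and paths below are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

// Minimal sketch: archive an exported EDW extract onto cheap HDFS storage.
// The namenode URI and both paths are hypothetical placeholders.
public class ArchiveToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode.example.com:8020"), conf);

        Path localExtract = new Path("file:///staging/edw/orders_2013.csv");
        Path archiveDir = new Path("/archive/edw/orders/");

        fs.mkdirs(archiveDir);                          // idempotent; creates parent dirs
        fs.copyFromLocalFile(localExtract, archiveDir); // extract now lives on commodity storage
        fs.close();
    }
}
```

Once archived this way, the data remains queryable from Hadoop-side tools while the EDW keeps only the hot, high-value working set.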
On the other hand, many start with a new analytic app based on data not previously captured.
These new types of data include clickstream, sentiment, machine and sensor data, geolocation data, server logs, and the tomes of unstructured data often found within the enterprise.
While your applications will vary tremendously based on your vertical, we see commonality across the application patterns intended to find value in these rich data sources.
We generally find there are three patterns of application:
SINGLE VIEW OF ENTITY
The first of three common patterns in analytics applications, a single view of an entity (like a customer, product, or machine) is now possible because platforms like Hadoop can store and organize previously unmanageable volumes and varieties of data.
PREDICTIVE ANALYTICS
As data scientists and analysts reveal patterns and correlations inside massive data sets, new models emerge to explain business performance. Most importantly, these models can reliably predict future events based on previously dissociated data.
DATA DISCOVERY
New, voluminous data types such as machine and sensor data, geolocation data, clickstream data and sentiment data are valuable when correlated with other data sets in a shared enterprise “data lake.” The patterns within the data lake can then fuel machine learning applications.
It is in these patterns that we see organizations unlock value from types of data that were previously out of reach.
Ultimately, most organizations that adopt Hadoop aspire to create a data lake where multiple applications use a shared set of resources for both storage and processing, all with a consistent level of service.
The value in the data lake ultimately results in the delivery of “systems of insight,” where advanced algorithms and applications that access multiple data sets allow organizations to derive brand-new value from data that was once unable to be investigated or simply too complex to combine and analyze. Hadoop doesn’t just create a data lake; it opens the platform for analysts to view multiple data sources in multiple dimensions and reduce time to insight.
This journey from apps to lake is only possible with HDP and its YARN-based architecture.
Let’s talk about TrueCar, as they are a great example of an organization that started small but grew big… and they did this very quickly.
TrueCar focuses on making car buying fair and fun for everyone. They bring together a lot of messy automotive industry data from a wide range of sources in a wide range of formats.
Since they make money when cars get bought, their value is in how well they’re able to drive advanced correlations across the data to deliver an interactive customer experience that accelerates an informed buying decision.
At TrueCar, data is the product they sell, and they made rolling out a Hadoop-based data architecture a precursor to their IPO earlier this year.
With the help of Hortonworks, and in just over a year, they realized their vision of a data lake and transformed their business. At the beginning of their journey, they had very limited knowledge of Hadoop.
We partnered with them to train their team, then worked with them to develop an architecture and helped them through development and implementation; today, we provide mission-critical support for their production environment.
What started as a single app on HDP grew to three in a few months, and within a year they had six business apps running on a single 60-node cluster that holds over 2PB of data.
Their 60-node production cluster was rolled out in Nov 2013 and grew 5X in the year after that.