1. Pivotal Data Scientists on the Front Line: Examples of Data Science in Action
Getting to Know Your Customer with
Big & Fast Data
Pivotal Data Science Team
© Copyright 2013 EMC Corporation. All rights reserved.
2. Welcome – It’s a Pleasure to Meet You
• The Launch of Pivotal
• Pivotal Data Science Team
• Getting to Know Your Customer
- Meet your customer: Build Models
- Learn more about your customer: More Data
- Adapt to your customer: Dynamic Models
• Let’s Get Started: Pivotal Data Science Labs
• Q&A
4. Pivotal, The New EMC Spin-out
Pivotal is building a new platform for a new era
This platform enables customers to build a new class of applications
That leverage Big and Fast Data
All this with the power of cloud independence
Private Cloud, Public Cloud
5. Introducing the Pivotal Stack
• Data-Driven Application Development
• Pivotal Data Science Labs
• Cloud Application Platform
• Data & Analytics Platform
• Virtualization
• Cloud Storage
6. Pivotal Services: Rapid Time to Value
Pivotal Labs: Quickly create and deploy new applications
• Proven methodology to remove risk and accelerate results
Pivotal Data Science Labs: A proven data science practice to accelerate analytics projects
• Drive business value through data analytics
Open Source Support: Collaborative and customer-driven open source support, services and co-development
8. Tell Me About Data Science
What it is:
– Data preparation
– Data exploration and visualization
– Feature creation based on data and domain knowledge
– Quantitative modeling & model validation
– Scoring data
What it is not:
– A set of tools
– Application development
9. Platform-Driven Data Science Paradigm Shift
1. Modeling on more data
2. Rapid ingestion of new data
3. Re-use of valuable data
4. Faster model building
5. Scalable advanced modeling
6. Faster model refreshing
7. Faster data scoring
10. Pivotal Data Science Knowledge Development
11. Data Science Strategy
• Pivotal Data Science Labs
• Point Model Development
• Multiple Model Development
• Transformation to “Predictive Enterprise”
12. Getting to Know Your Customer
Deeper Insights With Data Science
13. More Data Science Deeper Insights
• Meet Your Customer: Build Models
• Learn More About Your Customer: More Data
• Adapt to Your Customer: Dynamic Models
14. The New Normal: “An Audience of One”
[Diagram. Recoverable labels: Individuals; Data Devices; Internet; Websites; Retail; Government; Phone/TV; Content; Media; Media Archives; Data Aggregators; Information Brokers; List Brokers; Catalog Co-ops; Credit Bureaus; Analytic Services; Delivery Services; Data Users/Buyers; Employers; Advertising; Ad Agency; Marketers; Banks]
16. More Data Science Deeper Insights
• Meet Your Customer: Build Models
• Learn More About Your Customer: More Data
• Adapt to Your Customer: Dynamic Models
17. Who Are Our Customers?
• One way of learning about customers is to divide them into characteristic groups
• This is called segmentation
• Let’s take a look at a segmentation exercise Pivotal did with a large medical insurance company…
18. What Did We Have to Work With?
• Product Sales
• Population Served
• Claims Data
• Consumer Data
• Provider Information
19. So What Did We Do With this Data?
• Before – Random Clusters
• After – Cohesive Clusters
20. What Was the Outcome?
• New Clinics
• Neighborhood Clinics
• Pirate Clinics
• Established Clinics
22. Summary: Get to Know Your Customer by Building Data-Driven Models
Objective:
• Improve understanding of customer
Data:
• Existing EDW sources
• New big data sources that capture customer demographics, such as the publicly available US Census
Data Science Methodology:
• Segmentation via k-means clustering
Business Impact & Improvement:
• Dramatically increase familiarity with makeup and behavior of customer base
• Drive targeted marketing efforts
• Lay foundation for higher-quality future models
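The segmentation step named in this summary can be sketched in a few lines. This is a minimal illustration of k-means (Lloyd's algorithm), not the code used in the engagement: the `kmeans` helper, the two features (age, yearly claim count), and the six sample members are all assumptions made up for the example. Real features would normally be standardized first so that no single column dominates the distance.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k rows of the data
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Hypothetical members: (age, claims per year). Illustrative values only.
members = [(25, 1), (27, 2), (24, 1), (61, 9), (65, 11), (63, 10)]
labels, centers = kmeans(members, k=2)
print(labels)
print(centers)
```

Each label is a segment id. Profiling the centroids (here, a younger low-claim group and an older high-claim group) is what turns clusters into named segments.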
23. More Data Science Deeper Insights
• Meet Your Customer: Build Models
• Learn More About Your Customer: More Data
• Adapt to Your Customer: Dynamic Models
24. Churn Models for Telecom Industry
Goal
– Identify customers who are likely to churn and prevent them from leaving
Challenges
– Cost of acquiring new customers is high
– Acquisition cost is hard to recoup if the customer is not retained long enough
– Barrier to switching is low for subscribers
– With mobile number portability, the barrier to switching is even lower
Good News
– Cost of retaining existing customers is lower!
25. Structured Features for Churn Models
The problem is extensively studied with a rich set of approaches in the literature
• Device
• Texting Stats
• Call Stats
• Rate Plans
• Customer Demographics
These features are great, but the models soon hit a plateau with structured features!
26. Blending the Unstructured with the Structured
• What other sources of previously untapped data could we use?
• Are our customers happy? Where? What segments?
• What are the common topics in their conversations online?
27. Sentiment Analysis and Topic Models
Inputs:
• Unstructured Data (External, Internal)
- Sentiment Analysis Engine (Classifier)
- Topic Engine (LDA)
• Structured Data: EDW
Outputs:
• Better predict likelihood to churn
• Topic Dashboard
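The slide names the sentiment engine only as a "Classifier", so the sketch below swaps in one common choice, a multinomial Naive Bayes model with add-one smoothing, purely as an assumption. The helper names (`train_naive_bayes`, `predict`) and the six labeled memos are hypothetical and written for this example; the topic engine (LDA) is not shown.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns per-label word counts and doc counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest log-posterior for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled memos, for illustration only.
memos = [
    ("dropped calls again very frustrated", "negative"),
    ("billing error overcharged and angry", "negative"),
    ("wants to cancel service poor coverage", "negative"),
    ("happy with new plan great coverage", "positive"),
    ("thanked agent for quick helpful support", "positive"),
    ("loves the new phone great service", "positive"),
]
word_counts, label_counts = train_naive_bayes(memos)
print(predict("customer frustrated about dropped calls", word_counts, label_counts))
```

The predicted label, or the underlying score, can then be joined to the structured EDW features as one more input to the churn model.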
28. Topic Clouds from Twitter - An Example
• Misc: 32%
• Convenience: 26%
• Promotions, deals: 17%
• Baby shower & Coupons: 13%
• Store experience: 13%
29. Summary: More Data to Drive Additional Customer Insights
Objective:
• Improve accuracy of churn models by blending structured features with unstructured text
Data:
• Existing structured features (call data records, device type, rate plans, etc.)
• Call center memos
Data Science Methodology:
• Sentiment Analysis and Topic Modeling
Business Impact & Improvement:
• Achieved 16% improvement in ROC curve for Churn prediction
• Topic Models automatically identified common themes in call center memos
• Laid foundation for Text Analytics
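One way to read an "improvement in ROC curve" is as a gain in the area under that curve (AUC); that reading is an assumption, since the slide does not name the metric. The sketch below computes AUC directly from its rank definition and compares a structured-only score with a blended score. The `roc_auc` helper, the equal blend weights, and all six data points are invented for illustration and are not results from the project.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random churner is scored above a random non-churner."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy data: 1 = churned, 0 = stayed. Values are illustrative only.
churned = [1, 1, 1, 0, 0, 0]
structured_score = [0.8, 0.4, 0.6, 0.5, 0.3, 0.2]    # e.g. from call stats, rate plan
negative_sentiment = [0.2, 0.9, 0.5, 0.1, 0.3, 0.2]  # e.g. from call center memos

# Hypothetical blend: a simple average of the two signals.
blended_score = [0.5 * s + 0.5 * t
                 for s, t in zip(structured_score, negative_sentiment)]

auc_structured = roc_auc(churned, structured_score)
auc_blended = roc_auc(churned, blended_score)
print(auc_structured, auc_blended)
```

In practice the text-derived signals would enter the model as features rather than through a fixed average, and the comparison would be made on held-out data.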
30. More Data Science Deeper Insights
• Meet Your Customer: Build Models
• Learn More About Your Customer: More Data
• Adapt to Your Customer: Dynamic Models
31. State of Data at Telco Company
Customer Segments:
• Multi-Gadget Families
• Affluent Matures
• Thrifty Families
• High Tech Singles
• Budget Singles
• Seniors
New Data Sources:
• Internet Deep Packet Inspection
• TV Consumption (Linear)
• Video On Demand Consumption
32. Understanding Subscriber Behavior
• What is the level of engagement with Client’s products (TV, VOD, Internet)?
• What are the patterns of device usage behavior?
• What is the level of OTT engagement, by segment, and by bandwidth?
Diagram labels: Native Services (Internet, Video On Demand, TV); Internet Devices; OTT Services
33. Newly Identified Behavior-Based Segments
Subscribers:
• Moderates
• OTT & Data Heavyweights
• Portable OTT Entertainment Seekers
- iPhone Heavy
- Android Heavy
- iPad Heavy
• In-Home OTT Entertainment Seekers
• In-Home Native Content Seekers
- VOD Heavy
- TV Heavy
34. Going Further: Crossing Behavior-Based Segments on Existing Customer Segments
Existing Segments:
• Multi-Gadget Families
• Affluent Matures
• Thrifty Families
• High Tech Singles
• Budget Singles
• Seniors
Newly Discovered Usage-Based Segments:
• Moderates
• OTT & Data Heavyweights
• In-Home OTT Entertainment Seekers
• Portable OTT Entertainment Seekers - iPhone Heavy
• Portable OTT Entertainment Seekers - Android Heavy
• Portable OTT Entertainment Seekers - iPad Heavy
• In-Home Native Content Seekers - VOD Heavy
• In-Home Native Content Seekers - TV Heavy
Result: Customized Micro-Segments!
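Crossing the two segmentations amounts to a cross-tabulation: every subscriber carries one existing segment and one usage-based segment, and each distinct pair is a candidate micro-segment. A minimal sketch, assuming each subscriber has already been assigned both labels; the six records below are hypothetical.

```python
from collections import Counter

# Hypothetical subscriber records: (existing segment, usage-based segment).
subscribers = [
    ("Multi-Gadget Families", "OTT & Data Heavyweights"),
    ("Multi-Gadget Families", "OTT & Data Heavyweights"),
    ("Multi-Gadget Families", "Moderates"),
    ("Seniors", "In-Home Native Content Seekers - TV Heavy"),
    ("Seniors", "In-Home Native Content Seekers - TV Heavy"),
    ("High Tech Singles", "Portable OTT Entertainment Seekers - iPhone Heavy"),
]

# Each distinct pair is one candidate micro-segment; the count is its size.
micro_segments = Counter(subscribers)

for (existing, usage), size in micro_segments.most_common():
    print(f"{existing} x {usage}: {size}")
```

With 6 existing and 8 usage-based segments there are up to 48 cells, so a size threshold is a natural way to decide which micro-segments are worth targeting.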
35. Driving New Business Value by Leveraging Data Science
• Upsell and Cross-Sell
• New Product Offerings
• Data Monetization
36. Summary: Adapt to Your Customer with More Data Science
Objective:
• Combine existing models with new models derived from big data sources
Data:
• Existing EDW sources
• New big data sources that capture subscriber behavior, including machine-generated sources such as DPI & VOD set-top box data
Data Science Methodology:
• Micro-segmentation via clustering
Business Impact & Improvement:
• Reduce operational and financial dependence on survey data
• Lay foundation for data monetization
• Generate tailored upsell & cross-sell opportunities
• Real, customer-behavior-driven guidance for product & app development
39. Pivotal Data Science Labs: Packaged Services
LAB PRIMER (2-Week Roadmapping)
• Analytics Roadmap
• Prioritized Opportunities
• Architectural Recommendations
LAB 100 (Analytics Bundle)
• On-site MPP Analytics Training
• Analytics tool-kit
• Quick insight (2 weeks)
LAB 600 (6-Week Lab)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
LAB 1200 (12-Week Lab)
• Prof. services
• Data science model building
• Ready-to-deploy model(s)
*Pivotal platform priced separately
40. Thank You
Do you have any questions?
41. Pivotal Sessions at EMC World
• The Pivotal Platform: A Purpose-Built Platform for Big-Data-Driven Applications
- Presenter: Josh Klahr
- Dates/Times: Tue 5:30 - 6:30, Palazzo E; Wed 11:30 - 12:30, Delfino 4005
• Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
- Presenter: Noelle Sio
- Dates/Times: Tue 10:00 - 11:00, Lando 4205; Thu 8:30 - 9:30, Palazzo F
• Pivotal: Operationalizing 1000-node Hadoop Cluster – Analytics Workbench
- Presenters: Clinton Ooi, Bhavin Modi
- Dates/Times: Tue 11:30 - 12:30, Palazzo L; Thu 10:00 - 11:00 am, Delfino 4001A
• Pivotal: for Powerful Processing of Unstructured Data For Valuable Insights
- Presenter: SK Krishnamurthy
- Dates/Times: Mon 4:00 - 5:00, Lando 4201 A; Tue 4:00 - 5:00, Palazzo M
• Pivotal: Big & Fast data – merging real-time data and deep analytics
- Presenter: Michael Crutcher
- Dates/Times: Mon 1:00 - 2:00, Lando 4201 A; Wed 10:00 - 11:00, Palazzo M
• Pivotal: Virtualize Big Data to Make The Elephant Dance
- Presenters: June Yang, Dan Baskette
- Dates/Times: Mon 11:30 - 12:30, Marcello 4401A; Wed 4:00 - 5:00, Palazzo E
• Hadoop Design Patterns
- Presenter: Don Miner
- Dates/Times: Mon 2:30 - 3:30, Palazzo F; Wed 8:30 - 9:30, Delfino 4005