Presentation to Analytics Network of the OR Society Nov 2020
20151008 REx Predictive presentation v 1 0 - distributed
1. Information and
Data Management
What can Predictive Modelling do
for your business?
Jefferson Lynch, John McConnell
8th October 2015, Royal Exchange
Analytics and
Data Management
2. Intended audience and aims
Who is this intended for?
Business people who want to understand what
practical difference predictive modelling can offer to
their organisation, and what’s involved.
By the time you leave you should…
Understand the benefits and the overall process
involved.
Be familiar with some common problems where
modelling is used, and some modelling approaches.
Be able to assess your organisational gaps and so
what help you may need.
12/10/2015 Copyright Red Olive 2015
3. Why you should be interested:
business context
The business interest in using data to tackle
business problems has changed:
Not just structured data, reports and dashboards
to guide solutions to defined performance
problems…
… but also discovery of new patterns in diverse
data to address much bigger questions and
problems.
4. What is predictive analytics?
“Predictive analytics is an area of data mining that
deals with extracting information from data and
using it to predict trends and behaviour patterns.”
(Wikipedia.org)
It can be applied to any type of unknown, whether
past, present or future.
The core idea is to capture relationships between
predictor variables and known outcomes from past
occurrences in a “model”, and then use those
relationships to predict unknown outcomes.
The accuracy depends greatly on the quality of both
the assumptions made and the data available.
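As a minimal sketch of that core idea, the snippet below fits a straight line to past occurrences (one predictor, one known outcome) and then uses it to predict an unknown outcome. The figures and the names `fit_line` and `predict` are invented for illustration; real modelling would use many predictors and a dedicated tool.

```python
# Illustrative only: the data below is made up, not taken from the talk.

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the captured relationship to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# Past occurrences: a predictor variable and the known outcome for each case.
past_x = [1, 2, 3, 4, 5]
past_y = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(past_x, past_y)   # capture the relationship in a "model"
forecast = predict(model, 6)       # predict an unknown outcome
```

How far such a forecast can be trusted depends on the assumptions (here, that the relationship is linear) and on the quality of the past data.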
5. How is predictive modelling carried
out and where?
Predictive modelling environments:
Our tools of choice are SPSS Modeler and Statistics;
another common general platform is SAS, and there are
several others.
Open source modelling (e.g. R) is popular but needs more
expert knowledge; modelling software offers a
productivity gain.
Some areas of usage:
Customer intimacy
Optimise capital deployment
Detect and mitigate threats
Many others…
6. Red Olive’s framework for predictive
modelling
Illustration of modelling process
Input: business data for analytics
1 Clarify problem, create multiple solutions
2 Work out data needed to solve the problem
3 Source and capture rich data
4 Prepare data for solution modelling
5 Develop solution models
6 Evaluate results
7 Deploy live model
Feedback loops: (Refine), (Want to re-use?)
7. Understanding the problem and the
data
Clarify the business
problem
Does the data support
the solution?
8. Clarify the business problem
etc…
Loan
applications
Person
OMG Compare
Moneysupermarket
Websites
What’s the big
idea?
More into the
funnel?
Overall volume?
An optimised
mix?
Higher
conversion of
those who are
there already?
9. Does the data support the solution?
Loan
applications
Person
OMG Compare
Loan Application
A = Agreement
Go Compare
No behavioural
data from web
analytics was
available.
? In future may be
able to link with
other in-house data
to enable e.g. loan
consolidation?
Level 1
search
Level 2
search
Level 3
search
More data available to use for modelling…
More likely to apply (and be successful?)
10. Modelling business solutions
Where is predictive
modelling typically
applied and what are
the benefits?
What are some of the
main techniques used?
12. Can we profile applicants to find interesting segments (a
“segment” means a group of people with certain things in
common)?
Could we then target certain segments with specific offers for
them?
Approach: used cluster modelling to identify some potentially
interesting segments
[Pie chart: people who progress to apply. Apply 23%, Don't apply 77%.]
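The slide does not say which clustering algorithm was used, so the following is only a sketch of the general idea using plain k-means. The applicant data (age, months in current job) and the function name `kmeans` are invented for illustration, and seeding the centroids with the first k points is a simplification. In practice the variables would also be scaled first so that no single one dominates the distance.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Centroids are seeded with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical applicants: (age, months in current job).
applicants = [(25, 30), (62, 300), (28, 40), (58, 280), (31, 36), (65, 320)]
centroids, segments = kmeans(applicants, k=2)
```

Each resulting group is a candidate segment; whether it is "interesting" still has to be judged against the business question.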
13. Example segment 1
[Bar chart: proportion who apply. All visitors 23%, Profile 1 53%.]
Profile – Segment 1
• 39 or younger
• In a job for over 24 months and less than 10 years
• Looking for a loan term between 12 and 35 months
When a visitor fitting this profile comes to the site there is a 53% chance they will make an application.
Do the available lenders have products that match them?
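A rough sketch of how a segment profile like this can be checked against visitor data: filter visitors by the profile rules, then compare the segment's application rate with the overall rate. The visitor records, field names and helper functions are hypothetical; only the profile rules and the 23% and 53% figures come from the slides.

```python
# Hypothetical visitor records; field names are assumptions for illustration.
visitors = [
    {"age": 30, "months_in_job": 36, "loan_term_months": 24, "applied": True},
    {"age": 35, "months_in_job": 60, "loan_term_months": 12, "applied": True},
    {"age": 28, "months_in_job": 30, "loan_term_months": 35, "applied": False},
    {"age": 39, "months_in_job": 100, "loan_term_months": 18, "applied": False},
    {"age": 45, "months_in_job": 36, "loan_term_months": 24, "applied": False},
    {"age": 30, "months_in_job": 12, "loan_term_months": 24, "applied": False},
    {"age": 33, "months_in_job": 48, "loan_term_months": 60, "applied": False},
    {"age": 52, "months_in_job": 200, "loan_term_months": 48, "applied": False},
]


def in_segment_1(v):
    """Profile rules from the slide: 39 or younger, in a job for over 24 months
    and less than 10 years, loan term between 12 and 35 months."""
    return (
        v["age"] <= 39
        and 24 < v["months_in_job"] < 120
        and 12 <= v["loan_term_months"] <= 35
    )


def application_rate(rows):
    """Share of the given visitors who went on to apply."""
    return sum(r["applied"] for r in rows) / len(rows)


segment = [v for v in visitors if in_segment_1(v)]
overall_rate = application_rate(visitors)
segment_rate = application_rate(segment)
lift = segment_rate / overall_rate

# Using the figures quoted on the slides: 53% for the segment against 23% overall.
slide_lift = 0.53 / 0.23
```

On the slide figures, a visitor matching the profile is roughly 2.3 times as likely to apply as an average visitor.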
14. What skills do you need to do this?
Platform or coding?
As we explore we generate many models but keep only
a few: this is easier to manage on a platform.
A platform also makes it easier to keep track of models,
data sets, parameters…
It is also valuable when a team of people work together
and need co-ordination.
16. People/Customer data types
Behavioral data
- Orders
- Transactions
- Payment history
- Usage history
Descriptive data
- Attributes
- Characteristics
- Self-declared info
- (Geo)demographics
Attitudinal data
- Opinions
- Preferences
- Needs & Desires
Interaction data
- E-Mail / chat transcripts
- Call center notes
- Web Click-streams
- In person dialogues
“Traditional”
High-value, dynamic
- source of competitive differentiation
Who? What? Why? How?
(*Source: IBM)
17. Modelling business solutions
The client wants to
understand core visitor
segments:
Their customer journeys
Their value
So the web site (and other
channels) can be re-
architected to better service
those requirements
The framework allows us to
enrich the behavioural data
with descriptive/attitudinal
and other data
In this example e-commerce
data
18. The framework in action
Who visits the site?
What do they do on the site?
Why do they visit the site and what do they think of it?
19. Example segment: “Happy trackers”
• Happy Trackers mainly use the site
for Track and Trace and little else.
• They tend to have a stronger
business slant and be slightly older
than the average.
• They are not heavy users of the
site and individual visits are
relatively light and narrow.
• However they are happy with
what they do and they rate the site
functionality the best out of all the
segments.
21. Different segments have different styles of engagement
[Bubble chart: number of visits against average time on site per visit; size of bubble reflects size of segment. Segments shown: OHW, Domestic PAFfers, Regular posters, Anxious trackers, Hobbyists, Frequent finders, Cottage Industrialists, Virgin posters.]
23. Optimising capital: utility
company example
Aim:
Identify from the data those business processes that most
strongly influence customer satisfaction (CSAT, Net Promoter
Score…).
Use the results to influence decisions regarding capital
investment.
Approach:
1. Are the variations in CSAT over time significant?
2. Given limited resource for investigation, assess scale of
opportunity in a number of process areas and focus
investigation.
3. For the target processes, identify key driver variables and
attempt to calculate linkage with CSAT scores.
24. Measuring CSAT: last 12 weeks and
95% confidence
[Chart: weekly CSAT with 95% confidence intervals, from 12 weeks ago to now.]
Message: short-term weekly movement is inconclusive.
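To make the "inconclusive" message concrete, here is a small sketch that computes an approximate 95% confidence interval for the mean CSAT score in each of two weeks and checks whether the intervals overlap. The survey scores are invented, the normal approximation (1.96 standard errors) is an assumption, and overlapping intervals are only a rough visual check rather than a formal significance test.

```python
import math
import statistics


def mean_ci95(scores):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical survey scores (1 to 5) for two consecutive weeks.
week_1 = [4, 5, 3, 4, 5, 4, 3, 5]
week_2 = [4, 4, 3, 5, 5, 4, 4, 5]

ci_1 = mean_ci95(week_1)
ci_2 = mean_ci95(week_2)
inconclusive = intervals_overlap(ci_1, ci_2)
```

When the weekly intervals overlap like this, an apparent week-on-week rise or fall in the mean could easily be sampling noise.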
25. Measuring CSAT: What happened
between March and April 2014?
[Chart: CSAT from 40 weeks ago to now.]
Message: There has been a notable shift in overall CSAT since April 2014 – was there some significant event?
26. Proxy variable example: using SLA compliance when time unavailable
Correlations – last 40 weeks

              SLA current week   SLA previous week   SLA comp 2 weeks prior
Mean CSAT          0.217               0.398               -0.039
Median CSAT        0.2                 0.415               -0.031
1 Scores          -0.266              -0.395               -0.002
5 Scores          -0.013               0.161               -0.077

* In the chart, an asterisk indicates that the correlation is statistically significant at the 95% level.
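The lagged correlations on this slide can be sketched as follows: pair each week's CSAT figure with the SLA compliance figure from the same week, the previous week, or two weeks prior, and compute a Pearson correlation for each pairing. The weekly series and the function names are invented for illustration; the slide's own figures came from the client's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlation(driver, outcome, lag):
    """Correlate the outcome in week t with the driver in week t - lag."""
    if lag == 0:
        return pearson(driver, outcome)
    return pearson(driver[:-lag], outcome[lag:])


# Hypothetical weekly series: SLA compliance rate and mean CSAT score.
# Here CSAT is constructed to follow the previous week's SLA compliance.
sla = [0.90, 0.95, 0.85, 0.92, 0.88, 0.97, 0.91]
csat = [9.0, 9.0, 9.5, 8.5, 9.2, 8.8, 9.7]

same_week = lagged_correlation(sla, csat, lag=0)
previous_week = lagged_correlation(sla, csat, lag=1)
two_weeks_prior = lagged_correlation(sla, csat, lag=2)
```

A stronger correlation at lag 1 than at lag 0, as in the slide's table, would suggest that satisfaction responds to service levels with a delay of about a week.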
28. Sewer Flooding Risk Model
Aims:
Identify areas most at risk of sewer flooding, the underlying factors, and changing risk over time.
Better prioritise investigations, sewer cleansing and repairs.
Reduce the number of sewer flooding incidents in the most cost effective way.
Increase confidence in the level of capital maintenance expenditure required.
29. Tracking Sewer Flooding Risk
[Chart: risk from Low Risk to High Risk for three models: Risk Model based on 365 days history, Risk Model based on 90 days history, Risk Model based on 30 days history.]
Variable risk increasing over time, i.e. risk is greater as problems remain unattended over time.
Variable risk increasing over time = risk is becoming more recent.
Variable risk decreasing over time, i.e. risk reduces as problems are fixed by the maintenance teams.
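The slides do not describe how the three risk models are built, so the sketch below substitutes a much simpler stand-in: the incident rate per day over 365, 90 and 30 days of history. Comparing the three rates shows whether risk in an area is becoming more recent or fading. The dates, function names and the "increasing / decreasing / mixed" labels are all assumptions for illustration.

```python
from datetime import date, timedelta


def incidents_per_day(incident_dates, as_of, window_days):
    """Incident rate over the window_days leading up to as_of."""
    start = as_of - timedelta(days=window_days)
    count = sum(1 for d in incident_dates if start < d <= as_of)
    return count / window_days


def risk_trend(incident_dates, as_of, windows=(365, 90, 30)):
    """Compare rates from the longest to the shortest history window."""
    long_rate, mid_rate, short_rate = (
        incidents_per_day(incident_dates, as_of, w) for w in windows
    )
    if long_rate < mid_rate < short_rate:
        return "increasing"   # incidents are concentrated in recent weeks
    if long_rate > mid_rate > short_rate:
        return "decreasing"   # incidents are mostly older, e.g. problems fixed
    return "mixed"


as_of = date(2015, 10, 1)
# Hypothetical incident histories for two areas, given as days before as_of.
area_a = [as_of - timedelta(days=n) for n in (5, 12, 20, 60, 200)]
area_b = [as_of - timedelta(days=n) for n in (80, 100, 150, 200, 250, 300)]

trend_a = risk_trend(area_a, as_of)
trend_b = risk_trend(area_b, as_of)
```

An area like the first, where the short-window rate is highest, is the kind that would be prioritised for investigation.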
31. Fraud detection
Here we use the term “fraud” quite loosely, to include non-
compliance and payment errors as well as abuse.
Traditional detection techniques are based on a set of
business rules that fraudsters learn and adapt to; using
analytics is one way to combat that.
Detecting fraud in a high-volume transactional setting is
different from detecting fraud in a one-off, often very high
value setting (e.g. insider trading). We’ll look at the former.
33. Predict the expected value for a claim, compare that with the actual value.
Those cases that fall far outside the expected range should be evaluated more closely.
– Use decision trees:
  • income < $40K
    » job > 5 yrs then good risk
    » job < 5 yrs then bad risk
  • income > $40K
    » high debt then bad risk
    » low debt then good risk
– Or Rule Sets:
  • Rule #1 for good risk:
    » if income > $40K
    » if low debt
  • Rule #2 for good risk:
    » if income < $40K
    » if job > 5 years
Group behavior using a clustering algorithm.
Identify outliers and investigate.
Build a profile of the characteristics of fraudulent behavior.
Pull out the cases that meet the characteristics of fraud.
(*Source: IBM)
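Two of the ideas on this slide can be sketched in a few lines. The first function writes the slide's decision tree as nested rules; how to treat the exact boundaries (income of exactly $40K, exactly 5 years in a job) is not stated on the slide, so the choice below is arbitrary. The second flags claims whose actual value is far from the expected value, using a simple "more than two standard deviations from the typical gap" rule, which is an assumption rather than the method used in the talk. All claim data is invented.

```python
import statistics


def credit_risk(income, years_in_job, high_debt):
    """The slide's decision tree as nested rules.
    Boundary cases (exactly $40K, exactly 5 years) are assigned arbitrarily."""
    if income < 40_000:
        return "good" if years_in_job > 5 else "bad"
    return "bad" if high_debt else "good"


def flag_unusual_claims(claims, threshold=2.0):
    """claims: list of (claim_id, expected_value, actual_value).
    Flags claims whose actual-minus-expected gap is unusually far from the typical gap."""
    gaps = [actual - expected for _, expected, actual in claims]
    centre = statistics.mean(gaps)
    spread = statistics.stdev(gaps)
    return [
        claim_id
        for (claim_id, _, _), gap in zip(claims, gaps)
        if abs(gap - centre) > threshold * spread
    ]


# Hypothetical claims: (id, value predicted by a model, value actually claimed).
claims = [
    ("C1", 1000, 1050), ("C2", 800, 780), ("C3", 1200, 1190), ("C4", 900, 930),
    ("C5", 1000, 2500), ("C6", 700, 690), ("C7", 1100, 1120), ("C8", 950, 940),
]
to_review = flag_unusual_claims(claims)
```

Flagged cases are candidates for closer evaluation, not proof of fraud; with larger volumes a more robust measure of spread would usually be preferable.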
34. MORE ON CAPITAL DEPLOYMENT: TEXT
MINING (NATURAL LANGUAGE
PROCESSING) EXAMPLE
35. Overview of text mining
Why is text mining of interest?
Example: Imagine you are a large telecoms company with
hundreds of customer service agents and you want to classify
all inbound customer communication quickly and direct it to
the right people to deal with it best.
Text mining pipeline: Text data sources → Text enrichment → Subject matching → Sentiment classification → Information delivery
37. Text enrichment
Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Sentiment Amplifier:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
Penn Treebank P.O.S. Tagger (English Messages):
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues [VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS]!!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]
Removal of stop words and punctuation:
sort[VBG] signal[VBP] issues [VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
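The last enrichment step, removal of stop words and punctuation, can be sketched as below. The stop-word list is a small made-up one and the function name `enrich` is hypothetical, so the output is not meant to reproduce the slide exactly; part-of-speech tagging is left out because it needs a trained tagger.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"why", "not", "your", "out", "of", "but", "yet", "it"}


def enrich(message):
    """Split a message into word tokens, dropping punctuation and stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", message)   # punctuation is discarded here
    return [t for t in tokens if t.lower() not in STOP_WORDS]


message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
tokens = enrich(message)
```

What remains is the content-bearing vocabulary that the later subject-matching and sentiment steps work on.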
38. Subject matching
Original Facebook Message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Subject Matching (Fuzzy Matching):
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it. [COMPLAINT]
BUSINESS TRANSACTION: Complaint
NETWORK: No Signal
PRODUCT: Samsung Galaxy S4
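Fuzzy matching matters because customer messages are full of typos and abbreviations. As a minimal sketch, the snippet below uses Python's standard `difflib.get_close_matches` to map words to subject keywords even when they are misspelt. The keyword dictionary, the 0.8 similarity cutoff and the function name `match_subjects` are assumptions for illustration, not the matching rules used in the talk.

```python
import difflib
import re

# Hypothetical keyword-to-subject dictionary.
SUBJECT_KEYWORDS = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "samsung": "PRODUCT",
    "galaxy": "PRODUCT",
}


def match_subjects(message, cutoff=0.8):
    """Return the set of subjects whose keywords approximately appear in the message."""
    subjects = set()
    for word in re.findall(r"[a-z]+", message.lower()):
        close = difflib.get_close_matches(
            word, list(SUBJECT_KEYWORDS), n=1, cutoff=cutoff
        )
        if close:
            subjects.add(SUBJECT_KEYWORDS[close[0]])
    return subjects


exact = match_subjects("Wk 3 of crap signal but yet paying FULL monthly contract!")
misspelt = match_subjects("Wk 3 of crap signl on my Samsng phone")
```

A lower cutoff catches more misspellings but also more false matches, so the threshold is something to tune against real messages.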
39. Sentiment classification
Many further factors help determine sentiment: Emoticons,
“Likes” on social media channels, …
Further text classification using e.g. Decision Trees.
Result: a sentiment classification.
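The slide mentions decision trees for the final classification; the sketch below swaps in something much simpler, a word-list score with an amplifier for repeated exclamation marks, just to show how the pieces fit together. The word lists, the doubling rule and the function name `sentiment` are all assumptions.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
NEGATIVE = {"crap", "bad", "terrible"}
POSITIVE = {"great", "good", "thanks"}


def sentiment(message):
    """Classify a message as positive, negative or neutral from word counts."""
    words = re.findall(r"[a-z]+", message.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Amplifier: a run of two or more exclamation marks doubles the strength
    # of whatever polarity was found (it cannot change the sign).
    if re.search(r"!{2,}", message):
        score *= 2
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


complaint = sentiment(
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
```

In a fuller pipeline, features such as this score, emoticons and "Likes" would feed a trained classifier rather than fixed thresholds.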
40. TEXT MINING – POLITICS
41. Analysis undertaken so far
Two samples of data from Hansard (the transcriptions of
proceedings in the Houses of Parliament) have been
downloaded, relating to:
Nicholas Soames, Conservative MP and former Shadow Defence Secretary.
Dennis Skinner, longstanding Labour MP.
The various files were loaded into SPSS Modeler’s text mining
platform. The data was parsed using Natural Language
Processing (NLP) to identify prominent “concepts” and then
some basic analysis of these concepts was carried out.
42. Findings: Nicholas Soames’ concepts
The most commonly repeating concepts identified are listed
below with “country” the most frequent, occurring 72 times.
“Immigration” occurred 40 times and was expanded further.
43. Findings: Nicholas Soames,
immigration
A concept map was created centred on “immigration”. This
shows the strength of association between two concepts. In
the case of “immigration”, the strongest concept associations
are with “defence”, “society” and “social”.
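Concept frequencies and a concept map can be approximated with simple counting: how often each concept occurs, and how often two concepts occur in the same piece of text. The sketch below assumes the concepts have already been extracted (in the talk this was done by SPSS Modeler's text mining); the example data and function names are invented, and co-occurrence counts are only one possible measure of strength of association.

```python
from collections import Counter
from itertools import combinations


def concept_counts(documents):
    """How many times each concept occurs across all documents."""
    return Counter(concept for doc in documents for concept in doc)


def cooccurrence(documents):
    """How many documents mention each pair of concepts together."""
    pairs = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical concepts extracted from four speeches.
documents = [
    ["immigration", "defence", "country"],
    ["immigration", "society"],
    ["country", "defence"],
    ["immigration", "defence"],
]

counts = concept_counts(documents)
links = cooccurrence(documents)
# Concepts most strongly associated with "immigration", strongest first.
immigration_links = sorted(
    ((pair, n) for pair, n in links.items() if "immigration" in pair),
    key=lambda item: -item[1],
)
```

The sorted pairs are what a concept map draws as thicker or thinner lines radiating from the central concept.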
44. Findings: Dennis Skinner,
immigration
In stark contrast, Dennis Skinner says virtually nothing on the
issue of immigration.
45. Findings: Dennis Skinner’s concepts
One of the top concepts in Dennis Skinner’s comments is
“pits”, occurring 54 times.
46. Findings: Dennis Skinner, pits
Below is a concept map centred on “pits”. The strongest
associations are with “tories”, “help” and so on.
47. Findings: Nicholas Soames concept
categories
In the “military” context, there seem to be particularly strong
links between the categories “human resources”, “finance”
and “geographical location”…
48. Findings: Nicholas Soames concept
categories
… so if we go back to relevant original texts, linked below, we
may expect to find the cost of having people in certain
locations as a prominent theme.
49. Findings: Dennis Skinner concept
categories
A similar analysis of Dennis Skinner’s concept categories
based on “natural resources”.
50. Learn more…
Has this morning whetted your appetite? We’d
love to talk with you further about analytics
for your own organisation. To arrange to do
that please leave your contact details on one
of the sheets near the door or just have a
word with Jefferson, John or Mark.
51. Preparing to try it out for real?
Ready to try this out for real? We can help you
build your business case and prove the benefits
to your business on your data. Please have a chat
with us at the end.
If you’re already further along, we run more in-
depth training courses:
Solving business problems using data analytics.
Statistical thinking.
Data mining principles and techniques.
Hands-on skills in data mining and predictive
analytics.
52. Quick recap
What we’ve covered:
Business context, modelling process, addressing the
right problem(s).
Customer intimacy: new business offerings (internet
loans), skills you’ll need; customer development and
retention (Royal Mail).
Predictive asset management: customer satisfaction
and internal processes; flood prediction.
Fraud and anomalies: process for detection.
Text mining: telecoms complaints, political analysis.