Can 'BlackBox' responsible gambling algorithms be understood by users?
Christian Percy, BetBuddy
Presented at the New Horizons in Responsible Gambling Conference in Vancouver, February 1-3, 2016
3. Can ‘Black Box’ Responsible Gambling Algorithms be Understood by Users? A real-world example
Chris Percy, New Horizons in Responsible Gambling Conference, Vancouver, February 2016
4. Key Concepts
• Machine Learning: the study and construction of complex “black box” algorithms that are capable of learning from data and making predictions based on it
• Supervised Machine Learning: the user provides the algorithm with training data with labelled outcomes (e.g. Person A is high risk, Person B is low risk, …)
• Knowledge Extraction: despite being complex, machine learning algorithms can be understood (and simplified) – they are not magic
5. Question
Would you rather have:
1. An algorithm that assesses problematic play that is 90% accurate, which you cannot properly understand or explain, or
2. An algorithm that assesses problematic play that is 75% accurate, which you can fully understand and explain?
6. Research Collaboration
- Overview of problem gambling literature
- IT platform
- Statistical researchers
- Multi-site, multi-country gambling platform
- Anonymised data
- 240,000 accounts available (mainly EU)
• Study 1: Analyse internet casino players using k-means clustering
• Study 2: Describe internet casino/poker self-excluders
• Study 3: Predicting self-exclusion events
• Study 4: Understanding self-exclusion predictive methods
Study 1: S. Dragicevic, G. Tsogas, A. Kudic, International Gambling Studies, Nov 2011. Study 2: S. Dragicevic, C. Percy, A. Kudic & J. Parke, Journal of Gambling Studies, Nov 2013. Study 3: C. Percy, M. Franca, S. Dragicevic, A. d’Avila Garcez, International Gambling Studies [under review]. Study 4: In progress.
7. Focus of today: Studies 3 and 4 – predicting self-exclusion events and understanding self-exclusion predictive methods (research collaboration as above)
8. Study Team
Chris Percy – Lead Researcher
Simo Dragicevic – CEO
Manoel Franca – Research Assistant
Artur d’Avila Garcez – Reader
Tillman Weyde – Senior Lecturer
Greg Slabaugh – Senior Lecturer
Machine Learning Group, Department of Computing Science
11. Session Overview
“Predicting those who self-excluded from online gambling vs those that did not” (N=845)
Questions for today:
• Can different machine learning approaches improve on predictive accuracy? Which are best?
• Are such methods still interpretable and useable in practice? What can be done quickly and easily to help interpret the models?
Mid-point view…
• Accuracy can be improved by 10–20 percentage points (random forest), but models are very reliant on human input
• There is an accuracy/interpretation trade-off at first glance, but additional techniques can help understand what drives the model
• To explain an individual’s personal results, we may need further layers of interpretative software
13. Analysis & Results
Raw Data: 176 who self-excluded for 180+ days & 669 as a control group; only partial data coverage
Pre-Processing: 5 gambling behaviour risk factors (frequency, trajectory, intensity, variability, session time)
Machine Learning: 4 supervised learning techniques (logistic reg., Bayesian nets, random forest, neural nets)
Accuracy Results: 72%–87% vs baseline accuracy of 52%; best results from random forest
Interpretation: Large multi-variate models provide little actionable insight beyond raw prediction – simplification needed (e.g. TREPAN)
15. Our Data
Demographics
• Gender
• Age
• Country
Gambling behaviour
• Start/end of each session
• Amount bet per wager
• Amount won per wager (not used)
• Type of game (not used)
• Purchase/withdrawal transactions (not used)
Self-exclusion
• Start date of self-exclusion
• Length (varies from 1 day, to 1+ years, to unlimited)
16. Sample generation – De-identified data from IGT
Self-Excluder Cohort (SE)
• 604 who had self-excluded at least once from April 2009 to July 2011
• Data on sessions leading up to their first self-exclusion, from June 2008 onwards
• Exclusions: first SE period < 180 days; insufficient data to calculate risk factors (~50% only played one week)
• Final cohort size: 176
Control Group Cohort (CG)
• 871, representative of 11,667 who gambled 10+ sessions in Jan 2009 (as did ~95% of the SE cohort)
• Had not self-excluded as of July 2011
• Data available from Jan 2009 to Dec 2010
• Exclusions: festive period (Nov/Dec 2010); insufficient data to calculate risk factors
• Final cohort size: 669
17. Descriptive comparison
[Chart: average loss per month (EUR)¹ at the 5th, 25th, median, 75th and 95th percentiles, self-excluders vs control group]
SE mean: 897 EUR; CG mean: 646 EUR; p-value 0.00
Those who self-excluded lost 250 EUR extra per month on average. Lose more / win more – riskier, higher-wager bets?
Self-excluders were:
• Even more focused on casino game types (vs poker) than the control group
• Less loyal to their top games
• Tried fewer games (shorter tenure)
• Younger
Little difference seen in:
• Absolute time gambling per month
• Number of wager-sessions per active gambling day
• Gender
1) Excludes one lucky self-excluding player who won EUR 24 m in one bet, enough to skew overall averages
Note: Descriptive charts are taken from Study 2 and were designed to optimise the sample size available; hence different n in some cases vs today’s main results
18. Analysis & Results – Pre-Processing
19. Pre-processing – Including risk factors
• Direct input of de-contextualised raw data resulted in ineffective models
• ML methods (as used here) required human-defined pre-processing activities
• Alternative: construct ML with general principles for data linkage so it can do “its own” pre-processing (e.g. image-recognition ML)
Risk factors (a computational sketch follows below):
Risk Factor | Brief description
Frequency | What proportion of days an individual gambles (vs all days in the period)
Trajectory | How much an individual wagers in total per day (on days they gamble)
Intensity | How many times a gambler places wagers per day (on days they gamble)
Variability | How much the total bet amount varies from day to day (on days they gamble)
Session Time | How long a gambler’s sessions last all together (on days they gamble)
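To make the five risk factors concrete, here is a minimal sketch of how they might be computed from raw session data. This is illustrative only: the column names (player_id, date, total_bet, num_wagers, session_minutes) are hypothetical and the study's exact definitions and observation windows are not reproduced here.

```python
# Illustrative sketch: compute the five risk factors from per-session data.
# Assumes a hypothetical table with columns player_id, date, total_bet,
# num_wagers, session_minutes; period_days is the length of the window.
import pandas as pd

def risk_factors(sessions: pd.DataFrame, period_days: int) -> pd.DataFrame:
    # Aggregate sessions up to one row per player per active gambling day
    daily = (sessions
             .groupby(["player_id", "date"])
             .agg(total_bet=("total_bet", "sum"),
                  num_wagers=("num_wagers", "sum"),
                  session_minutes=("session_minutes", "sum"))
             .reset_index())
    per_player = daily.groupby("player_id").agg(
        gambling_days=("date", "nunique"),
        trajectory=("total_bet", "mean"),          # avg total wagered per active day
        intensity=("num_wagers", "mean"),          # avg wagers per active day
        variability=("total_bet", "std"),          # day-to-day spread in bet amount
        session_time=("session_minutes", "mean"))  # avg session time per active day
    # Frequency: share of all days in the period on which the player gambled
    per_player["frequency"] = per_player["gambling_days"] / period_days
    return per_player.drop(columns="gambling_days")
```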
20. Pre-processing – 33 variables used in analysis
Risk factor grid: each of the 5 risk factors (Frequency, Trajectory, Intensity, Variability, Session Time) contributes 6 variables:
• Past period (absolute value)
• Current period (absolute value)
• Delta (current vs past)
• Delta > 10%? (dummy, delta vs past)
• P-value, as 1 − PV (delta vs past)
• P-value category (H/M/L)
Plus 3 demographic variables:
• Gender (male dummy)
• Age in 2010
• Country (DE dummy)
This gives 30 variables to capture gambling behaviour in terms of the absolute level of activity (past and present), the delta in current activity vs the past, and the statistical significance of any changes in trend – plus demographics.
Design choices:
• Remove over-identifying data not available in a live operating scenario: total days gambling, total bet, calendar dates
• Focus on a large number of variables to optimise for accuracy (no certainty over which will work)
A sketch of building one risk factor’s grid follows below.
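A minimal sketch of the six-variable grid for a single risk factor, under stated assumptions: a two-sample t-test on daily values stands in for the study's (unspecified) significance test, and the H/M/L category thresholds below are invented for illustration.

```python
# Illustrative sketch of the 6-variable grid per risk factor: absolute past and
# current values, delta, a >10% increase dummy, trend significance as (1 - p),
# and an H/M/L significance category.
import numpy as np
import pandas as pd
from scipy import stats

def factor_grid(past_daily: np.ndarray, current_daily: np.ndarray) -> dict:
    past, current = past_daily.mean(), current_daily.mean()
    delta = current - past
    # Welch t-test as a stand-in significance test for the change in trend
    _, p_value = stats.ttest_ind(current_daily, past_daily, equal_var=False)
    return {
        "past_abs": past,
        "current_abs": current,
        "delta": delta,
        "delta_gt_10pct": int(past > 0 and delta / past > 0.10),
        "stat_sig": 1.0 - p_value,   # expressed as 1 - PV, as on the slide
        # H/M/L cut points (0.05, 0.15) are assumptions, not the study's
        "sig_category": pd.cut([p_value], bins=[0, 0.05, 0.15, 1.0],
                               labels=["H", "M", "L"], include_lowest=True)[0],
    }
```

Applying this to each of the five risk factors yields the 30 behaviour variables; the three demographic dummies complete the 33.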
21. Analysis & Results – Machine Learning
22. Supervised machine learning – basic principles
[Diagram: training dataset (“Outcome”, “Input data”) → general template ML process (“Algorithm”, “Define Good”, “Iterate”) → model output (“Model”); each stage is explained on the next slide]
23. Supervised machine learning – basic principles
“Outcome”: we know the correct classification for each person, e.g. “Yes, he self-excluded” or “No, he did not”
“Input data”: for each person, we also have various descriptive facts to use to estimate the outcome
“Algorithm”: each ML technique has its own general template for manipulating input data to create a score for each player – a cut-off value determines which scores are classified as SE. But there are many different ways to apply the template – some which the programmer chooses, others which are chosen by the computer
“Define Good”: tell the computer whether we want a harsh model, e.g. one that only identifies a self-excluder where very confident, or one that optimises accuracy
“Iterate”: the computer tries lots (and lots!) of ways to set up and optimise the template – not just “trial and error” but a directed approach
“Model”: a fully-determined, specific set of rules to turn input data into a single score, plus a rule to turn each score into an outcome classification. We know how accurate the model is on training/test datasets. Some “rules” end up long and convoluted – a “black box”
Ambition vs perfection:
• For social science models, we know reality is more complex than our template
• We also know we’re missing important data, e.g. have they just lost their job / are they rich?
• But a particular model may still be a good enough approximation to be useful
(A minimal sketch of this loop follows below.)
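To make the stages concrete, here is a minimal sketch of the loop in Python with scikit-learn. This is illustrative only, not the study's WEKA set-up; `load_features_and_labels` is a hypothetical loader returning the 33 inputs and the labelled outcome (1 = self-excluded).

```python
# Minimal sketch of the supervised-learning loop: labelled data in, a fitted
# model out, with a cut-off turning scores into SE / control-group classes.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_features_and_labels()   # hypothetical loader: inputs + labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # "Iterate"
scores = model.predict_proba(X_test)[:, 1]     # one score per player
predicted_se = scores >= 0.5                   # cut-off turns scores into classes
accuracy = (predicted_se == y_test).mean()     # accuracy on held-out data
```

Changing the 0.5 cut-off is one way to "define good": raising it gives the harsher model that only flags a self-excluder where very confident.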
24. Machine learning – Four techniques
(Ordered from typically smaller and easier-to-interpret models to typically larger¹ and harder-to-interpret ones)
Logistic regression [simple]
• Template: relates each input variable directly to the output variable via a logistic curve; no mapping of inter-input variable relationships or non-linear transformations (this is possible in advanced logistic reg.)
• WEKA parameters (e.g.): link function: binomial log.; classification cut-off: 0.5
Bayesian network
• Template: structured map of the main ways that all input variables and the output variable relate to each other, based on their conditional probabilities
• WEKA parameters (e.g.): parents: unlimited; score type: entropy; type: SimpleEstimator; algorithm: K2; no prior knowledge assumed
Neural network
• Template: layers of connected nodes with activation functions (typically non-linear, e.g. sigmoid); first layer: values of all the input variables; middle layer: non-linear transformations; output layer: a prediction score
• WEKA parameters (e.g.): momentum: 0.1; learning rate: 0.05; decay factor: 0.999; learning rule: backpropagation
Random forest
• Template: ensemble of many decision trees; each tree seeks to classify a gambler as self-excluder or not based on the values of a subset of input variables
• WEKA parameters (e.g.): max depth: unlimited; number of trees: 200; features used for random selection: 3
1) i.e. the model typically employs a greater number of links from inputs to outputs / more lines of code are required to present the model
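For readers who want to experiment, here is a hedged sketch of rough scikit-learn analogues of the four templates, echoing the slide's parameters where a close equivalent exists. These are not the WEKA implementations (e.g. scikit-learn has no K2 Bayesian-network learner, so naive Bayes stands in as a simplification).

```python
# Rough scikit-learn analogues of the four techniques (illustrative only).
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "bayes_stand_in": GaussianNB(),  # simplification of a Bayesian network
    "neural_network": MLPClassifier(
        hidden_layer_sizes=(17,),    # 17 hidden nodes, as in the study's model
        activation="logistic",       # sigmoid nodes
        solver="sgd",
        learning_rate_init=0.05,     # learning rate from the slide
        momentum=0.1),               # momentum from the slide
    "random_forest": RandomForestClassifier(
        n_estimators=200,            # 200 trees
        max_features=3,              # 3 features per random selection
        max_depth=None),             # unlimited depth
}
```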
25. Analysis & Results – Accuracy Results
26. Accuracy results
Method:
• SMOTE to balance the dataset to ~50:50 control group and self-excluders
• 10-fold cross-validation to reduce over-fitting
• Default WEKA parameters used where not specified
Technique | Sensitivity (correctly classified self-excluders) | Specificity (correctly classified control group) | Overall accuracy | St. dev.
Logistic regression | 0.70 | 0.74 | 72% | 2.7
Bayesian networks | 0.77 | 0.94 | 86% | 3.5
Neural networks | 0.73 | 0.80 | 77% | 3.4
Random forest | 0.87 | 0.87 | 87% | 2.3
Comment:
• Random forest and Bayesian networks perform best on overall accuracy, at ~87% and ~86% (baseline accuracy of “pick the most common outcome” would be 52%)
• The higher reliability of random forest (lower st. dev.) and its sensitivity/specificity balance favour random forest
• Simple RF/demographic judgement rules built by eye achieve similar accuracy to logistic regression
A sketch of the SMOTE + cross-validation set-up follows below.
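A sketch of the evaluation approach described on the slide, using the imbalanced-learn library; `X, y` are assumed from the earlier sketch. Placing SMOTE inside an imblearn Pipeline resamples only the training folds, which avoids leaking synthetic points into the test folds (the slide does not state how the original study ordered these steps, so this is one defensible reading, not the study's exact protocol).

```python
# SMOTE to ~50:50, then 10-fold cross-validation, reporting sensitivity,
# specificity and accuracy (illustrative set-up, not the WEKA run).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate, StratifiedKFold

pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),   # oversample minority class per fold
    ("rf", RandomForestClassifier(n_estimators=200, max_features=3)),
])
scoring = {
    "sensitivity": make_scorer(recall_score, pos_label=1),  # correct SEs
    "specificity": make_scorer(recall_score, pos_label=0),  # correct controls
    "accuracy": "accuracy",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(pipeline, X, y, cv=cv, scoring=scoring)
print({k: v.mean() for k, v in results.items() if k.startswith("test_")})
```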
27. Analysis & Results – Interpretation
28. Interpretation – What are we seeking to achieve?
When is interpretation less important?
• When we are focused just on getting an accurate prediction
• No value in knowing why a prediction is accurate
• Enough to have an estimate of how accurate it is likely to be
When does interpretation matter?
• Understanding what drives the model means we can:
• …challenge it¹ and know its strengths/flaws
• …take industry-level action based on model insights (e.g. identify if some casino games are very high risk)
• …try to simplify the model and get similar accuracy (e.g. might matter for real-time / fast computing time)
• Understanding why a gambler gets a specific result helps:
• …explain the result to someone (e.g. might help them accept it and take action)
• …check reasons the model may be wrong in this particular case
• …understand what to change to get a different prediction
Are assessments and decisions open to regulatory, legal or clinical challenge?
1) Models can be wrong, e.g. poor set-up, over-fitting, quirks of the sample – challenging a model from experience gives confidence in it being robust in the future
29. Random Forest – Raw output (1/2)
Model schematic:
Feature | #
Number of trees (×2 for binary output variant) | 200
Model size in KB | 841
Minimum tree height | 13
Maximum tree height | 23
Mean tree height | 17
Minimum number of leaves | 146
Maximum number of leaves | 203
Mean number of leaves | 176
Hypothetical decision tree [segment]:
Age > 31?
├─ No: …
└─ Yes: Frequency > 30%?
   ├─ Yes: Self-Excluder
   └─ No: Gender is Male?
      ├─ Yes: Self-Excluder
      └─ No: Control Group
Shown segment: height 3, leaves 3 – very small compared to the real output
30. Random Forest – Raw output (2/2)
The model can be exported as Java source code (one page shown in Word below)
Why hard to analyse in current form?
• Each tree is a complex set of unequal routes and variable interactions
• Just too large… 200 trees
31. Random Forest – Frequencies
What is this?
• A count of the number of times that each variable occurs across the 2,536 pages of code
• Scores are normalised relative to the most frequent variable count (Age in this case)
[Chart: top variables by scaled frequency, led by Age, then Var – Current Period, Traj – Current Period, Var – Previous Period, Freq – Current Period, Freq – Previous Period, Sess – Current Period, Int – Current Period, Freq – Stat Sig, Sess – Increase, Traj – Increase, Freq – Increase, Int – Previous Period, Sess – Stat Sig, Int – Stat Sig, Var – Increase, …]
Why is it limited? No account of:
• Where on a tree the variable appears
• How its influence depends on other variables
• What value ranges drive results (i.e. positive or negative net influence?)
• How many gamblers are distinguished at each node (minor in a well-trained model)
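A sketch of how such a frequency count can be produced for a fitted scikit-learn random forest (the study counted occurrences in exported WEKA code, so this is an analogue, not a replication); `rf` and `feature_names` are assumed from the earlier sketches.

```python
# Count how often each feature appears as a split node across all trees,
# then normalise to the most frequent feature (Age on the slide).
import numpy as np

counts = np.zeros(len(feature_names))
for tree in rf.estimators_:
    node_features = tree.tree_.feature   # per-node feature index; -2 marks leaves
    for f in node_features[node_features >= 0]:
        counts[f] += 1

scaled = counts / counts.max()           # 1.0 = most frequently used variable
for name, score in sorted(zip(feature_names, scaled), key=lambda t: -t[1]):
    print(f"{name:30s} {score:.2f}")
```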
32. Neural Networks – Raw output
Input layer = 33 variables (Trajectory Risk, Age, Gender, …)
Hidden layer = all 33 variables feed into each of 17 nodes (Node 1, Node 2, Node 3, …)
Output layer = all 17 nodes feed into 2 output nodes: a Control Group score and a Self-Excluder score
Simple rule: allocate the player to CG or SE depending on which has the highest score
How to link inputs to outputs: in this model, all nodes are sigmoid nodes. As well as a threshold score, each node has a weight (i.e. a number) for each input variable. We multiply the value of each input variable by the relevant weight, add the results together, then subtract the threshold score. The result is then transformed by the sigmoid function to give the node output: f(x) = 1 / (1 + e^(−x)). Node outputs are between 0 and 1. (A worked example follows below.)
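A worked sketch of a single sigmoid node as just described. The weights, inputs and threshold are made-up numbers purely for illustration.

```python
# One sigmoid node: weighted sum of inputs, minus the threshold, passed
# through f(x) = 1 / (1 + e^(-x)). Output is always between 0 and 1.
import numpy as np

def sigmoid_node(inputs, weights, threshold):
    x = np.dot(inputs, weights) - threshold
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.7, 31.0, 1.0])    # e.g. frequency, age, male dummy (made up)
weights = np.array([1.2, -0.02, 0.4])  # one weight per input variable (made up)
print(sigmoid_node(inputs, weights, threshold=0.5))  # ~0.53
```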
33. Neural Networks – Raw output (continued)
[Same network diagram as on the previous slide]
Why hard to analyse in current form?
• We can capture the model in a few hundred Excel cells, but interactions between variables make it hard to analyse directly
• Each input affects the model via 17 × 2 different routes
• Not all inputs are equally important, in that their typical range of values might or might not often change the prediction
• Whether an input’s value is important on any given route depends on the values of the other inputs
• If an input is important on one route but not another, is it important overall?
34. Neural Networks – TREPAN (1/2)
What is TREPAN?
• An algorithm that treats the neural network as an oracle – it tries lots of inputs to see which ones matter
• Seeks a trade-off between fidelity to the original neural network, accuracy and simplicity
Results:
• Fidelity: 87%
• Loss of accuracy: 1–2 percentage points (NB the neural network had relatively poor accuracy)
[Extracted decision tree with 8 decision nodes, each route ending in a Self-Excluder (SE) or not (NO) leaf. The root asks: are at least 3 of these true – Age > 31; Frequency trend not significant; Variability high; Intensity has increased 49% vs previous period? Subsequent nodes ask: has Intensity increased 22% vs the previous period? Do they score medium or high on Variability statistical significance? Is the increase in Frequency highly statistically significant (10% level)? Are they based in Germany? Do they score zero on Session Time statistical significance? Are they male? Do they score low, medium or high on Frequency statistical significance?]
A sketch of the oracle-based extraction idea follows below.
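A simplified stand-in for the TREPAN idea, not the real algorithm (which uses m-of-n splits and active query sampling): treat the trained network as an oracle, label data with its predictions, and fit a small decision tree to those labels. Fidelity is then how often the tree agrees with the network. `X_train`, `X_test`, `y_train`, `y_test` are assumed from the earlier sketches.

```python
# Oracle-based extraction sketch: a small surrogate tree mimicking the network.
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

network = MLPClassifier(hidden_layer_sizes=(17,), activation="logistic",
                        max_iter=2000).fit(X_train, y_train)
oracle_labels = network.predict(X_train)       # ask the oracle, not the truth

surrogate = DecisionTreeClassifier(max_leaf_nodes=9)   # keep the tree small
surrogate.fit(X_train, oracle_labels)

fidelity = (surrogate.predict(X_test) == network.predict(X_test)).mean()
accuracy = (surrogate.predict(X_test) == y_test).mean()
print(f"fidelity to network: {fidelity:.0%}, accuracy vs truth: {accuracy:.0%}")
```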
35. Neural Networks – TREPAN (2/2)
[Same extracted decision tree, with annotations:]
• Minor routes: only ~3% of the sample goes down these
• A strong flag on Frequency = more likely to self-exclude
• These self-excluders flagged highly on Variability (and possibly on Intensity)
• All flagged at least a little on Session Time – German players less likely to self-exclude overall (interpretation uncertain)
• If you flag a little on Frequency, but do not flag on at least one of Intensity or Variability, you are likely to be control group
36. Logistic regression – Full model
Variable | OR | P-value
Intercept | 0.1 | 0.0
Trajectory risk factor
  Statistical significance (1 − P-value) | 2.9 | 0.1
  Slope coefficient | 1.0 | 0.0
  Average amount bet per gambling day | 1.0 | 0.0
  Increase in average bet per day | 0.9 | 0.5
  Dummy variable for increase >10% (1=Yes, 0=No) | 0.6 | 0.1
  Statistical significance category | 1.5 | 0.1
Frequency risk factor
  Statistical significance (1 − P-value) | 2.5 | 0.0
  Current Frequency (share of gambling days in the current period) | 0.8 | 0.6
  Prior Frequency (share of gambling days in the previous period) | 0.7 | 0.6
  Increase in Frequency between periods | 0.9 | 0.0
  Dummy variable for increase >10% (1=Yes, 0=No) | 2.3 | 0.0
  Statistical significance category | 0.8 | 0.0
Intensity risk factor
  Statistical significance (1 − P-value) | 0.1 | 0.0
  Current Intensity (number of bets placed over the recent period) | 1.0 | 0.0
  Prior Intensity (number of bets placed over the previous period) | 1.0 | 0.5
  Increase in Intensity between periods | 1.1 | 0.1
  Dummy variable for increase >10% (1=Yes, 0=No) | 12.7 | 0.0
  Statistical significance category | 1.4 | 0.1
Session time risk factor
  Statistical significance (1 − P-value) | 0.2 | 0.0
  Slope coefficient | 1.0 | 0.1
  Average session time per gambling day | 0.8 | 0.0
  Increase in average session time | 1.2 | 0.4
  Dummy variable for increase >10% (1=Yes, 0=No) | 1.7 | 0.1
  Statistical significance category | 1.3 | 0.2
Variability risk factor
  Statistical significance (1 − P-value) | 1.1 | 0.9
  Amount bet standard deviation (recent period) | 1.0 | 0.0
  Amount bet standard deviation (previous period) | 1.0 | 0.0
  Increase in standard deviation | 1.0 | 0.8
  Dummy variable for increase >10% (1=Yes, 0=No) | 1.4 | 0.3
  Statistical significance category | 0.9 | 0.4
Demographic variables
  Gender (1=Male, 0=Female) | 2.1 | 0.0
  Age in 2010 | 1.1 | 0.0
  Germany-based dummy variable (1=Germany, 0=Non-German) | 1.7 | 0.0
Values rounded to one decimal place for presentation purposes only
37. Logistic regression – Excerpt (1/2)
[Excerpt of the table above: Intercept and the Trajectory risk factor]
Odds ratio (OR): how much to multiply a gambler’s odds of being in the control group for each unit they score (holding other variables constant) – i.e. how powerful is the predictor
• e.g. bet amount increased by 10% → 40% lower odds of being in the control group
• Multiplying by 1.0 doesn’t change anything!
• But the slope coefficient can be a large number and varies widely – each unit multiplied by 1.001 can be a big effect over thousands of units
P-value: ranges from 0 to 1; a low value makes us more confident that the predictive value of that variable is greater than zero¹ – i.e. how reliable is the predictor
• If two variables have the same odds ratio, a lower p-value means the predictive effect is more consistent across different players
1) More specifically: it states the probability that the coefficient might in fact be zero (i.e. the odds ratio be one) given the volatility and patterns in the data
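A sketch of where such odds ratios come from: exponentiating the fitted logistic-regression coefficients. `model` and `feature_names` are assumed from the earlier sketches.

```python
# Odds ratios from a fitted sklearn LogisticRegression: exp of each coefficient.
import numpy as np

odds_ratios = np.exp(model.coef_[0])   # one OR per input variable
for name, odds in zip(feature_names, odds_ratios):
    print(f"{name:30s} OR = {odds:.1f}")

# Reading an OR: a value of 0.6 on a dummy variable means scoring 1 multiplies
# the odds of the modelled outcome by 0.6, i.e. 40% lower odds, matching the
# slide's example; an OR of 1.0 leaves the odds unchanged.
```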
38. Logistic regression – Excerpt (2/2)
[Same excerpt of the table as above]
In theory this looks like we can interpret the importance of each variable quite well…
Why is interpretation limited?
• To keep a simple structure, we did not include interactions between variables – but what if combinations of behaviours matter? (e.g. betting less per day might mitigate spending more days betting)
• To optimise accuracy, we included a lot of variables, but some are correlated, which reduces the ability to interpret individual ORs or p-values (e.g. statistical significance and statistical significance category within each risk factor)
• When an individual risk factor contains some “strong predictors” and some “weak predictors”, with some pointing in opposite directions, what does that mean overall?
39. Logistic regression – Variable importance
The problem with random forest and neural nets was primarily the scale and the number of routes. For regression, it is the large number of co-varying variables (this affects the other techniques too).
How to respond? Two ideas within the LR framework:
1. Create simpler models with fewer variables from the start¹, e.g. include “Delta value” and exclude “Delta > 10%”, or do each risk factor one by one
BUT: no longer the same model; accuracy suffers (at least a bit)
2. Average e.g. the p-values across common topics of interest (a sketch follows below). E.g. proportion of variables with p-value < 0.15: Demog 100%, Int 83%, Sess 67%, Traj 83%, Freq 67%, Var 33%
BUT: averages might disguise a “killer variable”; still no insight into net effect size or direction²
Exploring either option properly quickly becomes a major task to be done manually for each model, with lots of different approaches to choose from.
1) Various methods exist, e.g. remove those with the lowest predictive power, combine or reduce variables a priori that are conceptually related, PCA, etc.
2) Requires more work, but we can do a similar graph for this too
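A sketch of idea 2: group the variables by risk factor and summarise p-values per group, here as the share of variables with p < 0.15, as in the chart. `p_values` is a hypothetical dict mapping variable names (prefixed by risk factor, e.g. "freq_delta") to fitted p-values.

```python
# Group p-values by risk-factor prefix and report the share below 0.15.
import pandas as pd

df = pd.DataFrame({"variable": list(p_values), "p": list(p_values.values())})
df["group"] = df["variable"].str.split("_").str[0]   # e.g. "freq_delta" -> "freq"
share_significant = (df.assign(sig=df["p"] < 0.15)
                       .groupby("group")["sig"].mean()
                       .sort_values(ascending=False))
print(share_significant)   # cf. slide: Demog 100%, Int 83%, ..., Var 33%
```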
40. Bayesian networks (1/2)
[WEKA graph: priority layout, top down, with edge concentration – one node for each of the 33 variables, plus a top node representing the Self-Excluder / Control Group outcome]
Simple node: Frequency statistical significance
• Links directly to the final output
• Only takes values 0–1
• Probability of being a self-excluder, conditional just on a medium/high score, is ~0.5
Probability distribution of Frequency statistical significance:
Outcome | 0.00 | 0.00–0.21 | 0.21–0.54 | 0.54–1.00
Control | 0.5 | 0.0 | 0.1 | 0.4
Self-Exclude | 0.2 | 0.1 | 0.2 | 0.5
Reading the table:
• Can pay less attention to the middle 2 columns: they reflect just 4% of the sample, the others 45%+
• Ratios of 4:5 (top bin) and 5:2 (zero bin) → higher risk stat. sig. = more likely to be a self-excluder
• Combine these conditional probabilities across the network to get an overall probability (a worked example follows below)
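A worked check of the "combine conditional probabilities" step using the slide's own table and Bayes' rule. One assumption is made for illustration: the ~50:50 class balance from SMOTE is used as the prior, whereas the full network combines many such tables at once, so the numbers here are indicative only.

```python
# Bayes' rule on the slide's conditional table for Frequency stat. significance.
prior_se, prior_cg = 0.5, 0.5   # assumed balanced prior (post-SMOTE)
p_bin_given_se = {"0.00": 0.2, "0.00-0.21": 0.1, "0.21-0.54": 0.2, "0.54-1.00": 0.5}
p_bin_given_cg = {"0.00": 0.5, "0.00-0.21": 0.0, "0.21-0.54": 0.1, "0.54-1.00": 0.4}

for b in p_bin_given_se:
    evidence = p_bin_given_se[b] * prior_se + p_bin_given_cg[b] * prior_cg
    posterior_se = p_bin_given_se[b] * prior_se / evidence
    print(f"P(self-excluder | stat. sig. in {b}) = {posterior_se:.2f}")
# Top bin: 0.5*0.5 / (0.5*0.5 + 0.4*0.5) = 0.56, reflecting the 4:5 ratio above.
```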
41. Bayesian networks (2/2)
Complex low-level node: Variability delta > 10%
• A simple variable: only takes the value “Yes” or “No”
• But its impact on the conditional probability of being a self-excluder depends on fairly precise values of two Trajectory variables and two other Variability variables
• Overall it can take 360 separately defined conditional probabilities
Why hard to analyse in current form?
• Too many non-linear conditional probabilities – not practical to do much by eye
• Further computation or techniques would be required to analyse this network usefully
[Same WEKA graph as on the previous slide]
42. Interpretation – Overall view on the 33-variable model
Logistic regression [simple]
• Immediate output: simple picture (one page); each variable can be superficially ranked – but not that insightful on its own due to covariance
• Quick analysis: looking at sets of p-values gives a sense of good predictors, but not effect size or direction
• Highlighted variables: demographics; absolute values and statistical significance more than trends; Variability least useful risk factor
Bayesian network
• Immediate output: theoretically accessible (a few pages); but non-linear conditional probabilities too complex
• Quick analysis: visual graph analysis
• Highlighted variables: demographics; most risk factor features play a role, with least emphasis on the statistical significance of Variability
Neural network
• Immediate output: mathematically clear and accessible (a dozen pages); but interconnections too complex to analyse by eye
• Quick analysis: TREPAN creates an interpretable map; some loss of accuracy
• Highlighted variables: demographics; statistical significance of Session Time, Frequency and Variability; absolute increases in Intensity and Variability
Random forest
• Immediate output: impenetrable; thousands of pages of code
• Quick analysis: frequency chart tells what variables are used, but not how or when they are used
• Highlighted variables: age; absolute values (all risk factors); Frequency statistical significance
44. Mid-point research: A few learnings so far (1/3)
Human oversight of model creation is essential – transparency/flexibility needed
1) Supervised models rely on the choice of a well-defined outcome parameter, but problem gambling manifests in different ways and is not always well-defined
2) Raw session data work poorly as inputs → there is a human role in teaching the model “how to read” via pre-processing, as well as in model choice, parameters, control group, etc.
“Machine learning” is still not “artificial intelligence” in this instance
45. Mid-point research: A few learnings so far (2/3)
There is a trade-off between accuracy and direct model-level interpretability…
• Interpretability is reduced by more complex models (more “routes” from inputs to outputs)
• And by models with more variables (especially with correlation between variables)
[Chart: accuracy (on balanced dataset) vs improving interpretability (e.g. which variables matter most for model prediction) – random forest sits at ~90% accuracy but low interpretability; simple linear models / visual descriptive analysis sit at ~60% accuracy but high interpretability]
… but applying additional techniques shows some promise
• TREPAN reduced 595 neural network model parameters to a simple decision tree with 8 nodes – accuracy only declines by a few percentage points (other techniques can be used too)
46. Mid-point research: A few learnings so far (3/3)
Where it is important to explain an individual’s results properly, we probably need further layers of interpretative software
• Model-level interpretation lets us say which risk factors/features matter in general – useful for industry analysis and for giving people a general overview
• But this does not explain why a specific prediction is obtained, as an individual’s particular inputs will mean different routes through the random forest, or different parameters mattering in the neural network
• So you can’t tell someone specifically why they were flagged as a risk, or what specifically they would need to change to alter it
• In principle, interpretative software can be created, but we’re not aware of it existing already:
• A random forest frequency chart for an individual’s route
• Software that tests slight variations on an individual’s inputs to see which ones change the prediction most (a sketch follows below); highlighted/colour-coded decision trees, etc.
• Supplement with descriptive analytics/trends to enable human judgement
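A sketch of the "test slight variations" idea: perturb one input at a time for a single player and see how much the model's self-exclusion score moves. This is a simple local sensitivity check, not the full TREPAN-style surrogate approach; the function and its parameters are illustrative (e.g. a +10% nudge makes little sense for dummy variables).

```python
# Local sensitivity: which of this player's inputs most move the SE score?
import numpy as np

def local_sensitivity(model, player_row, feature_names, bump=0.1):
    base = model.predict_proba(player_row.reshape(1, -1))[0, 1]
    effects = {}
    for i, name in enumerate(feature_names):
        perturbed = player_row.copy()
        perturbed[i] *= (1 + bump)   # nudge this one input by +10%
        new = model.predict_proba(perturbed.reshape(1, -1))[0, 1]
        effects[name] = new - base   # change in self-excluder score
    # Largest absolute effects first: a rough personal "what drives my flag"
    return dict(sorted(effects.items(), key=lambda kv: -abs(kv[1])))
```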
47. Key limitations
1) Even where ML predictions are effective, they rely on a well-defined and visible outcome for the training dataset, but real-world problem gambling manifests in many different outcomes
2) This paper relies on self-exclusion as a well-defined outcome, but this only captures a small group of those potentially with issues – and only those who also then took action
• Only a small set self-exclude (<0.3%) and many do so very early
• Some self-exclude for non-problem reasons (to punish the operator, to test the system, …) – Hayer and Meyer (2011a, b) found 76% self-excluded for PG reasons
3) Incomplete and imperfect data
• Missing data we might wish to have (e.g. financial context, credit scores)
• Small sample relative to the number of variables and the complexity of the outcome
• Imperfect control group
• Incomplete picture of a person’s overall gambling activity (e.g. other sites, venues)
• Data from a single, mostly-European platform
49. • Extract more knowledge from these models with additional techniques
• With random forest, explore “information gain by variable” analysis
• Apply TREPAN to other ML models / create similar oracle methods
• Group variables based on domain knowledge – or start with simpler models and build up
• Use larger and richer samples (e.g. OLG) to:
• Explore different risk parameters, e.g. trend changes over shorter time periods
• Create additional risk factors, e.g. loss chasing, time of day gambling, type of game
• Test different combinations of risk factors (e.g. highest score across a set of risk factors)
• Model against different outcome variables, e.g. PGSI
• Explore application issues with industry
• To what extent the interpretability issue matters
• What kind of interpretability to prioritise
• How best to use models to reduce the risk of harm – how early can risks be identified?
Ideas for the future
50. Question (again)
Having seen the presentation, would you rather have:
1. An algorithm that assesses problematic play that is 90% accurate, which you cannot properly understand or explain, or
2. An algorithm that assesses problematic play that is 75% accurate, which you can fully understand and explain?
51. Q&A and Contact Us (more info + case study)
BetBuddy – The Responsible Gaming Analytics Experts
Email: Christian.Percy@bet-buddy.com
Web: www.bet-buddy.com
Twitter: @Bet_Buddy, @Bet_Buddy/team