1. Claim Pattern Anomalies
Making a Mole Hill Out of a Mountain
Predictive Analytics World for Business
San Francisco
May 17, 2017
CAS Analytics & Data Provisioning Team v01
Darryl Humphrey, PhD, PMP
linkedin.com/in/dghumphrey1
2. Provider and member claiming behavior is
affected by many factors.
CAS ADP Team
Fraud Analytics
Member and Provider Claiming Patterns are Dynamic
Economic Conditions
Plan Design
Policies and Processes
Compliance Verification
Industry Realities
3. Analyzing the equivalent of 87,000,000 claim lines
monthly, encompassing 17,000 providers and
1.6 million members.
– Nine (9) practice areas across health, dental, and pharmacy benefits
– 70 measures of claiming behavior
– Six (6) algorithms
– Look for converging results
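One way to look for converging results is to count how many algorithms flag the same provider. The sketch below is illustrative only: the algorithm names, provider IDs, and the two-algorithm threshold are assumptions, not details from the slides.

```python
from collections import Counter

# Hypothetical output: the provider IDs flagged by each algorithm.
flags_by_algorithm = {
    "mcd_outlier": {"P01", "P07", "P12"},
    "kmeans_small_cluster": {"P07", "P12", "P30"},
    "network": {"P07", "P44"},
}

def converging_providers(flags, min_algorithms=2):
    """Return providers flagged by at least `min_algorithms` algorithms."""
    counts = Counter(p for flagged in flags.values() for p in flagged)
    return sorted(p for p, n in counts.items() if n >= min_algorithms)

shortlist = converging_providers(flags_by_algorithm)
```

Raising `min_algorithms` narrows the shortlist to the providers on which more methods agree.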
4. Multivariate distance measure identifies providers
whose claiming patterns differ from the population.
[Figure: scatter plot, November Analytic Run, all providers reviewed. x-axis: Drug RD, 0 to 150; y-axis: proportion of total $ associated with risky claims, 0 to 1. Legend: Non Outlier, MCD Outlier. Annotation: 4.18.]
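A robust distance based on the Minimum Covariance Determinant (MCD) can be sketched as follows. This is a brute-force version for a tiny two-variable sample, without the consistency correction that library implementations apply; the data, the subset size `h`, and all function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt

def mean_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2

def robust_distances(points, h):
    """Mahalanobis distances using the h-subset with the smallest covariance determinant."""
    best = min(combinations(points, h), key=lambda s: det(mean_cov(s)[1]))
    (mx, my), (sxx, sxy, syy) = mean_cov(best)
    d = det((sxx, sxy, syy))
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        out.append(sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / d))
    return out

# Hypothetical providers described by two claiming measures; the last one is unusual.
providers = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2),
             (1.3, 1.9), (0.8, 2.0), (1.0, 2.3), (6.0, 9.0)]
distances = robust_distances(providers, h=6)
```

Because the centre and spread are estimated from the most tightly packed subset, an extreme provider cannot mask itself by inflating the covariance.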
5. Clustering algorithm sharpens the focus on the
riskiest providers.
Providers that cluster together have similar claiming patterns.
Small clusters with high RD scores are of most interest.
[Figure: scatter plot, November Analytic Run, all providers reviewed, k-means results. x-axis: Drug RD, 0 to 150; y-axis: proportion of total $ that are at risk, 0 to 1. Legend: Cluster 1 through Cluster 8. Annotations: 4.18, 24, 5, 54, n=34.]
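The idea of flagging small clusters with high RD scores can be sketched with a plain k-means loop. The points, starting centres, and cut-offs below are invented for illustration, and the features are left unscaled to keep the example short.

```python
def kmeans(points, centers, iterations=20):
    """Plain k-means: assign each point to its nearest centre, then recompute centres."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical (RD score, proportion of $ at risk) pairs per provider.
points = [(5, 0.10), (6, 0.15), (4, 0.05), (7, 0.20), (5, 0.12), (8, 0.10),
          (40, 0.40), (45, 0.50), (42, 0.45), (120, 0.90), (130, 0.85)]
centers, clusters = kmeans(points, centers=[(5, 0.10), (40, 0.40), (120, 0.90)])

sizes = [len(cl) for cl in clusters]
# Assumed cut-offs: at most 3 providers and a mean RD above 100.
flagged = [i for i, cl in enumerate(clusters)
           if cl and len(cl) <= 3 and sum(p[0] for p in cl) / len(cl) > 100]
```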
7. Claim-specific risk is estimated for the variables highlighted in
the K-means and MCD analyses.
$Risk_{MA}(i,j) = \dfrac{e^{-\frac{MA(i,j)}{Max_{MA}(i)}\left(1-\frac{d_i(j)}{r}\right)} - e^{-1}}{1 - e^{-1}}$
Limited investigation resources are targeted on the specific claims
most likely to be an issue.
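The risk formula on this slide can be evaluated as below, under one reading of it: the exponent is the product of MA(i,j)/Max_MA(i) and (1 - d_i(j)/r), and the result is rescaled so that it runs from 1 (exponent 0) to 0 (exponent 1). The slides do not define MA, Max_MA, d_i(j), or r, so the argument names here are placeholders only.

```python
from math import exp

def risk_ma(ma, max_ma, d, r):
    """Evaluate the slide's risk score under the assumed reading of the formula."""
    t = (ma / max_ma) * (1 - d / r)          # assumed to lie between 0 and 1
    return (exp(-t) - exp(-1)) / (1 - exp(-1))

highest = risk_ma(0, 10, 0, 5)    # exponent 0
lowest = risk_ma(10, 10, 0, 5)    # exponent 1
middle = risk_ma(5, 10, 1, 5)     # exponent 0.4
```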
8. Network analysis can reveal relationships that warrant
further investigation.
Collusion between members
and suspect providers?
Problematic providers tend to
have customers in common.
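A minimal sketch of the "customers in common" check: build each provider's set of members from claim records, then intersect the sets for pairs of providers already under review. The record layout and IDs are assumptions.

```python
from itertools import combinations

# Hypothetical claim records: (provider_id, member_id).
claims = [
    ("prov_A", "m1"), ("prov_A", "m2"), ("prov_A", "m3"),
    ("prov_B", "m2"), ("prov_B", "m3"), ("prov_B", "m9"),
    ("prov_C", "m7"),
]

def shared_members(claims, providers_of_interest):
    """Map each pair of providers to the members they have in common."""
    members = {}
    for provider, member in claims:
        members.setdefault(provider, set()).add(member)
    pairs = {}
    for a, b in combinations(sorted(providers_of_interest), 2):
        common = members.get(a, set()) & members.get(b, set())
        if common:
            pairs[(a, b)] = sorted(common)
    return pairs

overlap = shared_members(claims, {"prov_A", "prov_B", "prov_C"})
```

The pairs that come back are the edges of a provider-to-provider network; the shared members are candidates for a closer look.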
9. Claiming patterns for narcotics are of particular
interest.
Highly concentrated business
relationships are flagged.
Are members seeking narcotics from
multiple doctors and pharmacies?
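The question above can be approached by counting distinct prescribers and pharmacies per member. This is a sketch only: the record layout and the thresholds of three are assumptions, not values from the presentation.

```python
from collections import defaultdict

# Hypothetical narcotic claims: (member_id, prescriber_id, pharmacy_id).
narcotic_claims = [
    ("m1", "dr1", "rx1"), ("m1", "dr2", "rx2"), ("m1", "dr3", "rx3"),
    ("m2", "dr1", "rx1"), ("m2", "dr1", "rx1"),
    ("m3", "dr4", "rx1"), ("m3", "dr5", "rx2"),
]

def multi_source_members(claims, min_prescribers=3, min_pharmacies=3):
    """Members whose narcotic claims span many prescribers and many pharmacies."""
    prescribers = defaultdict(set)
    pharmacies = defaultdict(set)
    for member, prescriber, pharmacy in claims:
        prescribers[member].add(prescriber)
        pharmacies[member].add(pharmacy)
    return sorted(
        m for m in prescribers
        if len(prescribers[m]) >= min_prescribers and len(pharmacies[m]) >= min_pharmacies
    )

flagged_members = multi_source_members(narcotic_claims)
```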
10. Machine learning (ML) = architectures for building
algorithms that learn.
[Figure: diagram with the labels Machine Learning, SVM, Random Forest, K-NN, Neural Network (NN), Deep Learning, CNN, DBN, RBM, RNN.]
11. Random Forest algorithm classifies observations based
on the majority vote of many decision trees.
[Figure: a data set of 1200 obs and 7 vars is sampled with replacement repeatedly (…); the resulting trees vote on the risk classification.]
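The sample-with-replacement and majority-vote steps can be sketched as below. This is a simplified stand-in, not a full Random Forest: each "tree" is a single-split stump on one randomly chosen variable, whereas real implementations grow deeper trees and sample variables at every split. The data and all names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Pick the threshold and orientation with the fewest errors on one feature."""
    best = None
    for x, _ in rows:
        thr = x[feature]
        for high_label in (0, 1):
            errors = sum(
                (high_label if f[feature] > thr else 1 - high_label) != y
                for f, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, thr, high_label)
    return feature, best[1], best[2]

def predict_stump(stump, x):
    feature, thr, high_label = stump
    return high_label if x[feature] > thr else 1 - high_label

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # sample with replacement
        feature = rng.randrange(n_features)         # random variable for this stump
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, x):
    """Majority vote across all stumps."""
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical rows: ((variable 1, variable 2), risk label).
rows = [((0.10, 1), 0), ((0.20, 2), 0), ((0.15, 1), 0), ((0.30, 3), 0), ((0.25, 2), 0),
        ((0.80, 9), 1), ((0.90, 8), 1), ((0.70, 7), 1), ((0.85, 10), 1), ((0.75, 9), 1)]
forest = fit_forest(rows)
```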
12. Random Forest technique shows promise in predicting
which investigations will yield findings of note.
            Predicted 1   Predicted 0
Actual 1        25             9
Actual 0         1            50
True Positive Rate: 74%
True Negative Rate: 98%
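The two rates follow from the four counts on this slide when rows are read as actual outcomes and columns as predictions, which is the reading consistent with the stated percentages. A short check of the arithmetic:

```python
# Counts from the slide, read as rows = actual, columns = predicted.
tp, fn = 25, 9    # actual 1: predicted 1, predicted 0
fp, tn = 1, 50    # actual 0: predicted 1, predicted 0

tpr = tp / (tp + fn)   # true positive rate = 25 / 34
tnr = tn / (fp + tn)   # true negative rate = 50 / 51

print(f"True Positive Rate: {tpr:.0%}")   # 74%
print(f"True Negative Rate: {tnr:.0%}")   # 98%
```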
$Risk_{MA}(i,j) = \dfrac{e^{-\frac{MA(i,j)}{Max_{MA}(i)}\left(1-\frac{d_i(j)}{r}\right)} - e^{-1}}{1 - e^{-1}}$
13. Random Forest provides a measure of a variable’s
importance to classification success.
Var 6
Var 2
Var 3
Var 4
Var 1
Var 5
Var 7
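One common way to measure a variable's importance is permutation importance: shuffle one variable's values and see how much accuracy drops. The slides do not say which importance measure was used, so this is an assumption, and the stand-in model and data are hypothetical.

```python
import random

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def permutation_importance(model, rows, n_features, repeats=20, seed=0):
    """Average drop in accuracy when each variable's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [x[j] for x, _ in rows]
            rng.shuffle(column)
            permuted = [(x[:j] + (v,) + x[j + 1:], y)
                        for (x, y), v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted))
        importances.append(sum(drops) / repeats)
    return importances

# Stand-in classifier that only looks at the first variable.
def model(x):
    return 1 if x[0] > 0.5 else 0

rows = [((0.10, 5), 0), ((0.20, 1), 0), ((0.30, 4), 0), ((0.40, 2), 0), ((0.45, 3), 0),
        ((0.60, 5), 1), ((0.70, 1), 1), ((0.80, 4), 1), ((0.90, 2), 1), ((0.95, 3), 1)]
importances = permutation_importance(model, rows, n_features=2)
```

A variable the model never uses gets an importance of zero, because shuffling it cannot change any prediction.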
14. Automated review of receipts provides early detection
of potential issues.
Machine learning algorithm is being used to
determine if the document is a valid receipt.
Data lift technology extracts
the information.
15. Analytics is one input used to match cost-to-investigate
with the anticipated ROI.
16. There are many paths to generating ROI from
fraud detection analytics.
– Business knowledge and a willingness to learn are more important than the tool set
– Analytics are tools; keep them sharp
– Verify that the analyses are:
   – Relevant
   – Reliable
   – Responsible
– Tailor audit investigations to the nature and magnitude of the risk
17. Jil Tanguay, BSc (Spec), CFI, CRMA
Manager
Claims Assurance Services
Alberta Blue Cross
jtanguay@ab.bluecross.ca
Darryl Humphrey, PhD, PMP
Senior Data Scientist
Claims Assurance Services
Alberta Blue Cross
dhumphrey@ab.bluecross.ca
Yemi Dare-Ode, BSc
Nazanin Tahmasebi, PhD
Wesley Wood, BSc
19. Many data sets contain nonlinear relationships which can
reduce the effectiveness of some detection methods.
– Datasets that are linearly separable with some noise work out great
– Some data sets aren’t linear in their initial state
– The data can be mapped to a higher-dimensional space
[Figure: example data plotted along a single x axis, and again with x² as a second axis.]
20. Map feature space to one of higher dimensionality
where the training set is linearly separable.
Φ: x → φ(x)
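A small illustration of the mapping, assuming the feature map φ(x) = (x, x²) and an invented one-dimensional data set: no single threshold on x separates the classes, but a threshold on the added x² coordinate does.

```python
def phi(x):
    """Assumed feature map from one dimension to two: x -> (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if one threshold puts all class-1 values on one side and class-0 on the other."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# Hypothetical data: the outer points belong to class 1, the inner points to class 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, 0, 0, 0, 1, 1]

before = separable_by_threshold(xs, labels)
after = separable_by_threshold([phi(x)[1] for x in xs], labels)
```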
21. Support Vector Machines find the
optimal surface that separates the
groups.
– Maximizes the distance between the hyperplane and the “difficult points” close to the decision boundary
– If there are no points near the decision surface, then there will be fewer false positives and false negatives
– Support vectors are the observations near the decision boundary that contribute to determining the boundary
– Implies that only support vectors matter; other training examples are ignorable
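The point that only support vectors matter can be seen in the simplest case: one variable and two cleanly separated groups. There, the maximum-margin boundary is the midpoint between the closest observations from each group. The data and function name below are hypothetical.

```python
def max_margin_1d(negatives, positives):
    """Maximum-margin boundary in one dimension.

    Assumes the groups are separable with max(negatives) < min(positives).
    """
    sv_neg, sv_pos = max(negatives), min(positives)   # the two support vectors
    boundary = (sv_neg + sv_pos) / 2
    margin = (sv_pos - sv_neg) / 2
    return boundary, margin, (sv_neg, sv_pos)

full = max_margin_1d([0.0, 1.0, 1.5], [4.5, 6.0, 9.0])
# Dropping every observation except the support vectors leaves the boundary unchanged.
reduced = max_margin_1d([1.5], [4.5])
```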
Ch. 15
23. Neural networks are well-suited to detection tasks.
– Artificial neural networks are composed of multiple nodes which imitate neurons of the human brain
– Neurons are connected by links and they interact with each other. Each link is associated with a weight
– Artificial neural networks learn by modifying the weights in response to feedback
– Deep learning = lots of hidden layers
– Most often used for images
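Learning by modifying weights in response to feedback can be shown with a single artificial neuron (a perceptron), which is far simpler than the multi-layer networks described above. The task (logical AND), learning rate, and epoch count are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Fire (1) when the weighted sum of the inputs exceeds zero."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Adjust the weights whenever the output disagrees with the target."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)   # feedback signal
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias

# Logical AND: the neuron should fire only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
outputs = [predict(weights, bias, x) for x, _ in samples]
```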
24. Eye movement research indicates that we recognize
objects by extracting features.
25. The series of layers between input & output do
feature extraction and processing in stages, just as our
brains do.
[Figure labels: Learning; Variables]
26. Network analysis is used to show the effect of
ownership on a pharmacy’s claiming behavior.
– Assertion is that company policy / implicit guidelines can drive claiming behavior across the pharmacies owned by a single corporate entity
– Network defined by pharmacies registered with the same legal name
– Red = high total $ from risky claiming relative to other pharmacies
– Large = high proportion of a pharmacy’s $ from risky claiming
– Close together = similar high total $ at risk
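Grouping pharmacies by registered legal name, as described above, can be sketched as follows. The field names, pharmacy IDs, and dollar amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical pharmacy records with total and risky claim dollars.
pharmacies = [
    {"id": "RX1", "legal_name": "Acme Drugs Ltd", "total": 1000.0, "risky": 400.0},
    {"id": "RX2", "legal_name": "Acme Drugs Ltd", "total": 3000.0, "risky": 1200.0},
    {"id": "RX3", "legal_name": "Corner Pharmacy Inc", "total": 2000.0, "risky": 100.0},
]

def ownership_groups(records):
    """Group pharmacies by legal name and compute each group's share of risky dollars."""
    groups = defaultdict(list)
    for p in records:
        groups[p["legal_name"]].append(p)
    summary = {}
    for name, members in groups.items():
        total = sum(p["total"] for p in members)
        risky = sum(p["risky"] for p in members)
        summary[name] = {
            "pharmacies": sorted(p["id"] for p in members),
            "risky_share": risky / total,
        }
    return summary

summary = ownership_groups(pharmacies)
```

Comparing `risky_share` across groups gives a first look at whether pharmacies under one owner claim alike.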