DutchMLSchool 2022 - Multi Perspective Anomalies

NYENRODE. A REWARD FOR LIFE
Modular MBA - AISEC - Artificial Intelligence and Security

This fall at Nyenrode!
Welcome to Nyenrode

Jan W. Veldsink MSc STRICTLY CONFIDENTIAL
July 4 - 6, 2022
2nd Edition

#DutchMLSchool
Multi perspective Anomaly detection
Why we need Perspectives
Jan Veldsink MSc

CAIO, Grio

TO BE SUCCESSFUL
“TO BE TRULY SUCCESSFUL, INFORMATION MUST FLOW
THROUGHOUT THE ORGANIZATION. WORKERS' IDEAS AND
KNOWLEDGE ARE CONVEYED TO ALL LEVELS OF THE
COMPANY, AND THE ORGANIZATION IS FULLY RESPONSIVE.

IN COMPANIES WHERE THIS KIND OF INTERNAL
COMMUNICATION SYSTEM IS IN PLACE, PRODUCTIVITY,
QUALITY, AND CUSTOMER SERVICE IMPROVE.”

(DEMING, 1982)

Predictive Services vision
• Every initiative is going to involve AI

• Every initiative is driven by DATA

• Every initiative is MODEL BASED

• Every initiative is realised NON-CODING

Value chain (figure)
Stages: business requirement (design and analyse model); architecture (take it to an architectural level); AI step (model creation and machine learning); event (signals and business events); detection (applying task-specific AI); alerts and analysis; action (any downstream action); optimization.
Feedback: evaluation of rules / ML / AI.
Outcomes along the chain: value creation and knowledge creation.

Model-driven and NON-CODING

2016: Artificial Intelligence positioning paper

Why BigML
• Auditability
• Repeatability

Artificial Intelligence
What is the definition?
“Artificial intelligence systems are software (and possibly also hardware) systems designed by
humans that, given a complex goal, act in the physical or digital dimension by perceiving their
environment through data acquisition, interpreting the collected (structured or unstructured) data,
reasoning on the knowledge or processing the information derived from these data, and
deriving/deciding the best action to take to achieve the given goal.” (EBA)
How does it work?
AI is trained by the Decision Engineer to perform cognitive tasks; these tasks can vary in their
degree of autonomy. The creation cycle is depicted in the ‘Ecosystem’.
What is it used for?
“We believe AI presents strong opportunities for prosperity and growth, both for society, and
specifically for financial services. The application of AI will be in the interest of consumers and
businesses, providing better, faster products and services, providing relevant or right information
at the right time. One of the many key advantages is better risk management as advanced data
analytics contributes to a better internal understanding of bank activities (e.g. provisioning and
capital models), operational risks and improved monitoring of compliancy.” (NVB)
What is the technology behind it?
Statistics, Machine learning, Logic programming, Linear programming, Knowledge based systems,
Autonomous intelligent Agents, Cybernetics / Computational intelligence and soft computing.
Related technologies
Robotics, Internet of Things, Robotic Process Automation
Ethical ‘Compass’ for AI
(Rabobank Compass aligned:)
Societal well-being
We act in the interest of our customers and we will
protect these interests where we can;
Respect for human autonomy
We have respect for the autonomy of the
individual;
Fair and explicable
We are ethical, transparent and approachable;
Justice
We aim at equal treatment. We will not treat two
persons with the same need / interest in a
different way.
Authors (Compliance and Legal): Patty Braam-Liu, Sander Smits, Martijn Duijvestein.
Society
Possibilities Artificial Intelligence
Ecosystem
Transparent AI principles
A model is explainable when it is possible to generate explanations that allow
humans to understand (i) how an outcome or result is reached or (ii) on what
grounds the result is based (similar to a justification).
A model is interpretable when its internal behaviour (representing how the
result is reached) can be directly understood by a human.
Fairness requires that the model protects groups against unfair
bias (direct or indirect), discrimination and stigmatisation. Unfairness can particularly
affect smaller populations and vulnerable groups.
Bias is an inclination of prejudice towards or against a person, object, or position.
Bias (or biased outcomes) can occur in many ways and must be avoided at all times.
All the steps and choices made throughout the entire data analytics process need to
be clear, transparent and traceable to enable its oversight. This includes, amongst
others, model changes, data traceability and decisions made by the model.
An auditable solution, for which there are detailed audit logs for all phases of the
process that can be used to identify ‘who did what, when and why’, facilitates
oversight of the system, as it makes it possible to follow the whole process and gain
better insights.
As a general principle, the customer should be informed about any data processing
performed on his or her personal data. All data processing must be on lawful
grounds and be protected with proper technical and organisational measures.
A trustworthy system should respect customers’ rights and protect their
interests. Development and deployment of systems must therefore always comply
with law and not harm or diminish consumer rights.
Regulatory aspects
Non-discrimination – Article 1 of the Constitution (Grondwet),
European Convention on Human Rights (Europees Verdrag voor de
Rechten van de Mens), Equal Treatment Act.
Privacy – Profiling, transparency, automated decision making,
lawfulness, explainability, data minimization, accountability
TCF (Treating Clients Fairly) - Duty of care, comprehensibility, complaints procedure.
Note: ethics case law (‘moresprudentie’) from the Ethics Committee (Commissie Ethiek) also plays a role.
Date: 30/6/2020, v.0.5
AI helps financial inclusion, e.g. by
creating access to credit for people
and businesses that are currently
shut out of the market, at the same
or lower risk costs, made possible by
considering new data sources.
Compliance angles
There are potential privacy issues with the use of AI, especially since
AI often involves profiling and automated decision making.
Furthermore, transparency about and explanation of the use of AI
models require extensive and continuous in-depth knowledge.
Because AI is automated, the human factor in conflicts of interest (COI) is
mitigated. However, a COI can still exist between the outcomes
of AI and the vision and standards of the bank.
The use of AI can have a significant impact on the detection of corruption.
The use of AI can contribute positively, for example by automating
alert triage, investigation and reporting, and by deploying holistic
market-surveillance activities.
The use of AI can have a significant positive impact on the detection of
money laundering, terrorist financing and sanctions violations.
(Prospective) customers are exposed to various Treating Clients
Fairly risks with the use of AI, because of the uncertainty of AI outcomes
in relation to the irrational human brain.
AI is often associated with ‘black box’ technologies whose decisions can have
a negative impact on certain customers or business partners:
discrimination, exclusion, etc.
The use of AI can have a significant impact on the detection of fraud.
Ethics
The standpoint of EU’s High Level Expert
Group on AI
Supervisors' initiatives
The standpoints
General Data Protection Regulation (EC)
Regulation on framework for free flow of
non-personal data in EU (EC)
White Paper on AI – A European approach
to excellence and trust (EC).
DNB - General Principles for the use of Artificial
Intelligence in the Financial Sector
EBA - Final Report on Big Data and Advanced Analytics
Autoriteit Persoonsgegevens - Toezicht op AI &
algoritmes
FSB - Artificial intelligence and machine learning in
financial services. Market developments and financial
stability implications (Financial Stability Board)
EU framework on algorithmic accountability and
transparency (study of European Parliament).
Sector guidelines
Institute of International Finance
Machine Learning in Credit Risk (IIF)
Legislation EU/NL
The current EU regulatory division
Ethics Guidelines for trustworthy AI
Discrimination
Biases
Governance
Public opinion
Transparency
Trust
Privacy
CLR & Technology Guild
Customers
More personalized
and better products
with more efficient
and customer
focused services
Rabobank
More and more
insight into
efficiency, risk
and compliance

What we evaluated (tool, type: evaluation. Verdict)
• Thetaray (anomaly detection): too difficult to handle, limited capabilities. Verdict: not usable.
• Microsoft ML (ML and cloud platform): platform for data science, limited intelligence. Verdict: not usable.
• Riskshield (knowledge-based risk engine): fit for purpose; the platform is for the execution of rules and logic. Verdict: no machine learning.
• DataRobot (data science robot platform): looks promising, with automated ML and model validation claims, supervised and unsupervised. Verdict: was part of the RFP.
• H2O (ML application): good, limited set of algorithms. Verdict: too technical and limited.
• R/Python/Weka/.. own development (open-source ML applications): open, but all the hard work is yours and depends on knowledge and skills. Verdict: too technical.
• BigML (integral platform; supervised, unsupervised and unstructured): easy to learn, great visuals, integration with Riskshield. Verdict: RFP and POC.
• RapidMiner (platform for data scientists): hard to learn. Verdict: not usable.
• Dataiku (platform to support data scientists): too technical. Verdict: not usable.
• SAS (part of a large suite): very large implementations, data-science-oriented tooling. Verdict: not usable.
• IBM/SPSS (used it for years): not fit for new tasks. Verdict: not usable.

POC BigML timings (March to July)
Tracks: RFP, installation, education, experiments 1, experiments 2, hardware plan, reporting.
Milestones: BigML implemented on Rabobank hardware; first results meeting; end report; hardware plan.

POC BigML timings (March to July), with a ‘Today’ marker
Same tracks and milestones as the previous timeline, plus a hardware implementation track.

BigML - first results
Environment / experiment: true positives; false positives
• Thetaray: 10%; 5,000 / 1,000,000
• R / Weka experiments: 70%; 350 / 1,000,000
• Microsoft: 70%; 150 / 1,000,000
• BigML: 90%; 5 / 1,000,000
• Fraud
  • Ability to work with and predict fraud on our anonymized 2017 dataset.
  (Thetaray / Microsoft / Inform / CCR and Dataiku did not reach the results BigML did.)
• CDD
  • 85% reduction in PEP alerts
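The first-results comparison reads more easily as rates. The sketch below assumes the slide's false-positive figures are counts per 1,000,000 cases, converts them to rates and compares each tool against the Thetaray baseline; the names `RESULTS`, `false_positive_rate` and `reduction_vs` are illustrative only.

```python
# Figures as reported on the "BigML - first results" slide:
# (true-positive rate, false positives per 1,000,000 cases)
RESULTS = {
    "Thetaray": (0.10, 5000),
    "R / Weka experiments": (0.70, 350),
    "Microsoft": (0.70, 150),
    "BigML": (0.90, 5),
}
PER_CASES = 1_000_000


def false_positive_rate(fp_per_million: int) -> float:
    """Turn a 'false positives per 1,000,000 cases' count into a rate."""
    return fp_per_million / PER_CASES


def reduction_vs(baseline: str, other: str) -> float:
    """How many times fewer false positives `other` produces than `baseline`."""
    return RESULTS[baseline][1] / RESULTS[other][1]


if __name__ == "__main__":
    for name, (tp_rate, fp) in RESULTS.items():
        print(f"{name:22s} TP {tp_rate:.0%}  FP rate {false_positive_rate(fp):.4%}")
    print(f"BigML vs Thetaray: {reduction_vs('Thetaray', 'BigML'):.0f}x fewer false positives")
```

With these figures the false-positive rate drops from 0.5% (Thetaray) to 0.0005% (BigML), a factor of 1,000.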

Indonesia Card_fraud

Fraude_set

Topics

Projects in BigML / Machine Learning Platform
• Data drift detection
• Agri default prediction
• Fraud detection
• CDD anomalies
• Data-driven audit
• False positive reduction
• Document classification
• KYC anomalies

ML DataLake
AI platform roundtrip: AI ML-DataLake, AI machine learning, AI ML actuation, with feedback

Let's go to the Dark Side

Dirty Money

FEC AND KYC the proper way

Dynamic Customer Behavioral Event Monitoring
It is a wicked problem.
Dynamic customer behavioral event monitoring manages and mitigates risks for the bank's customers, the financial system and our own organization, including fraud, misuse of the financial system and financial economic crimes.
Projects: Project X, Project RDT, Project TM, Project Case management, Project Data Lineage, Project Prospero, Project Indica.
Context: WWFT, DNB guidelines, EBA guidelines, organizational policies, standards and politics.
Distinguish between critical and ‘safe to fail’ projects.

Our Customers

Can we understand behavior?
Maslow

In Perspectives

Conceptual model KYC

Multi-perspective anomaly
• Customer's path of life
• Customer's age
• Customer's industry segment
• Customer's cash intensiveness
• Customer's relation to high-risk countries
• Newly onboarded customers
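A minimal sketch of the multi-perspective idea: each perspective groups customers into peer groups, and a customer is judged only against its own peers. The perspective definitions and field names (`segment`, `age_band`, `cash_deposits`, `high_risk_country_tx`) are hypothetical, and the median/MAD robust z-score (1.4826 scales the MAD to a standard deviation for normal data; 3.5 is a conventional cut-off) is a deliberately simple stand-in for the isolation-forest anomaly detector used in these slides, not the author's method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical perspectives: perspective name -> (peer-group field, feature to compare).
PERSPECTIVES = {
    "industry_segment": ("segment", "cash_deposits"),
    "age": ("age_band", "cash_deposits"),
    "high_risk_countries": ("segment", "high_risk_country_tx"),
}


def robust_z(values):
    """Median/MAD z-scores; every score is 0.0 when the group has no spread."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [abs(v - med) / (1.4826 * mad) for v in values]


def score_perspective(customers, group_field, feature):
    """Score every customer against its own peer group for one perspective."""
    groups = defaultdict(list)
    for customer in customers:
        groups[customer[group_field]].append(customer)
    scores = {}
    for members in groups.values():
        for member, z in zip(members, robust_z([m[feature] for m in members])):
            scores[member["id"]] = z
    return scores


def multi_perspective(customers, threshold=3.5):
    """Map customer id -> perspectives in which that customer is anomalous."""
    flags = defaultdict(list)
    for name, (group_field, feature) in PERSPECTIVES.items():
        for customer_id, z in score_perspective(customers, group_field, feature).items():
            if z > threshold:
                flags[customer_id].append(name)
    return dict(flags)
```

The function returns, per customer, the perspectives in which it stands out from its peers; how those flags are combined into an alert is left open here.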

Some features: > 400 :-), a different set per perspective
• Cash related
• Country related

Automated flows

The peer-grouping anomaly pattern

Perspective: new customers

AI Architecture for Fraud / KYC

Some numbers: on a weekly basis we
• build client profiles in Riskshield (1 hour)
• run anomaly detection in RaboML, powered by BigML (3 hours)
• produce 63,000,000 risk indicators => (Y/N) and explanation
• assess > 10,000,000 customers

Anomaly detection

Isolation forest because it is the best!
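To make the isolation-forest idea concrete, here is a from-scratch sketch of the published algorithm (Liu, Ting and Zhou): each tree splits a random subsample on a random feature at a random value, and the score is s = 2^(-E[h(x)] / c(n)), where h(x) is the path length and c(n) the average path length of an unsuccessful binary-search-tree lookup. It is not BigML's implementation; the class and parameter names (`IsolationForest`, `n_trees`, `sample_size`) are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for points left unseparated."""
    if "size" in node:
        return depth + avg_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees = []
        self.psi = 0

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two points")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 mean x is isolated quickly."""
        mean_path = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / avg_path_length(self.psi))
```

Points that are separated after only a few random splits score close to 1; points deep inside dense regions score lower.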

Based on BigML's explanations

Fairness of Assessment

Predicting with sensitive attributes
1. Start from a dataset with two attributes: the anomaly score and the gender attribute.
2. Split it 80% / 20% into a training set and a test set.
3. Build a tree model with target = GENDER (the trained model).
4. Evaluate the model on the test set.
5. If phi > 0.2, then BIAS!
The anomaly scores come from the anomaly detector: a batch anomaly score is run on data without the bias attribute GENDER, producing data with an anomaly score.
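The bias test reduces to one number: the phi coefficient of a model that tries to predict GENDER from the anomaly score. The sketch below computes that check for a two-class attribute, assuming the tree model's test-set predictions are already available; the function names and the `"F"` positive label are hypothetical. For two classes, phi does not change when the positive class is swapped, so, assuming the platform's average phi is the mean of per-class phi values, it equals that average.

```python
import math

BIAS_THRESHOLD = 0.2  # threshold used in the flow above


def confusion(actual, predicted, positive):
    """Count (tp, fp, fn, tn) for one chosen positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def phi_coefficient(tp, fp, fn, tn):
    """Phi (Matthews) coefficient of a 2x2 confusion matrix; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def is_biased(actual_gender, predicted_gender, positive="F", threshold=BIAS_THRESHOLD):
    """True when the anomaly score lets a model predict gender better than the threshold."""
    phi = phi_coefficient(*confusion(actual_gender, predicted_gender, positive))
    return phi > threshold
```

A phi near 0 means the anomaly score carries no usable information about gender; a value above 0.2 is the signal to treat the assessment as biased.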

The result from actual scores

Detecting covariate drift
1. Take data from period 1 and from period 2.
2. Sample each and add a field Period with value P1 or P2 respectively.
3. Merge the two samples.
4. Split 80% for training and 20% for testing.
5. Build a model with target Period (missing splits = True).
6. Evaluate the trained model and check whether phi > 0.2.
7. If so, investigate.
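The data-preparation steps of the drift check (sample each period, add a `Period` field, merge, split 80/20) can be sketched in plain Python, assuming each period arrives as a list of dict rows; `tag_period` and `drift_split` are hypothetical helpers. The model and evaluation steps are left to the ML platform, as in the flow above.

```python
import random


def tag_period(rows, period, sample_rate, rng):
    """Sample rows from one period and add a 'Period' field holding its label."""
    return [dict(row, Period=period) for row in rows if rng.random() < sample_rate]


def drift_split(period1_rows, period2_rows, train_fraction=0.8, sample_rate=1.0, seed=0):
    """Tag both periods, merge them, shuffle, and split into train/test parts."""
    rng = random.Random(seed)
    merged = (tag_period(period1_rows, "P1", sample_rate, rng)
              + tag_period(period2_rows, "P2", sample_rate, rng))
    rng.shuffle(merged)
    cut = int(len(merged) * train_fraction)
    return merged[:cut], merged[cut:]
```

If both periods come from the same distribution, a model trained on the first part with `Period` as its objective should do no better than chance on the second part; a phi above 0.2 is the signal to investigate.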

Detecting covariate drift
TRIM

AIRFLOW
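;; WhizzML: covariate-drift check for one peer group.
;; Script inputs: PeergroupClass (project-name filter), PeergroupName (dataset-name filter).
(define a-list (resource-ids (list-projects {"name__contains" PeergroupClass})))
;; Peer-group dataset in each of the two matching projects (one project per period)
(define p-list (resource-ids (list-datasets {"project_id" (last (parse-resource-id (a-list 0))) "name__icontains" PeergroupName})))
(define p1-list (resource-ids (list-datasets {"project_id" (last (parse-resource-id (a-list 1))) "name__icontains" PeergroupName})))
;; Merge both periods into one dataset
(define tomerge (list (p1-list 0) (p-list 0)))
(define mds (merge-datasets tomerge {"project_id" (a-list 0)}))
;; Random split: 80% for training, 20% for evaluation
(define ids (create-random-dataset-split mds 0.8 {"name" (str PeergroupName "_DataDrift_80%")} {"name" (str PeergroupName "_DataDrift_20%")}))
(define train-id (nth ids 0))
(define test-id (nth ids 1))
;; Tree model that tries to predict the period (here the week-number field) from the other fields
(define model-id (create-and-wait-model {"dataset" train-id "objective_field" "IDVAR_WEEKNO" "missing_splits" true "project_id" (a-list 0)}))
;; Evaluate on the held-out 20% and fetch the finished evaluation
(define evaluation-id (create-evaluation model-id test-id))
(define evaluation-map (fetch (wait evaluation-id)))
;; Phi > 0.2 means the periods can be told apart: investigate the drift
(define Phi_Coefficient (evaluation-map ["result" "model" "average_phi"]))

# Python: run the script through the BigML Python bindings and read its outputs
from bigml.api import BigML
from bigml.execution import Execution

api = BigML()  # reads BIGML_USERNAME / BIGML_API_KEY from the environment
execution = api.create_execution("script/61c42897c17b416d8e00004a", {"inputs": [["PeergroupClass", "PEERGRP_0001_CLASS"], ["PeergroupName", "Pmml_16_ORG_07"]]})
api.ok(execution)  # wait until the execution has finished
local_execution = Execution(execution)
print("outputs: ", local_execution.outputs)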

Did it work?
Phi coefficient = 0.347 > 0.2. Why: countries on a list.

Did it work?
Phi coefficient = 0.4999 > 0.2: DRIFT! Why: SBI class.

Business / Information
PERSPECT: Simple
[Figure: PERSPECT framework. Labels: Domain, Activity, Case, Party; Vision, Analysis, Development, Operation; Business, Information, Application, Infrastructure; Services, Process, Scope.]

In a journey towards resilience

MULTI PERSPECTIVE
ANOMALY DETECTION

Jan W. Veldsink MSc

jan@grio.nl / j.veldsink@nyenrode.nl
