Looks at the different AI approaches and provides some practical categorisation and case studies. Then talks about the data fabric you need to put in place to improve model accuracy and deployment. Covers: supervised, unsupervised, machine learning, deep learning, RPA, etc. Finishes with how to create successful AI projects.
Artificial Intelligence Primer
1. Public | Quantexa® 2018
Aug 2018
Imam Hoque
AI Primer
Version 1.0
A business perspective on AI
2. Public | Quantexa® 2018
AI – what is the definition?
Artificial intelligence
Data Mining
Machine Learning
Deep Learning
Supervised
Unsupervised
NLP
Robotics
Genetic Programming
Artificial Neural networks
Expert systems
Voice recognition
Definition of ARTIFICIAL INTELLIGENCE
1: a branch of computer science dealing with the simulation of intelligent behavior in computers
2: the capability of a machine to imitate intelligent human behavior
Merriam-Webster
Turing Test
A test for intelligence in a computer, requiring that a human being should be unable to distinguish the machine from another human being by using the replies to questions put to both.
3. Public | Quantexa® 2018
AI is coming of age and there will be a revolution…
TRADITIONAL (1920 - 2005)
[Diagram labels: PRODUCT, HARDWARE, INSIGHT, RISK, COMPLIANCE, CUSTOMERS, $$]
Traditionally, organisations were very people intensive, limiting the products possible and customers that could be acquired.
DIGITAL CHANNELS (2005)
[Diagram labels: PRODUCT, SELF SERVICE, HARDWARE, INSIGHT, RISK, COMPLIANCE, CUSTOMERS, $$]
The advent of the Internet drove adoption of digital channels, reducing customer servicing headcount and providing more data for customer insight. Scope for more products and customer acquisition growth.
DATA, CLOUD, Robotic Process Automation (RPA) (2015)
[Diagram labels: PRODUCT, SELF SERVICE, INSIGHT, RISK, COMPLIANCE, RPA, CUSTOMERS, 3rd Party Data, HARDWARE, $]
Availability of more 3rd party data, compute capacity and SaaS in the cloud further optimises a range of back office functions. This is enhanced through quick win tactical RPA deployments.
AI AUTOMATED DECISIONS (2017)
[Diagram labels: PRODUCT, SELF SERVICE, INSIGHT, RISK, COMPLIANCE, CUSTOMERS, 3rd Party Data, HARDWARE, $]
Global competition, Internet giants and governments driving open standards for competition mean organisations have to drastically cut costs to compete. The use of data and context driven AI automated decisioning will be the new “white collar worker revolution”, allowing better customer insight, more products/customers and better compliance.
4. Public | Quantexa® 2018
Example supervised approaches
• Classic “old school out of uni” techniques were commonly used for:
  • Credit Risk
  • Claims Fraud
• Traditionally trained offline and deployed as a fast scorecard
• Recent extensions are continuous re-fitting / training
Regression
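As a rough illustration of "trained offline and deployed as a fast scorecard", the sketch below fits a logistic regression by plain gradient descent and then scores new cases with a single weighted sum. The features, toy data and function names are all hypothetical and chosen only for this example; a real credit risk model would use far more data and a tested library.

```python
import math


def sigmoid(z):
    """Squash a raw score into a 0..1 probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Offline step: fit weights with simple stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, outcome in zip(rows, labels):
            predicted = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = predicted - outcome
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return weights, bias


def score(weights, bias, features):
    """Online step: the deployed scorecard is just a weighted sum."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical toy data: (missed_payments, credit_utilisation), 1 = defaulted.
ROWS = [(0, 0.1), (0, 0.3), (1, 0.2), (3, 0.9), (4, 0.8), (5, 0.95)]
LABELS = [0, 0, 0, 1, 1, 1]

WEIGHTS, BIAS = train_logistic(ROWS, LABELS)
low_risk = score(WEIGHTS, BIAS, (0, 0.1))
high_risk = score(WEIGHTS, BIAS, (5, 0.95))
```

Continuous re-fitting, as mentioned above, would amount to re-running `train_logistic` on fresh outcomes and swapping in the new weights.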
• Uses “buckets” to categorise observations by a range of their variables. Useful for predicting behaviours and detecting fraud
Decision trees
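The "bucket" idea can be sketched with the smallest possible tree: a single split on one variable, chosen to make the two resulting buckets as pure as possible. This is a simplified, one-level stand-in for a full decision tree; the Gini impurity measure is a common choice rather than anything stated in the slides, and the claim amounts are made up.

```python
def gini(labels):
    """Gini impurity of a bucket of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    positive_rate = sum(labels) / len(labels)
    return 2 * positive_rate * (1 - positive_rate)


def best_split(values, labels):
    """Return (threshold, impurity) for the best two-bucket split."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical claim amounts and whether each claim proved fraudulent.
CLAIM_AMOUNTS = [200, 350, 500, 4000, 5200, 6100]
FRAUDULENT = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(CLAIM_AMOUNTS, FRAUDULENT)
```

A full tree would repeat this search inside each bucket, on each available variable, until the buckets are small or pure enough.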
• Simulates “the brain” through a series of neural nets
• It is trained on data sets and outcomes and will “learn”
• Multiple features and patterns within the data can be recognised and it will be able to classify new events as they occur
• E.g. credit card authorisations
Artificial Neural Networks
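To make "trained on data sets and outcomes" concrete, here is a sketch of a single artificial neuron (a perceptron), the simplest building block of a neural network. It is a deliberate simplification: real networks stack many such units in layers. The two input flags and the decline rule they encode are hypothetical.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of inputs exceeds zero, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Nudge the weights whenever the neuron gets a training sample wrong."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, outcome in samples:
            error = outcome - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical authorisation history: (card_not_present, unusual_country)
# with outcome 1 = decline. Only the combination of both flags is declined.
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

WEIGHTS, BIAS = train_perceptron(SAMPLES)
decisions = [predict(WEIGHTS, BIAS, inputs) for inputs, _ in SAMPLES]
```

No decline rule was written by hand; the weights were learned from the outcomes alone.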
Others: Genetic programming, Support vector machine, random forests, ensemble approaches, etc.
5. Public | Quantexa® 2018
Example unsupervised approaches
• Rules are created by “talking to experts” or exploring the data
• Still powerful as it captures and applies the knowledge of experts and does not require large volumes of known outcomes
• Used extensively, but is easy to second guess, hence the challenges in AML and finCrime applications
Rules / expert systems
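A minimal sketch of a rule-based check is shown below. The rule names, thresholds, country codes and transaction fields are invented for illustration and are not taken from any real AML rule set.

```python
# Each rule is a name plus a test written down from expert knowledge.
RULES = [
    ("large cash deposit", lambda t: t["channel"] == "cash" and t["amount"] >= 10_000),
    ("high risk country", lambda t: t["country"] in {"XX", "YY"}),
    ("rapid pass-through", lambda t: t["hours_held"] < 24 and t["amount"] >= 5_000),
]


def fired_rules(transaction):
    """Return the names of every rule the transaction triggers."""
    return [name for name, test in RULES if test(transaction)]


flagged = fired_rules(
    {"channel": "cash", "amount": 12_000, "country": "GB", "hours_held": 72}
)

# The same deposit kept just under the threshold triggers nothing.
evasive = fired_rules(
    {"channel": "cash", "amount": 9_999, "country": "GB", "hours_held": 72}
)
```

The second transaction shows why fixed rules are easy to second guess: anyone who learns the threshold can stay just beneath it.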
• Uses the data itself and a set of variables to cluster or segment the data
• Great for spotting needles in haystacks where there are few known outcomes
• Used in internal fraud, surveillance and recommendation engines which are not fully personalised
Clustering / peer groups
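The peer group idea can be sketched as follows, assuming the peer groups are already known (for example by job role) rather than discovered by a clustering algorithm. Each record is compared with the average of its own group and flagged if it sits far outside. The field names, data and threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def peer_outliers(records, threshold=2.0):
    """Flag ids whose value is more than `threshold` std devs from their peers."""
    groups = defaultdict(list)
    for record in records:
        groups[record["peer_group"]].append(record)

    flagged = []
    for members in groups.values():
        values = [m["value"] for m in members]
        average, spread = mean(values), pstdev(values)
        if spread == 0:
            continue  # everyone behaves identically, nothing stands out
        for m in members:
            if abs(m["value"] - average) / spread > threshold:
                flagged.append(m["id"])
    return flagged


# Hypothetical daily refund counts per employee.
cashier_counts = [10, 12, 11, 9, 10, 11, 12, 10, 60]
RECORDS = [
    {"id": f"E{i}", "peer_group": "cashier", "value": v}
    for i, v in enumerate(cashier_counts, start=1)
] + [
    {"id": "M1", "peer_group": "manager", "value": 50},
    {"id": "M2", "peer_group": "manager", "value": 55},
    {"id": "M3", "peer_group": "manager", "value": 60},
]

outliers = peer_outliers(RECORDS)
```

No known fraud outcomes are needed: a value of 60 is unremarkable for a manager but stands out for a cashier.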
• Analyses data over a series in time, usually compared “back on itself”
• Can spot new trends and changes in behaviour
• Identify a build up to an event: purchase imminent, cyber attack or rogue trade
Time series analysis
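Comparing a series "back on itself" can be sketched by checking each new observation against a trailing window of its own recent history. The window size, threshold and trade counts below are illustrative assumptions only.

```python
from statistics import mean, pstdev


def behaviour_changes(series, window=5, threshold=3.0):
    """Return indexes where a value departs sharply from its own recent past."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        average, spread = mean(history), pstdev(history)
        if spread == 0:
            continue  # flat history gives no scale to measure against
        if abs(series[i] - average) / spread > threshold:
            alerts.append(i)
    return alerts


# Hypothetical number of trades placed per day by one trader.
DAILY_TRADES = [20, 22, 19, 21, 20, 23, 21, 95, 22]

alerts = behaviour_changes(DAILY_TRADES)
```

Only the jump to 95 trades is flagged; normal day-to-day variation stays below the threshold.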
Others: Outlier analysis, PCA & dimensional reductions, etc.
6. Public | Quantexa® 2018
Natural language processing
• Understanding meaning
• In its simplest form entity extraction
• But can also include “RDF triples”
• Typical use cases:
  • Surveillance (e.g. trader)
  • Call centre optimisation
  • Online chat-bots
  • Forensic examinations
Semantic extraction
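As a toy illustration of entity extraction producing subject, predicate, object triples (the shape of an RDF triple), the sketch below uses a regular expression over capitalised names. Real semantic extraction relies on trained language models; the pattern, the three verbs and the sample sentence are assumptions made for this example.

```python
import re

# A name is one or more capitalised words; the predicate list is hypothetical.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
TRIPLE_PATTERN = re.compile(
    rf"(?P<subject>{NAME}) (?P<predicate>works for|paid|owns) (?P<object>{NAME})"
)


def extract_triples(text):
    """Return (subject, predicate, object) tuples found in the text."""
    return [
        (m["subject"], m["predicate"], m["object"])
        for m in TRIPLE_PATTERN.finditer(text)
    ]


TEXT = "Alice Smith works for Acme Trading. Bob Jones paid Alice Smith."
triples = extract_triples(TEXT)
```

Each triple links two extracted entities, which is what allows free text to be joined to structured data later on.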
• Takes a voice stream and converts it to a text file
• Challenge is the 3 vectors: quality out, context & sound quality
• Many applications as quality improves
• Key use cases:
  • Trader surveillance
  • Regulatory adherence
  • Call centre disputes
Voice to text
• Takes input in one language and translates to a second language
• Is being dominated by Google and is best consumed as a cloud service
Translation
Others: Audio searching / categorising, speech generation, etc.
7. Public | Quantexa® 2018
So what is machine learning and deep learning?
• Classically supervised techniques: Key is that it does not require the algorithms to be manually developed by specialist programmers
• Looks for patterns: ideally suited to identifying patterns and using them for predictions
• Not limited to supervised: can also leverage unsupervised techniques
• Not just artificial neural networks: other supervised techniques are used
Machine learning
• Heavy reliance on neural networks: These tend to be at the heart of the solution
• Used in supervised and unsupervised modes: the latter helps to discover patterns or features
• Has many layers: claimed to understand concepts or features
• Large data applications: typically requires a lot of data
• Has been criticised as being too “black box”: for some applications this is an issue, and could it become unpredictable?
Deep learning
Evolution or “rebranding”
8. Public | Quantexa® 2018
And what about Robotic Process Automation (RPA)?
• Taking effort out of collecting data from multiple systems without impacting them
• Fairly primitive
• Did not typically interact back with the underlying system
Screen scraping
Add direct automation
• New technologies make it easier to scrape screens
• But also allow automated form filling and button pressing
• No impact on the underlying systems
• Quick short term fix to reduce headcount needs for repetitive simple work
• System is trained by humans as opposed to programmers (AI may be used to help it learn what buttons to press)
• Can be supplemented by rules
• Never gets tired, works 24/7, hugely consistent
Robotic Process Automation (RPA)
FUTURE: Straight through processing and AI driven automated decisions
9. Public | Quantexa® 2018
Why is AI hard to get right?
Challenge
• Huge volumes, poor quality and missing data
• Not coherently connected: poor customer single views
• Dumped into open source technology “data lakes”
• Does not reflect real life situations
Opportunity
• Automate high quality single customer views across the enterprise
• Dynamically represent data as networks at scale to provide context
• Leverage and coexist with open source “data lakes” as a single platform
• Provide effective AI business solutions to service multiple use cases
Artificial Intelligence (AI) is failing to deliver real business value in AML
“We have 3,500 people investigating AML alerts – our systems are creating too many false positives and missing the real risk – we fear a multi-billion dollar fine”
Organisations can better optimise AI-driven decisions to unlock the value in their data
“We have reduced our headcount by 30%, identified more risk and can better react to a changing regulatory environment”
“Deep learning alone cannot solve these problems, you need to combine Human Intelligence with Artificial Intelligence”
10. Public | Quantexa® 2018
AI’s not working – diagnostic view
• Too few data sources: more data can help overcome data quality issues, especially third party sources
• Not enough training data: this is a classic issue, especially when there are many different types of outcome. Often the failing of machine learning and deep learning
• Data does not represent real life: missing single views or networks of relationships
• Data quality: algorithms and approaches are challenged by poor data quality
Data challenges
• Unhelpful black box output: “78% risk of criminality”, so where does the investigator start?
• “Wrong more often than right”: disenfranchises the end user, and false positives are an adoption killer
• Too complicated: you can lose users if it does not feel familiar enough
• Manually intensive: system does not take the drudge work out of the decision step
User adoption
• Worse than random: normally an issue in environments where the targets have reverse engineered rules in a rule only system
• Too many false positives: annoys customers, creates large workloads, kills effectiveness
• Missing critical events: will often fail to spot what you are looking for, a coverage issue
Model algorithms
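The two model failures above, too many false positives and missed critical events, correspond to the standard measures of precision and recall. The sketch below computes both from a set of raised alerts and a set of confirmed cases; the alert ids and counts are made up for illustration.

```python
def alert_quality(alerts, true_cases):
    """Summarise how many alerts were right and how many real cases were caught."""
    alerts, true_cases = set(alerts), set(true_cases)
    true_positives = len(alerts & true_cases)
    false_positives = len(alerts - true_cases)
    missed = len(true_cases - alerts)
    return {
        "false_positives": false_positives,
        "missed": missed,
        # Precision: share of alerts that were real (low = "wrong more often than right").
        "precision": true_positives / len(alerts) if alerts else 0.0,
        # Recall: share of real cases that were alerted (low = coverage issue).
        "recall": true_positives / len(true_cases) if true_cases else 0.0,
    }


# Hypothetical month: ten alerts raised, four real cases, only two of them alerted.
ALERTS = [f"A{i}" for i in range(1, 11)]
TRUE_CASES = ["A1", "A2", "B1", "B2"]

quality = alert_quality(ALERTS, TRUE_CASES)
```

Here eight of ten alerts waste investigator time while half of the real cases are never seen, which is the combination described in the AML quote earlier in the deck.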
11. Public | Quantexa® 2018
Example within the financial crime or risk domain:
AI Decision Engine
A new paradigm: using context for better decisions
Customer satisfaction
Costs
Reputational risk
Event Based Statistics
Traditionally, analytical decision engines have focused on using statistics to score events (e.g. applications and transactions).
Single View Analytics
More recently people are scoring entities (people, organisations, vehicles and addresses), however a key challenge is providing the single view or “entity resolution” across internal and external data sources.
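A very simplified sketch of entity resolution follows: records from different systems are given a normalised match key, and records that share a key are treated as the same person. The matching rule (name tokens plus date of birth), the field names and the source ids are assumptions for illustration; production matching is typically fuzzier and uses many more attributes.

```python
import re
from collections import defaultdict

TITLES = {"mr", "mrs", "ms", "dr"}


def match_key(record):
    """Normalise a name (case, punctuation, word order, titles) and pair it with DOB."""
    cleaned = re.sub(r"[^a-z ]", "", record["name"].lower())
    tokens = sorted(t for t in cleaned.split() if t not in TITLES)
    return (" ".join(tokens), record["dob"])


def resolve(records):
    """Group source record ids by match key to form a single view per entity."""
    entities = defaultdict(list)
    for record in records:
        entities[match_key(record)].append(record["source_id"])
    return dict(entities)


# Hypothetical records from a CRM system and a KYC system.
RECORDS = [
    {"source_id": "crm-1", "name": "Mr. John A. Smith", "dob": "1980-02-01"},
    {"source_id": "kyc-7", "name": "SMITH, John A", "dob": "1980-02-01"},
    {"source_id": "crm-2", "name": "John Smith", "dob": "1975-06-30"},
]

single_views = resolve(RECORDS)
```

The first two records collapse into one entity despite different formatting, while the third stays separate because the date of birth differs.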
Network Analytics
People are starting to realise that the real world is a series of networks and a decision must account for the network (businesses, families, trading groups, spheres of influence and organised crime gangs).
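One way to sketch network building is to treat each shared attribute (a company, an address) as a link between entities and then collect everything that is connected, directly or indirectly. The union-find approach and the sample links below are illustrative choices, not a description of any particular product.

```python
def build_networks(links):
    """Group linked nodes into connected networks using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    networks = {}
    for node in parent:
        networks.setdefault(find(node), set()).add(node)
    return list(networks.values())


# Hypothetical links: two people share a company, a third shares nothing with them.
LINKS = [
    ("Person A", "Company X"),
    ("Person B", "Company X"),
    ("Person C", "Address 9"),
]

networks = build_networks(LINKS)
```

Person A and Person B were never linked directly, yet they land in the same network through Company X, which is the context a single-record score would miss.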
Realtime / Dynamic / Adjustable
Not only this, in today’s world the complex task of consolidating entities or scoring networks has to occur in real time as well as batch, with the ability to adjust matching and network building criteria.
Entities, Networks, Events
Composite Risk Score
• Event risk A
• Event risk B
• Event risk C
• Entity risk D
• Entity risk E
• Entity risk F
• Network risk G
• Network risk H
• Network risk I
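A composite score of this kind could be sketched as a weighted average of individual risk signals, returned together with each signal's contribution so that the result is not an unexplained number. The weights, the signal values and the weighted-average formula itself are assumptions for illustration; the slides do not specify how the score is combined.

```python
def composite_risk(signals, level_weights):
    """Combine (level, name, value) signals into one score plus a ranked breakdown."""
    total_weight = sum(level_weights[level] for level, _, _ in signals)
    contributions = [
        (name, level_weights[level] * value / total_weight)
        for level, name, value in signals
    ]
    # Largest contributor first, so an investigator knows where to start.
    contributions.sort(key=lambda item: item[1], reverse=True)
    score = sum(part for _, part in contributions)
    return score, contributions


# Hypothetical weights and risk values between 0 and 1.
LEVEL_WEIGHTS = {"event": 1.0, "entity": 2.0, "network": 3.0}
SIGNALS = [
    ("event", "Event risk A", 0.2),
    ("entity", "Entity risk D", 0.5),
    ("network", "Network risk G", 0.9),
]

score, breakdown = composite_risk(SIGNALS, LEVEL_WEIGHTS)
```

Returning the breakdown alongside the score addresses the "where does the investigator start?" problem raised in the diagnostic slide.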
Scenarios and rules
Machine learning
Peer group analysis
Anomaly detection
Searches
Temporal analysis
Text Mining
Coverage: More risk types and volume detected
Effectiveness: Fewer false positives / higher accuracy
Efficiency: Reduced decision / investigation times
12. Public | Quantexa® 2018
Creating a data driven organisation
[Architecture diagram: Quantexa Technology]
Business activities & data sources
• Internal line of business systems: Customer systems, Applications / KYC, Transactions, Mid office, etc.
• Existing decision engines: Detection systems, Risk ratings, Next best offer, Other
• Third party & external sources: DNB / BVD, Bureaus, TR / Dow Jones, News
Data lake layer
• Data Lake (Real Time, Batch): Comms, Voice, Identity, Text, Structured
• Batch process data
• Dynamic access data
Platform components
• ETL, NLP Entity Extraction, Entity Resolution, Network Build, Filter, Corporate Memory, Search / Visualise
AI Decision Engine
• Scenarios and rules, Machine learning, Peer group analysis, Anomaly detection, Artificial Intelligence, Temporal analysis, Text Mining
Analytics use cases (AI Models), Real Time and Batch
• Fraud, Risk, Next best offer / action, New prospects / Revenue, Compliance, Conduct / surveillance
Action
• Marketing
• Automated Decision: Real time, Batch
• User Interaction: Investigation, Decision making, Relationship Mgrs, Contact strategy
Manage
• Management information, Model performance, Operational reporting
Data scientists / Optimisation
• Tune models, Thematic reviews, New model discovery, Special projects / strategy
13. Public | Quantexa® 2018
Approach to technology strategy
Data lakes: Emergence of NoSQL, Hadoop, Search, etc.
Constant change: Be prepared for the future tools, with new ones every day!
Data volume: Can be huge, so consider memory + disk based stacks
Batch / Real time: Need to be able to operate in real time these days
Granular security: Data protection is becoming a challenge
Future use cases: Assume a proliferation of AI applications
Data and the application of AI will be critical, and you will have to live and grow with your decisions
Black box or transparent
Open source tools
Loosely coupled
Level of data replication
Commodity scale out platform
Interoperability
Production ready
14. Public | Quantexa® 2018
A successful approach to selecting AI projects
Philosophy
Have a vision, start piloting early, move toward the end state in “baby steps”, work in parallel and deliver results early…
[Diagram: AI adoption programme, Maturity plotted against Time]
Selection criteria: Data available; Significant labour costs; Significant potential impact; Ease of automation; Proven examples
Stages: Pilot, Operate, Measure, Optimize
Checkpoints: Paper business case checkpoint; Pilot validated business case
Communicate