2. Technology can never be separated from human needs
and desires. Innovation can create space for new
products and services. With it comes the space for
unethical activities and consequences. Unless we
prepare ourselves to protect our data and human
beings, the end of the world is not far. This end is not
only physical but also emotional, financial and
social!
UMASREE RAGHUNATH
3. How Does Machine Learning
Facilitate Fraud Detection?
Machine learning has been instrumental in solving important business problems such as
detecting email spam, focused product recommendation and accurate medical diagnosis.
The adoption of machine learning (ML) has been accelerated by increasing processing power, the availability
of big data and advancements in statistical modeling.
Fraud management has been painful for the banking and commerce industries. The number of transactions has
increased due to a plethora of payment channels – credit/debit cards, smartphones, kiosks.
At the same time, criminals have become adept at finding loopholes. As a result, it’s getting tough for
businesses to authenticate transactions.
Data scientists have been successful in solving this problem with machine learning and predictive analytics.
Automated fraud screening systems powered by machine learning can help businesses in reducing fraud.
4. UNDERSTANDING MACHINE
LEARNING FOR FRAUD DETECTION
Machine learning is the science of designing and applying algorithms that are able to learn from past
cases. It uses complex algorithms that iterate over large data sets and analyze the patterns in the data.
These algorithms enable machines to respond to situations for which they have not been explicitly
programmed. ML is used in spam detection, image recognition, product recommendation, predictive analytics etc.
The main aim of data scientists in implementing ML is a significant reduction in human effort. Even with modern
analytics tools, it takes a lot of time for humans to read, collect, categorize and analyze the data.
Machine learning converts data-intensive and confusing information into a simple format that suggests actions to
decision makers. A user further trains the ML system by continually adding data and experience. Thus, at its core,
machine learning is a 3-part cycle, i.e. Train-Test-Predict. Optimizing the cycle can make predictions more accurate
and relevant to the specific use case.
5. WHY SHOULD WE USE MACHINE
LEARNING IN FRAUD DETECTION?
1. Speed – In rule-based systems, people create ad hoc rules to determine which types of orders to accept or reject. This
process is time-consuming and involves manual interaction. As the velocity of commerce is increasing, it’s very important
to have a quicker solution to detect fraud.
2. Scale – Machine learning algorithms and models become more effective with increasing data sets, whereas in rule-based
models the cost of maintaining the fraud detection system multiplies as the customer base increases. Machine
learning improves with more data because the ML model can pick out the differences and similarities between multiple
behaviors. Once told which transactions are genuine and which are fraudulent, the systems can work through them and
begin to pick out those which fit either bucket. They can then classify fresh transactions in the same way.
There is a risk in scaling at a fast pace: if there is undetected fraud in the training data, machine learning
will train the system to ignore that type of fraud in the future.
3. Efficiency – Unlike humans, machines can perform repetitive tasks tirelessly. ML algorithms do the dirty work of
data analysis and only escalate decisions to humans when their input adds insights. ML can often be more effective than
humans at detecting subtle or non-intuitive patterns to help identify fraudulent transactions.
6. HOW TO DETECT FRAUD USING
MACHINE LEARNING?
The fraud detection process using machine learning starts with gathering and segmenting the data. Then the machine
learning model is fed training sets so that it can predict the probability of fraud.
Extract Data
The data will be split into three different segments – training, testing, and cross-validation. The algorithm will be
trained on a partial set of data and its parameters tweaked on a testing set. The performance of the model is then
measured using the cross-validation set.
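The three-way split described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 60/20/20 proportions, the fixed seed and the record fields (`id`, `amount`, `is_fraud`) are assumptions made for the example.

```python
import random

def split_dataset(records, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle records and split them into training, testing and cross-validation sets."""
    if not 0 < train_frac + test_frac < 1:
        raise ValueError("train_frac + test_frac must leave room for a cross-validation set")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    cross_validation = shuffled[n_train + n_test:]
    return training, testing, cross_validation

# Hypothetical historical transactions, used only to exercise the function.
transactions = [{"id": i, "amount": 10.0 * i, "is_fraud": i % 25 == 0} for i in range(100)]
training, testing, cross_validation = split_dataset(transactions)
print(len(training), len(testing), len(cross_validation))  # 60 20 20
```

Shuffling before splitting avoids putting all the oldest transactions in one segment; a time-ordered split may suit some fraud data better, which is a design choice this sketch does not address.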
Provide Training sets
The main application of machine learning in fraud detection is prediction. We want to predict the value of
some output (in this case, a boolean value that is true if the payment is fraudulent and false otherwise) given some
input values. The records are often obtained from historical data.
7. Building Models
Building models is an essential step in predicting fraud or anomalies in the data sets. We determine
how to make that prediction based on previous examples of input and output data. We can further divide
the prediction problem into two types of tasks:
1. Classification
2. Regression
1. Logistic Regression
Regression analysis is a popular, longstanding statistical technique that measures the strength of cause-and-
effect relationships in structured data sets. Regression analysis tends to become more sophisticated when
applied to fraud detection due to the number of variables and size of the data sets. It can provide value by
assessing the predictive power of individual variables or combinations of variables as part of a larger fraud
strategy.
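As a rough illustration of the technique, the sketch below fits a logistic regression (the regression variant for a yes/no outcome such as fraud) by plain gradient descent. The two features, the toy data, the learning rate and the epoch count are all assumptions chosen for the example, not values from a real system.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Difference between predicted probability and true label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def fraud_probability(weights, bias, x):
    """Predicted probability that a transaction with features x is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features: [amount in thousands, 1 if the card was used abroad else 0]
rows = [[0.1, 0], [0.3, 0], [0.2, 1], [0.5, 0], [2.5, 1], [3.0, 1], [2.8, 0], [3.5, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
print(fraud_probability(weights, bias, [0.2, 0]))  # small amount, domestic
print(fraud_probability(weights, bias, [3.2, 1]))  # large amount, abroad
```

The sign and size of each fitted weight give a first impression of how strongly a variable pushes the prediction toward fraud, although weights are only comparable across variables once the features are on similar scales.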
8. Decision Trees
2. Decision Tree
This is a mature machine learning algorithm family used to
automate the creation of rules for classification tasks. Decision
Tree algorithms can be used for classification or regression predictive
modeling problems. They are essentially a set of rules which are
trained using examples of fraud that clients are facing. The
creation of a tree ignores irrelevant features and does not require
extensive normalization of the data. A tree can be inspected, and
we can understand why a decision was made by following the list
of rules triggered by a certain customer. The output of the
machine learning algorithm might be a decision tree model that
gives a probability score of fraud based on earlier scenarios.
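To make the idea of an inspectable tree concrete, here is a small hand-written sketch. The tree itself, its feature names (`amount`, `account_age_days`), thresholds and leaf probabilities are hypothetical; a real tree would be learned from historical examples.

```python
# Hypothetical tree: internal nodes test one feature against a threshold,
# leaves hold the share of fraudulent training examples that reached them.
TREE = {
    "feature": "amount", "threshold": 1000.0,
    "low": {"fraud_probability": 0.02},
    "high": {
        "feature": "account_age_days", "threshold": 30.0,
        "low": {"fraud_probability": 0.85},
        "high": {"fraud_probability": 0.10},
    },
}

def score(tree, transaction):
    """Walk the tree and return (fraud probability, list of rules triggered)."""
    node, rules = tree, []
    while "fraud_probability" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if transaction[feature] <= threshold:
            rules.append(f"{feature} <= {threshold}")
            node = node["low"]
        else:
            rules.append(f"{feature} > {threshold}")
            node = node["high"]
    return node["fraud_probability"], rules

probability, rules = score(TREE, {"amount": 2500.0, "account_age_days": 3})
print(probability, rules)
# 0.85 ['amount > 1000.0', 'account_age_days <= 30.0']
```

The returned rule list is what makes the score explainable: a reviewer can read exactly which conditions led to it.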
9. Random Forest
3. Random Forest
The Random Forest technique uses a combination of multiple decision trees to
improve the performance of the classification or regression. It allows us to
smooth the error which might exist in a single tree. It increases the overall
performance and accuracy of the model while maintaining our ability to
interpret the results and provide explainable scores to our users. Random forest
runtimes are quite fast, and they are able to deal with unbalanced and missing
data. Random Forest weaknesses are that when used for regression they cannot
predict beyond the range in the training data and that they may over-fit data
sets that are particularly noisy. Of course, the best test of any algorithm is how
well it works upon your own data set.
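The combination of trees can be sketched with two ingredients of a random forest: each tree is trained on a bootstrap sample of the data and on a randomly chosen feature, and the forest score is the share of trees voting "fraud". For brevity the sketch uses one-level trees (stumps), which is a simplification of a full random forest; the features and toy data are assumptions.

```python
import random

def train_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for fraud_above in (True, False):
            # A row is predicted fraudulent when (value > threshold) == fraud_above.
            errors = sum(
                ((row[feature] > threshold) == fraud_above) != bool(label)
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, fraud_above)
    return best[1:]

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # sampling with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return forest

def fraud_score(forest, row):
    """Fraction of trees that vote 'fraud' for this row."""
    votes = sum((row[feature] > threshold) == fraud_above
                for feature, threshold, fraud_above in forest)
    return votes / len(forest)

# Hypothetical features: [amount in thousands, 1 if the card was used abroad else 0]
rows = [[0.1, 0], [0.3, 0], [0.2, 1], [0.5, 0], [2.5, 1], [3.0, 1], [2.8, 0], [3.5, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
print(fraud_score(forest, [0.2, 0]), fraud_score(forest, [3.2, 1]))
```

Averaging many votes is what smooths out the error of any single tree: an unlucky bootstrap sample can produce a poor stump, but it is outvoted by the rest.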
10. Neural Networks
4. Neural Networks
A neural network is an excellent complement to other
techniques and improves with exposure to data. Neural
networks are part of cognitive computing technology, in
which the machine mimics how the human brain works and
how it observes patterns. Neural networks are
completely adaptive and able to learn from patterns of
legitimate behavior. They can adapt to changes in
the behavior of normal transactions and identify
patterns of fraudulent transactions. Scoring with a
neural network is extremely fast, so decisions can be
made in real time.
11. LIMITATIONS OF USING MACHINE
LEARNING FOR FRAUD DETECTION
Machine learning is not a panacea for fraud detection. It is a very useful technology which allows us to find
anomalous patterns in everyday transactions. ML models are indeed superior to the human review and rule-based
methods that organizations employed earlier. But this technique of fraud detection has its own
limitations:
1. Lack of inspectability – Even the most advanced technology cannot replace the expertise and judgment it
takes to effectively filter and process data and evaluate the meaning of the risk score.
2. Cold start – There must be enough data points to identify a legitimate cause-and-effect relationship.
3. Blind to connections in data – Machine learning models work on actions, behavior, and activity. Initially,
when the dataset is small, they are blind to connections in data. The model can overlook a seemingly
obvious connection such as a shared card between two accounts.
12. Example: Machine Learning
Solution Reduces Check Fraud
Counterfeiters constantly develop new
techniques to perpetrate fraud in financial
services. The AI solution operates with
near-human intelligence to counteract the
counterfeiters and reduce losses. Every
transaction the model processes increases its
accuracy of detection and adds to its
enormous repository of historical information,
so it’s continually learning the practices of
habitual fraudsters to defeat them.
13. AI Machine Learning Solution Detects
Check Fraud for a Global Bank
The Challenge
Even with lower check-processing times due to electronic payments and automated clearinghouse
(ACH) transactions, banks must still manually verify millions of handwritten checks. Annually, banks
risk losing millions as a result of check fraud by counterfeiters. Because a percentage of the funds is
made readily available to the depositors, it’s critical to identify counterfeit checks quickly. To reduce
the incidence of check fraud, a global bank partnered with Cognizant Digital Business to build a
solution based on artificial intelligence (AI) machine learning to speed up check verification and lower
costs.
14. The Approach
To meet the bank’s goals, our solution needed to identify fraudulent checks in real time, as well as
reduce the number of checks requiring manual review. The bank already uses optical character
recognition (OCR) and deep learning technology to scan checks, process data and verify signatures.
Our model, based on Google TensorFlow™, uses a neural network to parse a historical database of
previously scanned checks, including those known to be fraudulent.
Banking AI solutions experts trained the neural network to use a set of comparative algorithms to
distinguish good checks from anomalous ones. By automatically comparing various factors on scans of
deposited checks to those in the database, our model flags potential counterfeits in real time. It
assigns a confidence score to each scanned check, flagging it as good, fraudulent or needing further
review. Our solution is scalable and configurable to the client’s evolving needs.
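The three-way flagging step can be pictured as two cut-offs applied to the model's score. The sketch below is only an illustration: it assumes the score is a fraud confidence between 0 and 1, and the function name and both thresholds are hypothetical, not details of the solution described above.

```python
def triage_check(confidence_of_fraud, clear_below=0.2, flag_above=0.8):
    """Map a fraud confidence score in [0, 1] to one of three outcomes."""
    if not 0.0 <= confidence_of_fraud <= 1.0:
        raise ValueError("confidence score must be between 0 and 1")
    if confidence_of_fraud >= flag_above:
        return "fraudulent"
    if confidence_of_fraud <= clear_below:
        return "good"
    # Scores between the two cut-offs go to a human reviewer.
    return "needs review"

for score in (0.05, 0.5, 0.95):
    print(score, triage_check(score))
# 0.05 good
# 0.5 needs review
# 0.95 fraudulent
```

Moving the two cut-offs closer together reduces the number of checks sent for manual review, at the cost of more automated decisions on borderline scores.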
15. 5 Keys to Using AI and Machine
Learning in Fraud Detection
Key 1: Integrating Supervised and Unsupervised AI Models in a Cohesive Strategy
Key 2: Applying Behavioral Profiling Analytics in Fraud Detection
Key 3: Distinguishing Specialized from Generic Behavior Analytics
Key 4: Leveraging Large Datasets to Develop Models
Key 5: Adaptive Analytics and Self-Learning AI
16. Blockchain for fraud prevention:
Industry use cases
Fraud, and the lack of transparency that enables it, is a growing problem for businesses around
the world. Yet a cutting-edge technology called blockchain could provide the fraud prevention
capabilities these businesses are looking for.
Blockchain is a shared ledger that is decentralized and resistant to tampering. It allows verified
contributors to store, view and share digital information in a security-rich environment, which helps
to foster trust, accountability and transparency in business relationships. Seeking to capitalize on
these benefits, companies have been exploring ways to use blockchain technology to prevent fraud in
industries such as finance, identity management and supply chain.
17. Preventing identity fraud
The threat of online fraud has spurred many credit card companies and financial institutions to alert
consumers when potentially fraudulent transactions are made. However, that hasn’t stopped criminals
from stealing identifying information and using it without permission.
What if a person’s digital identity could be secured in a way that prevented it from being tampered with or
used in an unsanctioned way? Blockchain could make this possible.
If identifying information is placed on a permissioned blockchain framework, authorized parties will have
access to one version of the truth, and only known participants can verify transactions and ensure records
are valid.
One approach is a digital identity management and attribute-sharing network based on blockchain. Such a network would
allow individual consumers to control what information they share, while organizations can efficiently
validate a customer’s identity and arrange new services.
18. Can blockchain eliminate all
fraud?
Blockchain can help to reduce and even prevent fraud in the supply chain through greater
transparency and improved traceability of products. It’s very difficult to manipulate the blockchain,
which is an immutable record that can only be updated and validated through consensus among
network participants. And if a product is digitized on blockchain, it can easily be traced back to its
origin because the information is on a shared, distributed ledger.
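The tamper resistance mentioned above comes largely from chaining: each block stores the hash of the previous one, so changing an old record breaks every later link. The sketch below shows only that hash-chaining idea in a single process; it does not model consensus or distribution across participants, and the supply chain events are invented for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, previous_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"index": index, "data": data, "previous_hash": previous_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "data": data, "previous_hash": previous_hash,
                  "hash": block_hash(index, data, previous_hash)})

def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["previous_hash"]):
            return False
    return True

ledger = []
for event in ("harvested at farm A", "shipped to warehouse B", "delivered to store C"):
    append_block(ledger, event)
print(is_valid(ledger))   # True

ledger[1]["data"] = "shipped to warehouse X"   # tamper with an old record
print(is_valid(ledger))   # False
```

Tracing a product back to its origin then amounts to reading the chain from the latest block to the first.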
No, blockchain doesn’t eliminate all types of fraud. Even with blockchain in place, thefts can still
occur. However, those thefts are due in large part to attempts to layer services on top of blockchain,
and not because of the core technology. And while blockchain networks are built on the notion of
decentralized control, the infrastructure can leave back doors open to vulnerabilities that allow
tampering and unauthorized access.
19. Online Banking Customers: Real-
time fraud detection in the cloud
From a technology architecture perspective, a cloud-based ecosystem can enable users to build an
application that detects, in real time, fraudulent customers based on their demographic information and
prior financial history. Multiple algorithms help detect fraud, and the output is aggregated to improve
prediction accuracy.
A system that allows the development of applications capable of churning out results in real-time needs
multiple services running in tandem and is highly resource intensive. By deploying the system in the cloud,
maintenance and load balancing of the system can be handled efficiently and cost effectively. In fact, most
cloud systems function as “pay as you go” and only charge the user for actual usage vs. maintenance and
monitoring costs. “Intelligent” cloud systems also provide recommendations to users to dial up/down
resources available to run the fraud detection algorithms without worrying about the data-engineering
layer.
Since multiple algorithms are run on the same data to enable fraud detection, a real-time agent paradigm
is needed to run the algorithms.
20. But Why Use the Cloud?
Fraud detection can be improved by running an ensemble of algorithms in parallel and aggregating the
predictions in real time. This entire end-to-end application can be designed and deployed in days
depending on complexity of data, variables to be considered and algorithmic sophistication desired.
Deploying this in the cloud makes it horizontally scalable, owing to effective load balancing and hardware
maintenance.
It also provides higher data security and makes the system fault tolerant by making processes mobile.
This combination of a real-time application development system and cloud-based computing enables even
non-technical teams to rapidly deploy applications.
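Running several algorithms in parallel and aggregating their predictions can be sketched as below. The three scoring functions are hypothetical stand-ins for real models, the transaction fields are assumptions, and a simple mean is only one possible way to aggregate.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical stand-ins for independently deployed models; each returns a score in [0, 1].
def amount_model(txn):
    return min(txn["amount"] / 5000.0, 1.0)

def velocity_model(txn):
    return min(txn["transactions_last_hour"] / 10.0, 1.0)

def history_model(txn):
    return 0.9 if txn["prior_chargebacks"] > 0 else 0.1

MODELS = (amount_model, velocity_model, history_model)

def ensemble_score(txn, models=MODELS):
    """Score one transaction with every model concurrently and average the results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        scores = list(pool.map(lambda model: model(txn), models))
    return mean(scores)

txn = {"amount": 2500.0, "transactions_last_hour": 8, "prior_chargebacks": 1}
print(round(ensemble_score(txn), 2))  # 0.73
```

Threads are a reasonable fit when each model is a call to a remote service; in a cloud deployment the same pattern would typically spread the models across separate instances behind a load balancer.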