This document discusses using TigerGraph for real-time fraud detection at scale by integrating real-time deep-link graph analytics with Spark AI. It provides examples of common TigerGraph use cases, including recommendation engines, fraud detection, and risk assessment. It then discusses how TigerGraph can power explainable AI by extracting over 100 graph-based features from entities and their relationships to feed machine learning models. Finally, it shares a case study of how China Mobile used TigerGraph for phone-based fraud detection, modeling over 600 million phone numbers and 15 billion call connections as a graph to detect various types of fraud in real time.
2. Benyue (Emma) Liu, TigerGraph Inc.
Real-time Fraud Detection at Scale - Integrating Real-Time Deep-Link Graph Analytics with Spark AI
#UnifiedDataAnalytics #SparkAISummit
3. “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
4. Graph is HOW WE THINK
5. Common TigerGraph Use Cases
Increase Revenue: Analyze all interactions in real time to sell more
• Recommendation Engine
• Real-time Customer 360/MDM
• Product & Service Marketing
Reduce Costs & Manage Risks: Reduce costs and assess and monitor risks effectively
• Fraud Detection
• Anti-Money Laundering (AML)
• Risk Assessment & Monitoring
• Cyber Security
Improve Operational Efficiency: Manage resources for maximum output
• Enterprise Knowledge Graph
• Network, IT and Cloud Resource Optimization
• Energy Management System
• Supply Chain Analysis
Foundational Use Cases: Geospatial Analysis, Time Series Analysis, AI and Machine Learning
6. 7 Key Data Science Capabilities Powered By a Native Parallel Graph
• Deep Link Analysis: From a set of entities (e.g. devices, customers, accounts, doctors), show all links or connections
• Relational Commonality Discovery and Computation: Given 2 entities (e.g. customers, businesses), follow their relationships to find commonality
• Multi-dimensional Entity & Pattern Matching: Given a pattern (e.g. referring business to a relative), find similar patterns in the graph
• Hub & Community Detection: Find most influential members of a group (customers, doctors, citizens) & detect community around them
• Geospatial Graph Analysis: Analyze changes in entities & relationships with location data
• Temporal (Time-Series) Graph Analysis: Analyze changes in entities & relationships over time
• Machine Learning Feature Generation & Explainable AI: Extract graph-based features to feed as training data for machine learning; Power Explainable AI
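Deep link analysis boils down to following relationships several hops out from a set of seed entities. The following is a minimal single-machine sketch of that idea, assuming the graph fits in an in-memory adjacency dict; it is a plain breadth-first search, not TigerGraph's parallel engine or GSQL, and the function name `k_hop_neighbors` and the sample data are hypothetical.

```python
from collections import deque


def k_hop_neighbors(adj: dict[str, list[str]], seeds: set[str], max_hops: int) -> dict[str, int]:
    """Breadth-first search from a set of seed entities.

    Returns every entity reachable within `max_hops` hops, mapped to its
    hop distance from the nearest seed (seeds themselves have distance 0).
    """
    dist = {s: 0 for s in seeds}
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        if dist[node] == max_hops:
            continue  # reached the hop limit: keep the node but do not expand it
        for nbr in adj.get(node, []):
            if nbr not in dist:  # in BFS the first visit uses the fewest hops
                dist[nbr] = dist[node] + 1
                frontier.append(nbr)
    return dist


# Accounts linked through shared devices (illustrative data).
adj = {
    "acct1": ["device1"],
    "device1": ["acct1", "acct2"],
    "acct2": ["device1", "device2"],
    "device2": ["acct2", "acct3"],
    "acct3": ["device2"],
}
print(k_hop_neighbors(adj, {"acct1"}, 2))  # {'acct1': 0, 'device1': 1, 'acct2': 2}
```

Seeding with several entities at once matches the "from a set of entities" phrasing above; the sketch makes no attempt at the parallelism the deck attributes to a native parallel graph.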
10. Typical Spark + TigerGraph Integration
● Data Preparation and Integration (TigerGraph/Spark)
● Unsupervised Learning (TigerGraph)
● Feature Extraction for Supervised Learning (TigerGraph/Spark)
● Model Training (Spark)
● Validate and Apply Model (TigerGraph)
● Visualize and Explore Interconnected Data (TigerGraph)
12. Real-Time Phone-Based Fraud Detection
Massive, Worldwide Problem
● 18 billion robocalls in the US in 2017 (hiya.com)
● Spam/Scam - agile, spoofed numbers
Customer:
● 600M subscribers
● 300M calls/day, peak 10K calls/sec
● Need: Real-time detection of various types of phone-based fraud
13. Real-Time Phone Anti-Spam/Scam Detection
TigerGraph Solution: Real-time graph-based machine learning and decision system
Graph Analytics
● Real-time machine learning
○ 118 graph features per call
○ Retrained periodically with 2M calls
● Real-time decisions
○ Call recipient sees an alert if the ML system flags the call as suspicious
● In production since Dec 2016
Graph Database
● 600M phone numbers (inside and outside the network)
● 15B phone-phone call edges (2-month sliding window)
○ Time
○ Duration
● Real-time graph updates, peak 10K+ calls/sec
● 118 graph features per phone
14. Examples of Graph Features for Machine Learning
Bad Phone Features
(1) Short-term call duration
(2) Empty stable group
(3) No call-back phone
(4) Many rejected calls
(5) Average distance > 3
Good Phone Features
(1) High call-back phone
(2) Stable group
(3) Long-term phone
(4) Many in-group connections
(5) 3-step friend relation
(Figure: example call graphs of a good phone and a bad phone illustrating these features)
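Several of these features are simple aggregates over a phone's call edges. A minimal sketch, assuming calls have already been aggregated into caller → callee edges; the `CallStats` fields echo the phone_phone edge attributes shown later in the deck, while the three feature names are hypothetical stand-ins for a few of the 118 features.

```python
from dataclasses import dataclass


@dataclass
class CallStats:
    """Aggregated caller -> callee edge attributes (hypothetical field names)."""
    num_calls: int
    total_duration: float  # seconds
    num_rejections: int


def phone_features(out_edges: dict[str, dict[str, CallStats]], phone: str) -> dict[str, float]:
    """Compute three illustrative per-phone features in one pass over the
    phone's outgoing call edges."""
    calls = rejections = callees = called_back = 0
    duration = 0.0
    for callee, stats in out_edges.get(phone, {}).items():
        calls += stats.num_calls
        duration += stats.total_duration
        rejections += stats.num_rejections
        callees += 1
        if phone in out_edges.get(callee, {}):  # the callee has called this phone back
            called_back += 1
    return {
        "avg_call_duration": duration / calls if calls else 0.0,      # short calls: bad sign
        "callback_ratio": called_back / callees if callees else 0.0,  # no call-back: bad sign
        "rejection_ratio": rejections / calls if calls else 0.0,      # many rejections: bad sign
    }


out_edges = {
    "spammer": {"a": CallStats(1, 5.0, 1), "b": CallStats(1, 3.0, 1), "c": CallStats(2, 8.0, 1)},
    "friend": {"a": CallStats(10, 1200.0, 0)},
    "a": {"friend": CallStats(8, 900.0, 0)},
}
print(phone_features(out_edges, "spammer"))
# {'avg_call_duration': 4.0, 'callback_ratio': 0.0, 'rejection_ratio': 0.75}
print(phone_features(out_edges, "friend"))
# {'avg_call_duration': 120.0, 'callback_ratio': 1.0, 'rejection_ratio': 0.0}
```

Accumulating all counters in one loop mirrors the idea of computing several features during a single traversal rather than one query per feature.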
15. China Mobile - Detecting Phone-Based Fraud by Analyzing Network or Graph Pattern Features
• Each phone node has a fraud flag indicating whether it is a good phone or a bad phone, and what type of fraud: scam, harassment, or advertisement
• Run real-time GSQL query for each call:
○ Collect 118 features
○ Compute composite score
○ Update fraud flag
○ Return fraud type
(Figure: each real-time call event (caller, callee, time) triggers the query, which returns the fraud type; call detail records (caller, callee, time, duration) feed continuous graph updates)
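The per-call steps above end in a composite score and a fraud type. One plausible way to combine scores from several models is sketched below, assuming one logistic model per fraud type (scam, harassment, advertisement) and taking the composite score to be the highest per-type probability. The weights, biases, `THRESHOLD`, and `classify` are hypothetical; the deck does not specify the models or how the score is composed.

```python
import math

# Hypothetical per-fraud-type logistic models: (feature weights, bias).
# The production query scores 118 features; three are used here.
MODELS = {
    "scam": ({"callback_ratio": -4.0, "rejection_ratio": 3.0, "avg_call_duration": -0.02}, 0.5),
    "harassment": ({"callback_ratio": -2.0, "rejection_ratio": 5.0, "avg_call_duration": -0.01}, -1.0),
    "advertisement": ({"callback_ratio": -3.0, "rejection_ratio": 2.0, "avg_call_duration": -0.05}, 1.0),
}
THRESHOLD = 0.5  # hypothetical cut-off on the composite score


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: dict[str, float]) -> tuple[str, float]:
    """Score one caller's features with every model.

    The composite score is assumed to be the highest per-type probability;
    below THRESHOLD the caller is labelled "good".
    """
    scores = {
        fraud_type: sigmoid(bias + sum(w * features.get(name, 0.0) for name, w in weights.items()))
        for fraud_type, (weights, bias) in MODELS.items()
    }
    fraud_type, composite = max(scores.items(), key=lambda kv: kv[1])
    return (fraud_type if composite >= THRESHOLD else "good"), composite


print(classify({"avg_call_duration": 4.0, "callback_ratio": 0.0, "rejection_ratio": 0.75}))
print(classify({"avg_call_duration": 120.0, "callback_ratio": 1.0, "rejection_ratio": 0.0}))
```

Returning the winning type together with its score keeps the "return fraud type" step and the "update fraud flag" step fed by a single evaluation.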
16. Phone Fraud Real-Time Detection System
(Figure: a phone vertex connected to target1, target2, target3 and target4 by phone_phone edges)
Phone vertex attributes: fraud flag, expiration time
phone_phone edge attributes: num of call, total duration, call date list, num of rejection
● 600 Million Vertices
● 15+ Billion Edges
● 300 Million Daily Updates
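The vertex and edge attributes above suggest how each call detail record is folded into the graph. The sketch below is an in-memory illustration, not TigerGraph's loading mechanism: the dataclass fields mirror the slide, while `CallGraph`, `record_call`, `expire`, the epoch-seconds unit for the expiration time, and the 60-day approximation of the 2-month sliding window are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

WINDOW = timedelta(days=60)  # assumed approximation of the 2-month sliding window


@dataclass
class PhoneVertex:
    fraud_flag: str = "unknown"   # e.g. "good", "scam", "harassment", "advertisement"
    expiration_time: float = 0.0  # epoch seconds after which the flag is stale (assumed unit)


@dataclass
class PhoneEdge:
    """phone_phone edge (caller -> callee) with the attributes listed on the slide."""
    num_calls: int = 0
    total_duration: float = 0.0
    call_dates: list[date] = field(default_factory=list)
    num_rejections: int = 0


class CallGraph:
    def __init__(self) -> None:
        self.vertices: dict[str, PhoneVertex] = {}
        self.edges: dict[tuple[str, str], PhoneEdge] = {}

    def record_call(self, caller: str, callee: str, call_date: date,
                    duration: float, rejected: bool) -> None:
        """Continuous graph update: upsert both phones and fold one call into the edge."""
        self.vertices.setdefault(caller, PhoneVertex())
        self.vertices.setdefault(callee, PhoneVertex())
        edge = self.edges.setdefault((caller, callee), PhoneEdge())
        edge.num_calls += 1
        edge.total_duration += duration
        edge.call_dates.append(call_date)
        edge.num_rejections += int(rejected)

    def expire(self, today: date) -> None:
        """Drop edges whose most recent call is older than the window.

        Simplification: surviving edges keep counters that still include
        calls older than the window.
        """
        stale = [key for key, e in self.edges.items() if today - max(e.call_dates) > WINDOW]
        for key in stale:
            del self.edges[key]


g = CallGraph()
g.record_call("a", "b", date(2019, 1, 1), 30.0, rejected=False)
g.record_call("a", "b", date(2019, 3, 20), 60.0, rejected=True)
g.record_call("c", "d", date(2019, 1, 5), 10.0, rejected=False)
g.expire(date(2019, 4, 1))
print(sorted(g.edges))  # [('a', 'b')]
```

Keeping pre-aggregated counters on the edge means feature queries read one edge per neighbor instead of rescanning raw call records.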
17. Case 1: Call type was recently flagged
(Figure: a real-time call event (call time, caller ID, callee ID) triggers a real-time query. Branch 1: the caller was recently flagged as “bad”. Branch 2: collect the caller's graph features → classifier → if the caller is classified as “bad”, update.)
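The two branches can be read as a cache check in front of the classifier: a caller whose flag has not yet expired is answered directly, and only other callers pay for feature collection and classification. A minimal sketch under that reading; `FLAG_TTL_SECONDS`, `handle_call`, and the toy feature collector and classifier are hypothetical stand-ins for the GSQL query and the trained models.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional

FLAG_TTL_SECONDS = 7 * 24 * 3600  # hypothetical lifetime of a fraud flag


@dataclass
class PhoneVertex:
    fraud_flag: str = "good"
    expiration_time: float = 0.0  # epoch seconds


def handle_call(
    vertices: dict[str, PhoneVertex],
    caller: str,
    collect_features: Callable[[str], dict[str, float]],
    classifier: Callable[[dict[str, float]], str],
    now: Optional[float] = None,
) -> str:
    """Return the fraud type for one real-time call event."""
    now = time.time() if now is None else now
    vertex = vertices.setdefault(caller, PhoneVertex())
    # Case 1: the caller was recently flagged as bad, so reuse the stored flag.
    if vertex.fraud_flag != "good" and vertex.expiration_time > now:
        return vertex.fraud_flag
    # Case 2: collect the caller's graph features and classify them.
    fraud_type = classifier(collect_features(caller))
    # Update the stored fraud flag and restart its expiry clock.
    vertex.fraud_flag = fraud_type
    vertex.expiration_time = now + FLAG_TTL_SECONDS
    return fraud_type


collected: list[str] = []


def collect_features(phone: str) -> dict[str, float]:
    collected.append(phone)  # record how often the expensive path runs
    return {"rejection_ratio": 0.9}


def classifier(features: dict[str, float]) -> str:
    return "scam" if features["rejection_ratio"] > 0.5 else "good"


vertices: dict[str, PhoneVertex] = {}
print(handle_call(vertices, "p1", collect_features, classifier, now=0.0))   # classified
print(handle_call(vertices, "p1", collect_features, classifier, now=60.0))  # answered from the flag
print(len(collected))  # 1
```

The expiration time on the phone vertex is what lets a stale flag fall back to a fresh classification.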
18. Case 2: Call needs to be classified
(Figure: a real-time call event (call time, caller ID, callee ID) triggers a real-time query. Branch 1: the caller was recently flagged as “bad”. Branch 2: collect the caller's graph features → classifier → if the caller is classified as “bad”, update.)
Input: list of calls with phone pairs and call time (batch)
Output: 1. Call fraud type; 2. Scoring and feature vector of fraud calls as supporting evidence (explainable AI)
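The batch output pairs each fraud call with its score and feature vector as evidence. With a linear model, one simple way to make that evidence readable is to report each feature's contribution (weight × value) to the score. The sketch below assumes a single hypothetical logistic model that only separates "scam" from "good"; `WEIGHTS`, `score_with_evidence`, and `score_batch` are illustrative names, not the deck's implementation.

```python
import math

# Hypothetical single logistic model that only separates "scam" from "good".
WEIGHTS = {"callback_ratio": -4.0, "rejection_ratio": 3.0, "avg_call_duration": -0.02}
BIAS = 0.5
THRESHOLD = 0.5


def score_with_evidence(features: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return the fraud score and each feature's contribution (weight * value),
    largest contribution first, as human-readable supporting evidence."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    score = 1.0 / (1.0 + math.exp(-(BIAS + sum(contributions.values()))))
    evidence = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return score, evidence


def score_batch(calls: list[tuple[str, str, int]],
                features_by_phone: dict[str, dict[str, float]]) -> list[dict]:
    """Batch input: (caller, callee, call_time) tuples.

    Output per call: fraud type and score; fraud calls also carry the
    feature vector and per-feature evidence."""
    results = []
    for caller, callee, call_time in calls:
        features = features_by_phone[caller]
        score, evidence = score_with_evidence(features)
        is_fraud = score >= THRESHOLD
        result = {"caller": caller, "callee": callee, "call_time": call_time,
                  "fraud_type": "scam" if is_fraud else "good", "score": score}
        if is_fraud:
            result["features"] = features
            result["evidence"] = evidence
        results.append(result)
    return results


features_by_phone = {
    "spammer": {"avg_call_duration": 4.0, "callback_ratio": 0.0, "rejection_ratio": 0.75},
    "friend": {"avg_call_duration": 120.0, "callback_ratio": 1.0, "rejection_ratio": 0.0},
}
results = score_batch([("spammer", "a", 1000), ("friend", "a", 1005)], features_by_phone)
for r in results:
    print(r["caller"], r["fraud_type"], round(r["score"], 3), r.get("evidence"))
```

Contribution ranking is only a faithful explanation for linear models; non-linear models would need a different attribution method.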
19. China Mobile Machine Learning Workflow
1. Data labels from police reports and online third-party sources
2. A total of 118 graph features analyzed to build the fraud detection model
3. All 118 graph features collected by one GSQL query
4. Features for the training data collected in GSQL in batch mode and stored as CSV files for later model training
5. TigerGraph performs fraud scoring with multiple machine learning models in real time
6. Machine learning models are trained offline, and the model parameters are stored as configuration files that GSQL uses for real-time scoring
(Future: Training ML models in Spark)
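Steps 4 and 6 describe a hand-off: features exported as CSV, a model trained offline, and only its parameters passed on for real-time scoring. A minimal Python sketch of that hand-off (not GSQL), assuming a logistic regression as a stand-in model, since the deck does not say which models were used, and JSON as the configuration format. `FEATURES`, `train_logistic`, and `score` are hypothetical names.

```python
import csv
import io
import json
import math

FEATURES = ["callback_ratio", "rejection_ratio"]  # hypothetical subset of the 118 features


def predict(weights: list[float], bias: float, x: list[float]) -> float:
    """Logistic model: probability that a phone is fraudulent."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows: list[dict[str, str]], epochs: int = 500, lr: float = 0.5) -> dict:
    """Offline step: batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(FEATURES)
        grad_b = 0.0
        for row in rows:
            x = [float(row[f]) for f in FEATURES]
            err = predict(weights, bias, x) - float(row["label"])
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return {"features": FEATURES, "weights": weights, "bias": bias}


def score(config: dict, features: dict[str, float]) -> float:
    """Real-time step: needs only the stored parameters, not the training code."""
    x = [features[f] for f in config["features"]]
    return predict(config["weights"], config["bias"], x)


# 1. Batch-collected training features, in the shape of a CSV export.
training_csv = io.StringIO(
    "phone,callback_ratio,rejection_ratio,label\n"
    "p1,0.0,0.8,1\n"
    "p2,0.1,0.6,1\n"
    "p3,0.9,0.0,0\n"
    "p4,0.7,0.1,0\n"
)
rows = list(csv.DictReader(training_csv))

# 2. Train offline and serialize the parameters as a configuration file (JSON here).
config_text = json.dumps(train_logistic(rows))

# 3. The scorer loads the configuration and applies it.
config = json.loads(config_text)
for row in rows:
    feats = {f: float(row[f]) for f in FEATURES}
    print(row["phone"], row["label"], round(score(config, feats), 3))
```

Because the scorer reads only a small parameter file, the model can be retrained periodically without changing the real-time scoring path.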
20. Machine Learning with TigerGraph
Real-time Scoring with Multiple ML models in GSQL
• Fast: Real-time response for both feature collection and scoring
• Efficient: Aggregation during traversal - multiple features in one
• Easy: Collect complex features without multiple RDBMS joins
21. China Mobile Anti-Fraud Results from TigerGraph Machine Learning Solutions
• 3.2 million fraud notifications in Shandong Province (Dec 2016 – July 2019)
• Potential losses saved: ~39.86 million RMB (~6 million US dollars)
23. Why TigerGraph + Spark For Machine Learning?
• Parallel processing, distributed systems in training, ETL & feature collections - AT SCALE!
• Capture business moments with real-time response with explainable AI - AT SCALE!
• Enrich machine learning with complex graph features - AT SCALE!
24. Spark and TigerGraph Data Pipeline
(Figure: static data sources and streaming data sources connected with Spark and TigerGraph through the TigerGraph JDBC Driver)
25. JDBC Driver (v1.2)
● Type 4 driver
● Supports bi-directional (read and write) data flow to TigerGraph
● Read: converts a ResultSet to a DataFrame
● Write: loads DataFrames and files into TigerGraph vertices/edges
● Supports REST endpoints for built-in, compiled and interpreted GSQL queries from TigerGraph
● Open source: https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
28. Graph Features: Stable Group & InGroup Connection
• Stable Group: phones in the target group that have regular calls (stable connection) with the source phone
• Stable InGroup Connections: phones in the target group that have regular calls (stable connections) among themselves
A stable connection is one that:
● Has both a call and a callback
● Has a number of calls larger than a given limit
● Has a total duration larger than a given limit
29. Resources
• TigerGraph Cloud Machine Learning Starter Kit
a. Register at tgcloud.us
• JDBC Driver (Open Source)
a. https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
• Contact me at emma.liu@tigergraph.com
30. More … TigerGraph & Neural Network
Training data: https://www.coursera.org/learn/machine-learning
Watch Graph Gurus Episode 19
https://info.tigergraph.com/graph-gurus-19
Contact Me:
emma.liu@tigergraph.com
31. “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
Real-time deep-link graph analytics at scale is the differentiator for your machine learning pipeline!
34. Stable Group Pseudocode
Step 1: start from a given phone vertex, find its 1-step neighbors
Step 2: check if a target has both stable outgoing (phone_phone) and stable incoming edges (phone_phone_reversed)
(Figure: source phone connected to target1, target2, target3 and target4 by phone_phone and phone_phone_reversed edges; edge attributes: num of call, total duration, call date list, num of rejection)
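The two steps above translate directly into a set comprehension over the source's neighbors. A minimal in-memory sketch, assuming a stable connection means both the call edge and the callback edge individually exceed the call-count and duration limits; `MIN_CALLS`, `MIN_DURATION`, and the function names are hypothetical, and the deck does not state the actual limits.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CALLS = 3         # hypothetical "given limit" on the number of calls
MIN_DURATION = 300.0  # hypothetical "given limit" on total duration, in seconds


@dataclass
class EdgeStats:
    num_calls: int
    total_duration: float


def is_stable_edge(edge: Optional[EdgeStats]) -> bool:
    return edge is not None and edge.num_calls > MIN_CALLS and edge.total_duration > MIN_DURATION


def stable_connection(out_edges: dict[str, dict[str, EdgeStats]], a: str, b: str) -> bool:
    """Call (a -> b) and callback (b -> a) must both pass the limits."""
    return (is_stable_edge(out_edges.get(a, {}).get(b))
            and is_stable_edge(out_edges.get(b, {}).get(a)))


def stable_group(out_edges: dict[str, dict[str, EdgeStats]], source: str) -> set[str]:
    # Step 1: 1-step neighbors over outgoing and incoming edges. A plain dict
    # has no reverse index, so incoming edges are found by a scan; a reverse
    # edge type such as phone_phone_reversed avoids that in the graph.
    neighbors = set(out_edges.get(source, {}))
    neighbors |= {caller for caller, callees in out_edges.items() if source in callees}
    # Step 2: keep targets with both a stable outgoing and a stable incoming edge.
    return {t for t in neighbors if stable_connection(out_edges, source, t)}


out_edges = {
    "src": {"t1": EdgeStats(5, 900.0), "t2": EdgeStats(5, 900.0), "t3": EdgeStats(1, 20.0)},
    "t1": {"src": EdgeStats(4, 600.0)},
    "t2": {"src": EdgeStats(1, 30.0)},
    "t4": {"src": EdgeStats(6, 1000.0)},
}
print(stable_group(out_edges, "src"))  # {'t1'}
```

An empty result corresponds to the "empty stable group" bad-phone feature; its size can serve as a numeric feature.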
35. Stable InGroup Connections Pseudocode
Step 1: starting from a given phone vertex, find its 1-step neighbors (target group)
Step 2: for each vertex in the target group, find its 1-step neighbors and check for stable connections
Step 3: check the stable target for each vertex in the target group
(Figure: source phone connected to target1, target2, target3 and target4 by phone_phone and phone_phone_reversed edges; edge attributes: num of call, total duration, call date list, num of rejection)
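Counting stable in-group connections adds one more hop: walk each target-group member's neighbors and count pairs inside the group that are stably connected. A minimal sketch, assuming the target group is every 1-step neighbor of the source (as in Step 1), that each unordered pair is counted once, and the same hypothetical per-direction limits as a stable connection; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CALLS = 3         # hypothetical "given limit" on the number of calls
MIN_DURATION = 300.0  # hypothetical "given limit" on total duration, in seconds


@dataclass
class EdgeStats:
    num_calls: int
    total_duration: float


def is_stable_edge(edge: Optional[EdgeStats]) -> bool:
    return edge is not None and edge.num_calls > MIN_CALLS and edge.total_duration > MIN_DURATION


def stable_connection(out_edges: dict[str, dict[str, EdgeStats]], a: str, b: str) -> bool:
    """Call (a -> b) and callback (b -> a) must both pass the limits."""
    return (is_stable_edge(out_edges.get(a, {}).get(b))
            and is_stable_edge(out_edges.get(b, {}).get(a)))


def stable_ingroup_connections(out_edges: dict[str, dict[str, EdgeStats]], source: str) -> int:
    # Step 1: target group = 1-step neighbors of the source (either direction).
    group = set(out_edges.get(source, {}))
    group |= {caller for caller, callees in out_edges.items() if source in callees}
    count = 0
    for u in group:
        # Step 2: walk each member's outgoing 1-step neighbors.
        for v in out_edges.get(u, {}):
            # Step 3: count v if it is also in the target group and the pair is
            # stable; u < v counts each unordered pair once.
            if v in group and u < v and stable_connection(out_edges, u, v):
                count += 1
    return count


out_edges = {
    "src": {"a": EdgeStats(1, 10.0), "b": EdgeStats(1, 10.0), "c": EdgeStats(1, 10.0)},
    "a": {"b": EdgeStats(5, 900.0)},
    "b": {"a": EdgeStats(4, 600.0), "c": EdgeStats(5, 900.0)},
    "c": {},
}
print(stable_ingroup_connections(out_edges, "src"))  # 1 (only a <-> b is stable both ways)
```

A high count corresponds to the "many in-group connections" good-phone feature.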