The presentation discussed how graphs, streams, and tables work together using a fraud detection use case at a bank. Event data about customers, accounts, and sessions is ingested from various systems into Kafka streams. A Neo4j graph database integrated with Kafka via Neo4j Streams consumes this event stream to build a graph model of entities and their relationships. A GRANDstack application exposes this graph via GraphQL to allow fraud analysts to investigate suspicious patterns and accounts flagged by graph algorithms, and update the graph based on their adjudications.
5. The Trinity: Streams, Tables, Graphs
Streams
● Record history
● Sequence of immutable data records
Tables
● Represent state
● Collection of key-value pairs
Graphs
● Integrate datasets and query across them in near real time
● Graph analytics provide actionable insight
https://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
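The stream/table duality can be sketched in a few lines of plain Python. This is an illustration only, not Kafka or KSQL code: the account ids, the values, and the helper names `to_table` and `to_changelog` are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# A stream: an ordered, append-only sequence of (key, value) records.
# The account ids and amounts are invented for illustration.
stream: List[Tuple[str, int]] = [
    ("acct-1", 100),
    ("acct-2", 50),
    ("acct-1", 75),
]


def to_table(records: List[Tuple[str, int]]) -> Dict[str, int]:
    """Replay a stream in order, keeping only the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in records:
        table[key] = value
    return table


def to_changelog(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Emit one record per key whose value is new or changed (deletions ignored)."""
    return [(key, value) for key, value in after.items() if before.get(key) != value]


table = to_table(stream)
```

Replaying the stream gives the table (state); diffing two table snapshots gives a stream of changes (history) again.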
6. The world is a graph – everything is connected
• people, places, events
• companies, markets
• countries, history, politics
• sciences, art, teaching
• technology, networks, machines, applications, users
• software, code, dependencies, architecture, deployments
• criminals, fraudsters and their behavior
7. Use Cases
Internal Applications
● Master Data Management
● Network and IT Operations
● Fraud Detection
Customer-Facing Applications
● Real-Time Recommendations
● Graph-Based Search
● Identity and Access Management
13. What is Neo4j?
Graph Database
● Database management system (DBMS)
● Property Graph data model
● Cypher query language
● Graph analytics
● Data visualization
● Developer tool for building applications
neo4j.com/
14. Property Graph Model Components
Nodes
• Represent the objects in the graph
• Can be labeled
Relationships
• Relate nodes by type and direction
Properties
• Name-value pairs that can go on nodes and relationships.
[Figure: example property graph. Two Person nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a Car node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, OWNS, and DRIVES relationships; one relationship carries the property since: Jan 10, 2011.]
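A minimal sketch of nodes, relationships, and properties as plain Python dataclasses. The class names and the `outgoing` helper are hypothetical and are not a Neo4j API; pointing DRIVES from Dan to the car is also an assumption, since the arrow directions of the figure are not recoverable from the transcript.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    labels: Set[str]                                   # nodes can be labeled
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    type: str                                          # relationships have a type...
    start: Node                                        # ...and a direction (start -> end)
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def outgoing(node: Node, rels: List[Relationship], rel_type: str) -> List[Node]:
    """Follow relationships of one type that start at the given node."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})
# Assumed direction: Dan drives the car.
drives = Relationship("DRIVES", dan, car)
```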
16. Cypher Query Language
CREATE (:Company { name: "Neo4j" })-[:LOCATED_IN]->(:City { name: "San Mateo" })
[Figure: the query annotated with its NODE, LABEL, and PROPERTY parts; it yields a Neo4j node connected by LOCATED_IN to a San Mateo node.]
18. Fraud Detection With Neo4j At Pig E. Bank
Proof of concept goal:
● Combine customer, account, and session data from different systems into Neo4j
● Find suspicious parties and accounts
● Identify potential fraud rings (connected parties) and flag for analyst follow-up
[Figure: data sources: Customers, Accounts, Sessions]
19. Evidence of Fraud
○ Suspicious:
■ Shared SSNs, phones, cookies
■ Connected to known fraudsters
[Figure: Person nodes linked through shared Cookie, SSN, and Phone nodes]
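One way to picture the "shared identifier" signal is to group parties by the identifiers they use and report every pair that shares one. The sketch below is plain Python over invented data, not the Cypher used in the talk; the tuple layout and the function name `shared_identifier_pairs` are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (party, identifier kind, identifier value); every value here is made up.
observations: List[Tuple[str, str, str]] = [
    ("party-1", "SSN", "000-00-0001"),
    ("party-2", "SSN", "000-00-0001"),
    ("party-2", "Phone", "555-0100"),
    ("party-3", "Phone", "555-0100"),
    ("party-4", "Cookie", "c-42"),
]


def shared_identifier_pairs(obs: List[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Return every pair of parties that share at least one identifier."""
    holders: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for party, kind, value in obs:
        holders[(kind, value)].add(party)
    pairs: Set[Tuple[str, str]] = set()
    for parties in holders.values():
        # sorted() makes each pair canonical, so (a, b) and (b, a) collapse.
        pairs.update(combinations(sorted(parties), 2))
    return pairs
```

In a graph database the same question is a pattern match over Person and identifier nodes rather than a self-join on a table.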
27. Fraud Flagger
1. Links innocents to suspects
2. Suspects: a known fraudster, or anyone connected to one
3. Louvain Community Detection to group all associated parties into candidate fraud rings
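The flagging steps can be sketched in plain Python under two loudly stated assumptions: "connected to" is read as a direct (one-hop) link, and connected components stand in for Louvain community detection, which is a different and more selective algorithm. The data and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Undirected links between parties (invented data).
links: List[Tuple[str, str]] = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
known_fraudsters: Set[str] = {"p1"}


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def flag_suspects(adj: Dict[str, Set[str]], fraudsters: Set[str]) -> Set[str]:
    """A suspect is a known fraudster or a direct neighbor of one (assumption)."""
    suspects = set(fraudsters)
    for fraudster in fraudsters:
        suspects |= adj.get(fraudster, set())
    return suspects


def candidate_rings(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components, used here as a simple stand-in for Louvain."""
    seen: Set[str] = set()
    rings: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        ring: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in ring:
                continue
            ring.add(node)
            stack.extend(adj[node] - ring)
        seen |= ring
        rings.append(ring)
    return rings


adjacency = build_adjacency(links)
```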
28. Graph Algorithm Categories in Neo4j
● Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
● Centrality / Importance: Determines the importance of distinct nodes in the network
● Community Detection: Detects group clustering or partition options
● Similarity: Evaluates how alike nodes are
● Link Prediction: Estimates the likelihood of nodes forming a future relationship
neo4j.com/graph-algorithms-book/
29. Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
+35
neo4j.com/docs/graph-algorithms/current/
Updated June 2019
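As one concrete instance from the Similarity category, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below is a plain-Python illustration, not the Neo4j procedure; the identifier values are invented, and returning 0.0 for two empty sets is a convention chosen here.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty (assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Identifiers seen for two parties (made-up values).
party_a = {"ssn-1", "phone-1", "cookie-1"}
party_b = {"ssn-1", "phone-1", "cookie-2"}
similarity = jaccard(party_a, party_b)  # 2 shared out of 4 distinct
```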
37. Kafka Connect Sink
● Easiest way to deploy a connector to get data into Neo4j
● Best flexibility to change which data you pull from Kafka and what goes into Neo4j without touching the database
https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
38. Infrastructure at Pig E. Bank
[Figure: architecture diagram with the labels Existing Systems (Online Banking, Account Registration, Customer Service), stream1, stream2, stream3, and Party Interaction Stream]
39. Configuring Neo4j Streams (docker)
Sink: take messages from the “cookies” topic, and write new cookie nodes to the graph, matched to the right party!
NEO4J_kafka_group_id: myconsumer
NEO4J_streams_sink_topic_cypher_cookies: "
MERGE (c:Cookie { cookie_id: event.cookie_id })
ON CREATE SET c += event
MERGE (p:Party { id: toInteger(event.party_id) })
MERGE (p)-[:COOKIE]->(c)"
NEO4J_streams_sink_enabled: "true"
NEO4J_streams_procedures_enabled: "true"
Source: whenever a change is made to a Party or an ASSOCIATED link is created, report that to a topic.
NEO4J_streams_source_enabled: "true"
NEO4J_streams_source_topic_nodes_fraudflags: Party{*}
NEO4J_streams_source_topic_relationships_associations: ASSOCIATED{*}
NEO4J_streams_source_schema_polling_interval: 10000
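What the sink's Cypher template does per message can be mimicked on an in-memory structure. This is a hedged sketch of the MERGE semantics only (match if present, create if absent), not the plugin's implementation; `TinyGraph` and `apply_cookie_event` are hypothetical names and the event fields are invented.

```python
from typing import Any, Dict, Set, Tuple


class TinyGraph:
    """In-memory stand-in for the Cookie and Party nodes and the COOKIE relationship."""

    def __init__(self) -> None:
        self.cookies: Dict[str, Dict[str, Any]] = {}
        self.parties: Dict[int, Dict[str, Any]] = {}
        self.cookie_rels: Set[Tuple[int, str]] = set()

    def apply_cookie_event(self, event: Dict[str, Any]) -> None:
        cookie_id = event["cookie_id"]
        # MERGE (c:Cookie {...}) ON CREATE SET c += event:
        # properties are copied only when the node is first created.
        if cookie_id not in self.cookies:
            self.cookies[cookie_id] = dict(event)
        # MERGE (p:Party { id: toInteger(event.party_id) })
        party_id = int(event["party_id"])
        self.parties.setdefault(party_id, {"id": party_id})
        # MERGE (p)-[:COOKIE]->(c): a set keeps the relationship unique.
        self.cookie_rels.add((party_id, cookie_id))


graph = TinyGraph()
message = {"cookie_id": "c-1", "party_id": "7"}
graph.apply_cookie_event(message)
graph.apply_cookie_event(message)  # replaying the same message changes nothing
```

Because every step is a merge rather than a create, redelivered Kafka messages do not duplicate nodes or relationships.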
40. Graphs Back to Tables, with a little help from KSQL
● Neo4j-streams publishes change data capture (CDC) events back to Kafka
● Define a stream using KSQL that structures that JSON
● Simple KSQL query over that stream yields all of the cases WHERE fraud_followup OR fraud_confirmed;
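The effect of that filter can be illustrated in plain Python over JSON messages. The message shape below is a simplified, hypothetical one and does not reproduce the actual neo4j-streams CDC payload; only the two flag names come from the slide.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Simplified, invented messages; the real CDC payload has a richer structure.
raw_messages = [
    '{"party_id": 1, "fraud_followup": true, "fraud_confirmed": false}',
    '{"party_id": 2, "fraud_followup": false, "fraud_confirmed": false}',
    '{"party_id": 3, "fraud_followup": false, "fraud_confirmed": true}',
]


def flagged_cases(messages: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield records where fraud_followup OR fraud_confirmed is true."""
    for raw in messages:
        record = json.loads(raw)
        if record.get("fraud_followup") or record.get("fraud_confirmed"):
            yield record


flagged_ids = [record["party_id"] for record in flagged_cases(raw_messages)]
```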
53. Investigative GRANDstack App
● React UI fetches data from Neo4j using GraphQL
● View data on parties or “fraud flagged” cases
● Select an active case to begin adjudication analysis
● Graph visualization enables fraud analyst to explore the connected accounts to verify fraudulent behavior.
● Analyst adjudicates case, updating data in Neo4j, which sends an event to Kafka fraud stream via neo4j-streams
57. No PII Was Harmed in the Making of this Presentation
You may see phone numbers and Social Security Numbers on screen.
Most of the schema and use case is real; the data is fake.
67. Resources
● Neo4j-streams: integrate Kafka & Neo4j, deploy as a Neo4j plugin or as a connect worker:
○ Code: https://github.com/neo4j-contrib/neo4j-streams
○ Kafka Connect Neo4j Sink: https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
● GRANDstack: GraphQL, React, Apollo, and Neo4j for building rich web applications on graphs https://grandstack.io/
● How to Leverage Neo4j-Streams to build a just-in-time data warehouse
https://www.freecodecamp.org/news/how-to-leverage-neo4j-streams-and-build-a-just-in-time-data-warehouse-64adf290f093/
● Neo4j Graph Algorithms https://neo4j.com/docs/graph-algorithms/current/
82. The Trinity
● Kafka streams: a log of key/value pairs
● Tables: the latest value for each key
● Graphs: interchangeable, with rich relationships
● Why? Because:
○ Relationships matter; sometimes the pattern is more important than the individual data point
○ Table joins are hard to do and not very performant when you have many to do
○ Many problem domains are more naturally expressed as a graph
83. Use Case
● A top-10 bank by size is looking for fraudulent behavior in access to online accounts.
● Customers (or “Parties”) access accounts online; naturally we know their basic details (phone, SSN, address) and also their linked credit cards and accounts. Cookies & IP addresses track who accesses accounts and how.
84. Investigative App
● Flagger sets parties that the graph algorithms find suspicious to fraud_followup=true
● Find records with fraud_followup=true, fraud_confirmed=false
○ This means we think something’s going on here, but it needs follow-up
○ Set a case ID. If a record has a case ID but fraud_confirmed=false, then you are either still investigating (and it doesn’t need to be in the dashboard) or you’ve adjudicated the case and it doesn’t show.
● Possibilities:
○ Flag is bogus. In this case, leave fraud_confirmed=false, and set case ID
○ Flag is good. Then set fraud_confirmed=true and set case ID
○ Anything else is “not yet worked”
○ fraud_followup=false is an innocent bystander not worth checking.
● App gives:
○ Table of “suspects”
○ App shows parties connected - what detail do we share? How many pieces? Etc.
■ Subplot: cross-reference transactions -- this person has spent more than usual in different places lately
○ Bloom deep link to suspects and where they exist in the graph
○ Some fake notion of “do the investigation”
● Toggle fraud_confirmed=true (because we’ve decided this party is fraudulent)
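The flag combinations described in these notes can be read as a small state table. The sketch below is one interpretation, not the app's code: the `Party` class, the status names, and the choice to treat fraud_confirmed=true as confirmed even without a case ID are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    fraud_followup: bool
    fraud_confirmed: bool
    case_id: Optional[str] = None


def status(party: Party) -> str:
    """Map the three fields to a review status (status names are hypothetical)."""
    if party.fraud_confirmed:
        return "confirmed"                  # flag was good
    if not party.fraud_followup:
        return "bystander"                  # not worth checking
    if party.case_id is not None:
        return "cleared_or_investigating"   # case opened, fraud not confirmed
    return "not_yet_worked"                 # flagged, no case yet


def on_dashboard(party: Party) -> bool:
    """Only flagged parties without a case ID need to appear for follow-up."""
    return status(party) == "not_yet_worked"
```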