This presentation introduces the basics of graphs and graph databases. It explains how graph databases are used, with sample use cases from industry, and how they can be used by police departments. It also answers questions such as "When should I use a graph DB?" and "Should I solve this problem with a graph DB?"
How Are Graph Databases Used in Police Departments?
1. How Graph Databases Are Used in Police Departments
01/12/2018 - v1.1
Samet Kılıçtaş / Solutions Architect @ Huawei R&D
IEEE ITU COMWEEK’18 @Istanbul
2. CONTENT
01 Meet Graph Concept
- Why is it important?
- Why are relationships important?
02 Graph Databases
- What are graph databases?
- How do they work?
03 Four Steps to Analyze
- JanusGraph DB
- Gremlin and Data Modelling
04 Police Dept. Use Cases
- How are graph databases built to feed crime analysis?
4. Why Is It Important?
Graph Databases
Focus on Relationships
We are gathering more data than ever before, and the relationships within that data matter more and more, because we all want to reveal the hidden patterns in it.
Intuitive Query Language
Graph query languages make queries simpler for data scientists to write, and this agility saves a lot of time, which is valuable to your company.
Connected Query Performance
For local traversals, performance remains roughly constant as the graph grows, because a query touches only the part of the graph it traverses.
Performance slows down linearly or better as graph density and vertex degree increase.
Query response time = f(graph density, graph size, query degree)
Flexible
Graph databases are flexible enough to adapt to data model changes without interruption, and their adaptive query methods return results quickly, which helps with latency problems.
5. Why Do Relationships Matter?
RDBMS vs. Graph DB
Relational databases can become complex and inflexible when representing relationships
6. The Problem?
• All JOINs are executed every time you query (traverse) the relationship
• Executing a JOIN means searching for a key in another table
• With indexes, executing a JOIN means looking up a key in the index
• More entries => more lookups => slower JOINs
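The cost difference described in these bullets can be sketched in plain Python, used here and in the later sketches as a stand-in that runs without a database. This minimal sketch assumes a toy `knows` relationship stored two ways: as a sorted join table searched like an index, and as per-vertex adjacency lists. The function names are hypothetical, and the comparison counter only illustrates the argument; it is not a benchmark of any real RDBMS or graph DB.

```python
# Relational style: rows of (person_id, friend_id), kept sorted so the search
# below behaves like an index lookup on person_id.
join_table = sorted([(1, 2), (1, 4), (2, 3), (4, 3), (4, 5)])


def friends_via_join(person_id, table):
    """Return (friend ids, number of key comparisons the index search needed)."""
    comparisons = 0
    lo, hi = 0, len(table)
    # Binary search for the first row whose key is person_id: O(log n) steps,
    # so every JOIN pays more as the table grows.
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if table[mid][0] < person_id:
            lo = mid + 1
        else:
            hi = mid
    friends = []
    while lo < len(table) and table[lo][0] == person_id:
        friends.append(table[lo][1])
        lo += 1
    return friends, comparisons


# Graph style: each vertex stores references to its neighbours directly, so a
# traversal step costs only as much as the neighbours it visits.
adjacency = {1: [2, 4], 2: [3], 3: [], 4: [3, 5], 5: []}


def friends_via_adjacency(person_id, graph):
    return list(graph[person_id])


print(friends_via_join(1, join_table))      # ([2, 4], 3)
print(friends_via_adjacency(1, adjacency))  # [2, 4]
```

With a million rows, each indexed lookup needs about 20 comparisons, and a multi-hop query repeats such a lookup at every hop; the adjacency lookup does not depend on the table size.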
7. Gremlin vs. SQL
Imagine a situation where you need to JOIN multiple big tables with many dimensions!
Example problem: Find customers who bought products that were bought together with product 3 by other customers who also bought product 5 at least once.
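The example problem is a three-hop traversal. The Python sketch below walks those hops over a tiny in-memory purchase graph; the data and the function name `customers_for` are hypothetical, and it assumes "bought together" means "bought by the same customer", since no order or basket data is given. In SQL, each hop would be another self-JOIN on the purchases table; in a graph DB, each hop is one step along purchase edges.

```python
# Hypothetical purchase data: customer -> set of product ids they bought.
purchases = {
    "alice": {3, 5, 7},
    "bob": {3, 8},
    "carol": {7},
    "dave": {8, 9},
    "erin": {5, 7},
}


def customers_for(purchases, anchor=3, required=5):
    # Hop 1: customers who bought the anchor product and the required product.
    seed = {c for c, items in purchases.items()
            if anchor in items and required in items}
    # Hop 2: products those customers bought together with the anchor product.
    co_bought = set()
    for c in seed:
        co_bought |= purchases[c] - {anchor}
    # Hop 3: other customers who bought any of those products.
    return sorted(c for c, items in purchases.items()
                  if c not in seed and items & co_bought)


print(customers_for(purchases))  # ['carol', 'erin']
```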
8. A Quick Look: What Is a Graph?
Let’s reveal what a graph is
Graph = Vertex + Edge
9. What Is a Vertex?
Represents a thing
Things are vertices (or nodes)
A vertex has properties: {key: value} pairs
A vertex has a label that expresses what it is
11. What Is an Edge?
Connects vertices
Vertices are connected with edges (relationships)
Edges have a direction and a type, and can have properties
v: Person {id: 1111, first_name: Mike, last_name: Foo}
v: Person {id: 1234, first_name: Jack, last_name: Foo}
v: Person {id: 2222, first_name: Dane, last_name: Foo}
e: likes {since: 2012}
e: child_of {since: 2000}
e: child_of {since: 1996}
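The vertex and edge vocabulary above maps naturally onto two small record types. This is a minimal Python sketch of a property graph, not JanusGraph's API: the class and helper names are hypothetical, the `likes` direction (Mike likes Dane) is an assumption, and the two `child_of` edges are left out because the slide does not say which people they connect.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str                                      # what the vertex is, e.g. "Person"
    properties: dict = field(default_factory=dict)  # {key: value} pairs


@dataclass
class Edge:
    out_v: int                                      # id of the source vertex
    label: str                                      # relationship type, e.g. "likes"
    in_v: int                                       # id of the target vertex
    properties: dict = field(default_factory=dict)


vertices = {
    v.id: v for v in [
        Vertex(1111, "Person", {"first_name": "Mike", "last_name": "Foo"}),
        Vertex(1234, "Person", {"first_name": "Jack", "last_name": "Foo"}),
        Vertex(2222, "Person", {"first_name": "Dane", "last_name": "Foo"}),
    ]
}
# Assumed direction: Mike likes Dane.
edges = [Edge(1111, "likes", 2222, {"since": 2012})]


def out_names(vertex_id, label):
    """Follow outgoing edges with the given label and return first names."""
    return [vertices[e.in_v].properties["first_name"]
            for e in edges if e.out_v == vertex_id and e.label == label]


def in_names(vertex_id, label):
    """Follow incoming edges with the given label and return first names."""
    return [vertices[e.out_v].properties["first_name"]
            for e in edges if e.in_v == vertex_id and e.label == label]


print(out_names(1111, "likes"))  # ['Dane']
print(in_names(1111, "likes"))   # []
```

Because edges are directed, asking who Mike likes and who likes Mike give different answers, even though the same single edge is involved.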
12. Sometimes there are DIRECT relationships, which are NOT HARD to handle with relational DBs
v: Person {id: 1111, first_name: Mike, last_name: Foo}
v: Person {id: 2222, first_name: Dane, last_name: Foo}
e: likes {since: 2012}
13. Sometimes there are INDIRECT relationships, which are HARDER to TRAVERSE with relational DBs, but with some effort it can be done
[Figure: indirect relationships among Person A, Car B, Tool A, Company A and Tool B]
14. Can you find me the number of people who were involved in two different crime cases last month, have two children, and bought a 4K TV last month?
Very hard with relational JOINs! But you can do this with a graph DB, if all the data is in there with paths between the entities!
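Once everything is connected, the question becomes one pass over a person's edges. The Python sketch below assumes a hypothetical edge list in which `child_of` points from child to parent and "last month" is an arbitrary month string; the people, cases, and the `matching_people` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical edges: (source, label, target, month or None).
# Assumption: child_of points from child to parent.
edges = [
    ("ann", "involved_in", "case-1", "2018-11"),
    ("ann", "involved_in", "case-2", "2018-11"),
    ("ann", "bought", "4K TV", "2018-11"),
    ("kid-1", "child_of", "ann", None),
    ("kid-2", "child_of", "ann", None),
    ("ben", "involved_in", "case-1", "2018-11"),
    ("ben", "bought", "4K TV", "2018-11"),
    ("kid-3", "child_of", "ben", None),
]


def matching_people(edges, month):
    cases = defaultdict(set)     # person -> distinct cases in that month
    children = defaultdict(set)  # parent -> children
    bought_tv = set()            # people who bought a 4K TV in that month
    for src, label, dst, when in edges:
        if label == "involved_in" and when == month:
            cases[src].add(dst)
        elif label == "child_of":
            children[dst].add(src)
        elif label == "bought" and dst == "4K TV" and when == month:
            bought_tv.add(src)
    # Keep only people who satisfy all three conditions at once.
    return sorted(p for p in bought_tv
                  if len(cases[p]) == 2 and len(children[p]) == 2)


people = matching_people(edges, "2018-11")
print(people, len(people))  # ['ann'] 1
```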
17. Evolution of Web Search
Best technology applied
Pre-1999: WWW indexing (discrete data)
1999-2012: Google invents PageRank (connected data, simple)
2012-?: Google Knowledge Graph, Facebook Graph Search (connected data, rich)
20. The NoSQL Spectrum
Spectrum of databases by data complexity, from simple (minimally connected data) to complex (focused on data relationships):
Key-Value: Redis, Riak, DynamoDB
Column-Family: Cassandra, HBase, Bigtable
Document DB: MongoDB, Couchbase, CouchDB
Relational DB: MySQL, MSSQL, PostgreSQL, Oracle
Graph DB: JanusGraph, OrientDB, Neo4j, AllegroGraph
22. The Graph Database Ecosystem Is Complex
Graph DB usage is increasing fast!
[Figure: map of the graph ecosystem, split into Databases (RDF triple stores, property graph) and Frameworks (e.g. Gradoop)]
23. Graph Query Languages
SPARQL: W3C standard for RDF; based on the semantic web
Cypher: declarative; easy to use; popular
Gremlin: imperative + declarative; powerful; steep learning curve
GraphQL: query language for APIs; useful for REST endpoints
Others: most are extensions of SQL; usually specific to one system
24. Do I Need a Graph DB to Solve My Problem?
There are other possibilities too
Search and Selection
Problem: Get me everyone who works at X.
Answer: Use an RDBMS or a search server.
Related Data
Problem: How do John and Paula know each other?
Answer: Use a graph DB.
Aggregation
Problem: What are my average sales for each day over the past month?
Answer: Use an RDBMS.
Pattern Matching
Problem: Who in my system has a profile similar to mine?
Answer: Use a search server or a graph DB.
Influence
Problem: Who is the most influential person I am connected with on LinkedIn?
Answer: Use a graph DB.
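The "Related Data" question is a path search. A minimal breadth-first search in Python finds one shortest chain of acquaintances; the intermediate people (Mary, Steve) and the `how_connected` helper are hypothetical.

```python
from collections import deque


def how_connected(graph, start, goal):
    """Breadth-first search: return one shortest chain of people, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            # Walk the predecessor links back to the start.
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for friend in graph.get(person, []):
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None


knows = {
    "John": ["Mary", "Steve"],
    "Mary": ["John", "Paula"],
    "Steve": ["John"],
    "Paula": ["Mary"],
}
print(how_connected(knows, "John", "Paula"))  # ['John', 'Mary', 'Paula']
```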
25. Graph Database Use Cases
High-level use cases:
• Social Networking
• Fraud Detection
• Recommendation Engine
• Knowledge Graphs
• Master Data Management
• Access Management
• and more...
26. Use Cases in Industry
Some use cases: Relationship Status Analysis; Content Management & Access Control
27. Use Cases in Industry
Some use cases: Insurance Risk Analysis; Network Cell Analysis
28. Use Cases in Industry
Some use cases: Geo Routing (Public Transport); Bioinformatics
29. Four Steps to Analyze
JanusGraph DB
Gremlin and Data Modelling
30. Four Steps to Analyze
How does an enterprise start analyzing its data from zero?
1- Choose Datastore and Data Source
Any enterprise that wants to bring its datasets and data sources together to find hidden patterns and reveal hidden information should first define which datasets can be used (e.g. voice, SMS, GPS information, texts).
2- Data Modelling
Data modelling is an important step in which massive data sources are analyzed and modelled so that they provide consistent information for graph generation.
3- Generate Graph
Creating a graph object out of massive data is problematic and demands high system resources. The data model provides the blueprint for building the graph properly out of vertices and edges.
4- Information Analysis / Visualization
Once the processed data has been turned into a reasonable graph object, it is time to find hidden patterns in the graph, or for data scientists to analyze it through the visualization options provided by the graph DB or by third-party tools.
31. Data Modelling
Graph Data Model = Whiteboard-Friendly
Problem: Create a data model for the Chicago Crime Dataset
Description: Dataset of bad people doing bad things in the Windy City.
Columns: unique_key, case_number, date, block, iucr, primary_type, description, location_description, arrest, domestic, beat, district, ward, community_area, fbi_code, x_coordinate, y_coordinate, year, updated_on, latitude, longitude, location
[Figure: graph data model. Vertex labels: Person, Crime, Case, Block, Crime Type, District. Properties: date, updated_on, description, fbi_code, primary_type, domestic, name, age, identity_no, married. Edge labels: involved_in, type_of, occurred, within, arrested.]
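Step 3 ("Generate Graph") applies such a model row by row. The Python sketch below is one possible mapping, and every choice in it is an assumption because the diagram's exact layout did not survive: which properties sit on which vertex, and the edge endpoints `Crime -type_of-> CrimeType`, `Crime -occurred-> Block`, `Block -within-> District`. The dataset has no person columns, so Person vertices and the `involved_in` / `arrested` edges would need another source. `add_row` is a hypothetical helper and the sample rows are made up.

```python
def add_row(graph, row):
    """Add one crime record to the graph (hypothetical mapping, see above)."""
    vertices, edges = graph["vertices"], graph["edges"]
    crime = ("Crime", row["unique_key"])
    crime_type = ("CrimeType", row["primary_type"])
    block = ("Block", row["block"])
    district = ("District", row["district"])
    # setdefault creates each vertex once, so rows that share a block, type or
    # district reuse the same vertex instead of duplicating it.
    vertices.setdefault(crime, {"date": row["date"],
                                "description": row["description"],
                                "fbi_code": row["fbi_code"],
                                "domestic": row["domestic"]})
    vertices.setdefault(crime_type, {"primary_type": row["primary_type"]})
    vertices.setdefault(block, {"name": row["block"]})
    vertices.setdefault(district, {"name": row["district"]})
    # Edges are (out_vertex, label, in_vertex) triples kept in a set.
    edges.add((crime, "type_of", crime_type))
    edges.add((crime, "occurred", block))
    edges.add((block, "within", district))


graph = {"vertices": {}, "edges": set()}
rows = [  # made-up sample values, not taken from the real dataset
    {"unique_key": 1, "primary_type": "BURGLARY", "block": "BLOCK-A",
     "district": "D1", "date": "2018-11-02", "description": "sample",
     "fbi_code": "F1", "domestic": False},
    {"unique_key": 2, "primary_type": "ROBBERY", "block": "BLOCK-A",
     "district": "D1", "date": "2018-11-03", "description": "sample",
     "fbi_code": "F2", "domestic": False},
]
for r in rows:
    add_row(graph, r)
print(len(graph["vertices"]), len(graph["edges"]))  # 6 5
```

The two crimes share one Block and one District vertex, which is exactly what lets later traversals group crimes by location.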
32. How to Analyze a Graph?
Compute and Analyze
33. Traverse Example
How to traverse step by step
Can you tell me the names of
the people that marko knows?
gremlin> marko = g.V().has('name','marko').next() //1
==>v[1]
gremlin> g.V(marko).out('knows') //2
==>v[2]
==>v[4]
gremlin> g.V(marko).out('knows').values('name') //3
==>vadas
==>josh
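To show what each Gremlin step does, here is a plain-Python imitation over the same toy "modern" graph that ships with Apache TinkerPop. The helper names only mirror the Gremlin steps; they are hypothetical and not part of any Gremlin client library.

```python
# TinkerPop "modern" toy graph: vertex id -> name, and (out, label, in) edges.
names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
edges = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def has_name(name):
    """Like g.V().has('name', name).next(): the first vertex with that name."""
    return next(v for v, n in names.items() if n == name)


def out(vertex_ids, label):
    """Like .out(label): follow outgoing edges with the given label."""
    return [dst for v in vertex_ids
            for src, lbl, dst in edges if src == v and lbl == label]


def values_name(vertex_ids):
    """Like .values('name'): read the name property of each vertex."""
    return [names[v] for v in vertex_ids]


marko = has_name("marko")                  # step 1
print(out([marko], "knows"))               # step 2: [2, 4]
print(values_name(out([marko], "knows")))  # step 3: ['vadas', 'josh']
```

Chaining one more hop, `values_name(out(out([marko], "knows"), "created"))`, returns what the people marko knows have created: `['ripple', 'lop']`.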
35. Graph Database WARS!
Graph DB usage is increasing fast!
● Neo4j is still dominant
● JanusGraph is gaining recognition
● Amazon Neptune is still in the fight
37. What Is JanusGraph DB?
Open Source Graph Database
Apache TinkerPop Enabled
Supports TinkerPop, a popular graph computing framework
Property Graph
Consists of vertices, edges and properties
Pluggable Storage
Pluggable Search Backend
Scalable
Designed for scale-up and scale-out, real-time and analytical graph computing applications
39. Police Department Scenarios
Sample use cases
Patrol Distribution
Patrol distribution can be arranged based on ACI and victim clustering
# of Arrested Persons
It is easy to draw the pattern of arrested persons by block and time; with such clustering, new patterns can be identified
Crime Patterns
Helps to understand how crime trends change, what the crime patterns are for a specific location, and how zones
Crime Network Analysis
Connection of any suspect or victim with known and unknown organizations and cartel networks
40. Patrol Distribution
Sample use cases
1- Reveal where the deadliest crimes happen
2- Display them on a map
3- Run ML to predict
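Step 1 is a group-and-count over the `occurred` edges, similar in spirit to Gremlin's `groupCount()` step. This Python sketch assumes the (block, crime type) pairs have already been read off the graph; the records, the `DEADLY` set, and `deadly_hotspots` are hypothetical.

```python
from collections import Counter

# Hypothetical (block, primary_type) pairs, as produced by following each
# Crime vertex's "occurred" and "type_of" edges.
crimes = [
    ("BLOCK-A", "HOMICIDE"), ("BLOCK-A", "BATTERY"), ("BLOCK-B", "HOMICIDE"),
    ("BLOCK-A", "HOMICIDE"), ("BLOCK-C", "THEFT"),
]
DEADLY = {"HOMICIDE"}  # assumption: which crime types count as "most deadly"


def deadly_hotspots(crimes, top_n=2):
    """Count deadly crimes per block and return the top_n blocks."""
    counts = Counter(block for block, kind in crimes if kind in DEADLY)
    return counts.most_common(top_n)


print(deadly_hotspots(crimes))  # [('BLOCK-A', 2), ('BLOCK-B', 1)]
```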
41. Crime Patterns
Sample use cases
1- Identify crime types
2- Find crimes seen together
3- Example: “BURGLARY”, “ROBBERY”, “MOTOR VEHICLE THEFT” and “NARCOTICS” crimes are seen together.
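"Seen together" needs a definition. This Python sketch assumes it means "reported in the same block" and counts every pair of crime types per block; the records and `co_occurring_types` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (block, primary_type) pairs.
crimes = [
    ("BLOCK-A", "BURGLARY"), ("BLOCK-A", "ROBBERY"), ("BLOCK-A", "NARCOTICS"),
    ("BLOCK-B", "BURGLARY"), ("BLOCK-B", "ROBBERY"), ("BLOCK-C", "THEFT"),
]


def co_occurring_types(crimes):
    """Count how often each pair of crime types is seen in the same block."""
    types_by_block = defaultdict(set)
    for block, kind in crimes:
        types_by_block[block].add(kind)
    pairs = Counter()
    for kinds in types_by_block.values():
        # sorted() gives each pair one canonical order, e.g. (A, B) not (B, A).
        pairs.update(combinations(sorted(kinds), 2))
    return pairs


print(co_occurring_types(crimes).most_common(1))  # [(('BURGLARY', 'ROBBERY'), 2)]
```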
42. Crime Network Analysis
Sample use cases
1- Find connected clusters
2- Run centrality algorithms:
Betweenness Centrality
Eigenvector Centrality
3- Find the man in the middle
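Steps 1 and 3 can be sketched in plain Python: connected components via breadth-first search, and betweenness centrality via Brandes' algorithm for unweighted, undirected graphs. The vertex with the highest betweenness is a candidate "man in the middle". The suspect network below is hypothetical, and eigenvector centrality is not shown.

```python
from collections import deque

# Hypothetical suspect network (undirected): two groups joined only through
# "M", plus an unrelated pair X-Y.
network = {
    "A1": ["A2", "M"], "A2": ["A1", "M"],
    "M": ["A1", "A2", "B1"],
    "B1": ["M", "B2"], "B2": ["B1"],
    "X": ["Y"], "Y": ["X"],
}


def connected_clusters(graph):
    """Group vertices into connected components with breadth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            cluster.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        clusters.append(cluster)
    return clusters


def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and their predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each end.
    return {v: c / 2 for v, c in score.items()}


print(connected_clusters(network))
scores = betweenness(network)
print(max(scores, key=scores.get), scores["M"])  # M 4.0
```

Here M lies on every shortest path between the A group and the B group, so it scores highest; on a real crime graph, such algorithms would typically run inside the graph platform's analytics tooling rather than in application code.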