SlideShare a Scribd company logo
1 of 61
Download to read offline
Graph Databases and the
#PanamaPapers
Meetup JUG Hessen
2016-01-26
Stefan Armbruster
(Stefan)-[:WORKS_FOR]->(Neo4j)
stefan@neo4j.com | @darthvader42
github.com/sarmbruster | blog.armbruster-it.de
Stefan Armbruster - Field Engineer @Neo4j
Source Material
Kudos to Michael Hunger @mesiiri
source material taken from
• the ICIJ presentation
• the Reddit AMA
• online publications (SZ, Guardian, TNW et.al.)
• the ICIJ website
• https://panamapapers.icij.org/
• The Power Players
• Key Numbers & Figures
+190 journalists in
more than 65 countries
12 staff members (USA, Costa Rica, Venezuela, Germany, France, Spain)
50% of the team = Data & Research Unit
raw files
metadata
author; sender...
database
search and
discovery
raw text
3 million files
x
10 seconds per
file
=
1 yr / 35 servers
= 1.5 weeks
Investigators used
Nuix’s optical
character recognition
to make millions of
scanned documents
text-searchable. They
used Nuix’s named
entity extraction and
other analytical tools
to identify and
cross-reference the
names of Mossack
Fonseca clients
through millions of
Lucene syntax
queries with
proximity matching!
400
users
Unstructured data extraction
● Nuix professional OCR service
● ICIJ Extract (open source, Java: https://github.com/ICIJ/extract),
leverages Apache Tika, Tesseract OCR and JBIG2-ImageIO.
Structured data extraction
● A bunch of Python
Database
● Apache Solr (open source, Java)
● Redis (open source, C)
● Neo4j (open source, Java)
App
● Blacklight (open source, Rails)
● Linkurious (closed source, JS)
Stack
Disconnected Documents
Context is King name: “John”
last: „Miller“
role: „Negotiator“
name: "Maria"
last: "Osara"name: “Some Media Ltd”
value: “$70M”
PERSON
PERSON$
@
PERSON
PERSON
name: ”Jose"
last: “Pereia“
position: “Governor“
name: “Alice”
last: „Smith“
role: „Advisor“
Context is King
DETAILS
CONTAINS SON
SENT
OW
NS
SUPPORTS
CREATED
MENTIONS
name: “John”
last: „Miller“
role: „Negotiator“
name: "Maria"
last: "Osara"
since:
Jan 10, 2011
name: “Some Media Ltd”
value: “$70M”
PERSON
PERSON$
@
WROTE
PERSON
PERSON
name: ”Jose"
last: “Pereia“
position: “Governor“
name: “Alice”
last: „Smith“
role: „Advisor“
NODE
REL.
key: “value”
properties
Property Graph Model
Nodes
• The entities in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
RELATIONSHIP
REL.
NODE NODE
key: “value”
properties
key: “value”
properties
key: “value”
properties$
Value from Data Relationships
Common Use Cases
Internal Applications
Master Data Management
Network and
IT Operations
Fraud Detection
Customer-Facing Applications
Real-Time Recommendations
Graph-Based Search
Identity and
Access Management
http://neo4j.com/use-cases
Your friend Neo4j
An open-source graph database
• Manage and store your connected
data as a graph
• Query relationships
easily and quickly
• Evolve model and applications
to support new requirements and
insights
• Built to solve relational pains
Your friend Neo4j
An open-source graph database
• Built for Connected Data
• Easy to use
• Optional Schema
• Highly Scalable Performance
• Transactional ACID-Database
Whiteboard to Graph
Neo4j: All about Patterns
(:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"})
KNOWS
Dan Ann
NODE NODE
LABEL PROPERTY
http://neo4j.com/developer/cypher
LABEL PROPERTY
Cypher: Find Patterns
MATCH (:Person { name:"Dan"} ) -[:KNOWS]-> (who:Person) RETURN who
KNOWS
Dan ???
LABEL
NODE NODE
LABEL PROPERTY ALIAS ALIAS
http://neo4j.com/developer/cypher
Cypher: Clauses
CREATE pattern
MERGE pattern
SET
DELETE
REMOVE
Cypher: Clauses
MATCH pattern
WHERE predicate
ORDER BY expression
SKIP ... LIMIT ...
RETURN expression AS alias ...
Cypher: Clauses
WITH expression AS alias, ...
UNWIND list AS item
LOAD CSV FROM „url“ AS row
Getting Data into Neo4j
Cypher-Based “LOAD CSV”
• Transactional (ACID) writes
• Initial and incremental loads of up to
10 million nodes and relationships
,
,,
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (:Person {name:row.name,
age:toInt(row.age)});
Getting Data into Neo4j
Load JSON with Cypher
• Load JSON via procedure
• Deconstruct the document
• Into a non-duplicated graph model
{}
{}{}
CALL apoc.load.json("url") yield value as doc
UNWIND doc.items as item
MERGE (:Contract {title:item.title,
amount:toFloat(item.amount)});
Getting Data into Neo4j
CSV Bulk Loader neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second
,,,
,,,,,,
bin/neo4j-import –-into people.db
--nodes:Person people.csv
--nodes:Company companies.csv
--relationship:STAKEHOLDER stakeholders.csv
The Steps Involved in the Document Analysis
1. Acquire documents
2. Classify documents
• Scan / OCR
• Extract document metadata
3. Whiteboard domain and questions, determine
• entities and their relationships
• potential entity and relationship properties
• sources for those entities and their properties
The Steps Involved in the Document Analysis
4. Develop analyzers, rules, parsers and named entity recognition
5. Parse and store metadata, document and entity relationships
• Parse by author, named entities, dates, sources and classifications
6. Infer entity relationships
7. Compute similarities, transitive cover and triangles
8. Analyze data using graph queries and visualizations
We need a Data Model
Meta Data Entities
• Document, Email, Contract,
DB-Record
• Meta: Author, Date, Source,
Keywords
• Conversation: Sender, Receiver,
Topic
• Money Flows
Actual Entities
• Person
• Representative (Officer)
• Address
• Client
• Company
• Account
Either based on our use cases & questions
On the entities present in our meta-data and data.
Data Model – Relationships
Meta-Data
• sent, received, cc‘ed
• mentioned, topic-of
• created, signed
• attached
• roles
• family relationships
Activities
• open account
• manage
• has shares
• registered address
• money flow
The ICIJ Data Model
The ICIJ Data Model
• Simplistic Datamodel with 4 Entities and 5 Relationships
• We only know the published model
• Missing
• Documents, Metadata
• Family Relationships
• Connections to Public Record Databases
• Contains Duplicates
• Relationship information stored on entities
• Could use richer labeling
Example Dataset - Azerbaijan’s President Ilham Aliyev
• was already previously investigated
• whole family involved
• different shell companies & involvements
http://neo4j.com/graphgist/ec65c2fa-9d83-4894-bc1e-98c475c7b57a
Demo Time
Based On: http://neo4j.com/blog/analyzing-panama-papers-neo4j/
Visual Graph Search
For Non-Developers
Background
Business problem Solution & Benefits
© All Rights Reserved 2014 | Neo Technology, Inc.
•JS library based on sigma.js
•Integrates with Neo4j using
Cypher
Linkurious.js
https://github.com/Linkurious/linkurious.js/
Background
Business problem Solution & Benefits
© All Rights Reserved 2014 | Neo Technology, Inc.
•JS library based on d3.js
•Uses Graph Metadata
to offer visual search
•Categories to filter
Instances
•Component based
extensions
•Zero Config with
Popoto.js
http://www.popotojs.com/
Background
Business problem Solution & Benefits
© All Rights Reserved 2014 | Neo Technology, Inc.
Visual Search Bar
•Based on visualsearch.js
•Uses graph metadata for parametrization
•Limit suggestions by selected items
maxdemarzi.com/2013/07/03/the-last-mile/
Background
Business problem Solution & Benefits
© All Rights Reserved 2014 | Neo Technology, Inc.
•Natural Language to Cypher
•Ruby TreeTop Gem for NLP
•Convert phrases to Cypher Fragments
Facebook Graph Search
maxdemarzi.com/2013/01/28/facebook-graph-search-with-cypher-and-neo4j/
Users Love Neo4j
“We found Neo4j to be
literally thousands of times
faster than our prior MySQL
solution, with queries that
require 10 to 100 times less
code. Today, Neo4j provides
eBay with functionality that
was previously impossible.”
Volker Pacher
Senior Developer
Performance
"The Neo4j graph database gives us drastically
improved performance and a simple language to query
our connected data”
- Sebastian Verheugher, Telenor
Scale and Availability
"As the current market leader in graph databases, and
with enterprise features for scalability and availability,
Neo4j is the right choice to meet our demands.”
- Marcos Wada, Walmart
Discrete Data
Minimally
connected data
Graph Databases are designed for data relationships
Summary - Use the Right Database for the Job
Other NoSQL Relational DBMS Graph DBMS
Connected Data
Focused on
Data Relationships
Development Benefits
Easy model maintenance
Easy query
Deployment Benefits
Ultra high performance
Minimal resource usage
Real-Time Query Performance
Graph Versus Relational and Other NoSQL Databases
Connectedness and Size of Data Set
ResponseTime
0 to 2 hops
0 to 3 degrees
Thousands of connections
Tens to hundreds of hops
Thousands of degrees
Billions of connections
Relational and
Other NoSQL
Databases
Neo4j
Neo4j is
1000x faster
“Minutes to
milliseconds”
ICIJ editor Mar Cabra presenting at GraphConnect
Mar Cabra is the Editor of the Data and
Research Unit at the International
Consortium of Investigative Journalists
(ICIJ), the organization responsible for
breaking the Panama Papers story.
Mar has over 11 years of experience
working in data journalism, including the
BBC, CNN and the Miami Herald.
At GraphConnect, Mar will be presenting
on “How the ICIJ Used Neo4j to Unravel
the Panama Papers.”
neo4j.com/blog/top-10-graphconnect-europe-speakers/
More Insight
• Neo4j Blog
• http://neo4j.com/blog/panama-papers/
• http://neo4j.com/blog/analyzing-panama-papers-neo4j/
• ICIJ
• https://panamapapers.icij.org/
• https://panamapapers.icij.org/the_power_players/
• https://panamapapers.icij.org/graphs/
• SZ
• http://panamapapers.sueddeutsche.de/en/
• Guardian
• http://www.theguardian.com/news/series/panama-papers
Users Love Neo4j – Will you too?
Get started with Neo4j today –
Discover Value in Your Relationships
neo4j.com/developer
Thanks! Questions?
Slide Bucket
Why should I care?
Because Relationships
Matter
What is it with Relationships?
•World is full of connected people, events, things
•There is “Value in Relationships” !
•What about Data Relationships?
•How do you store your object model?
•How do you explain
JOIN tables to your boss?
Neo4j – allows you to connect the dots
• Was built to efficiently
• store,
• query and
• manage highly connected data
•Transactional, ACID
•Real-time OLTP
•Open source
•Highly scalable already on few machines
Relational DBs Can’t Handle Data Relationships Well
• Cannot model or store data and relationships
without complexity
• Performance degrades with number and levels
of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships
requires schema redesign, increasing time to
market
… making traditional databases inappropriate
when data relationships are valuable in real-time
Slow development
Poor performance
Low scalability
Hard to maintain
NoSQL Databases Don’t Handle Data Relationships
• No data structures to model or store
relationships
• No query constructs to support data
relationships
• Relating data requires “JOIN logic”
in the application
• Additionally, no ACID support for
transactions
… making NoSQL databases inappropriate when
data relationships are valuable in real-time
Largest Ecosystem of Graph Enthusiasts
• 1,000,000+ downloads
• 27,000+ education registrants
• 25,000+ Meetup members in 29 countries
• 100+ technology and service partners
• 170+ enterprise subscription customers
including 50+ Global 2000 companies
Value from Data Relationships
Common Use Cases
Internal Applications
Master Data Management
Network and
IT Operations
Fraud Detection
Customer-Facing Applications
Real-Time Recommendations
Graph-Based Search
Identity and
Access Management
Graph Database Fundamentals
MATCH (boss)-[:MANAGES*0..3]->(sub),
(sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = “John Doe”
RETURN sub.name AS Subordinate,
count(report) AS Total
Express Complex Relationship Queries Easily
Find all reports and how many
people they manage,
up to 3 levels down
Cypher Query
SQL Query

More Related Content

What's hot

Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsStéphane Fréchette
 
Graph Analysis over JSON, Larus
Graph Analysis over JSON, LarusGraph Analysis over JSON, Larus
Graph Analysis over JSON, LarusNeo4j
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph databaseAmirhossein Saberi
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereEugene Hanikblum
 
Hadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for BioinformaticsHadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for Bioinformaticsosintegrators
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at AirbnbNeo4j
 
Einstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceEinstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceNeo4j
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j FundamentalsMax De Marzi
 
Graphs for Enterprise Architects
Graphs for Enterprise ArchitectsGraphs for Enterprise Architects
Graphs for Enterprise ArchitectsNeo4j
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jTobias Lindaaker
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to GraphsNeo4j
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchSearch Technologies
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to GraphsNeo4j
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to GraphsNeo4j
 

What's hot (20)

Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server Professionals
 
Graph Analysis over JSON, Larus
Graph Analysis over JSON, LarusGraph Analysis over JSON, Larus
Graph Analysis over JSON, Larus
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph database
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
 
Hadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for BioinformaticsHadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for Bioinformatics
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 
Einstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceEinstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data Science
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j Fundamentals
 
Graphs for Enterprise Architects
Graphs for Enterprise ArchitectsGraphs for Enterprise Architects
Graphs for Enterprise Architects
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4j
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to Graphs
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for Search
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to Graphs
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to Graphs
 

Viewers also liked

It's 2017, and I still want to sell you a graph database
It's 2017, and I still want to sell you a graph databaseIt's 2017, and I still want to sell you a graph database
It's 2017, and I still want to sell you a graph databaseSwanand Pagnis
 
Neo4j on Azure Step by Step
Neo4j on Azure Step by StepNeo4j on Azure Step by Step
Neo4j on Azure Step by StepNeo4j
 
Azure ml and dynamics 365
Azure ml and dynamics 365Azure ml and dynamics 365
Azure ml and dynamics 365Jivtesh Singh
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesGraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesNeo4j
 
Facebook
FacebookFacebook
Facebooksonycse
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesDatabricks
 

Viewers also liked (7)

It's 2017, and I still want to sell you a graph database
It's 2017, and I still want to sell you a graph databaseIt's 2017, and I still want to sell you a graph database
It's 2017, and I still want to sell you a graph database
 
Neo4j on Azure Step by Step
Neo4j on Azure Step by StepNeo4j on Azure Step by Step
Neo4j on Azure Step by Step
 
Azure ml and dynamics 365
Azure ml and dynamics 365Azure ml and dynamics 365
Azure ml and dynamics 365
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesGraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
 
Facebook
FacebookFacebook
Facebook
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
 

Similar to Graph databases and the #panamapapers

Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2Neo4j
 
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j WebinarNeo4j
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyNeo4j
 
Enterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in productionEnterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in productionNeo4j
 
The Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryThe Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryNeo4j
 
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databasesthai
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training IntroductionMax De Marzi
 
Neo4j GraphDay Seattle- Sept19- Connected data imperative
Neo4j GraphDay Seattle- Sept19- Connected data imperativeNeo4j GraphDay Seattle- Sept19- Connected data imperative
Neo4j GraphDay Seattle- Sept19- Connected data imperativeNeo4j
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
Polyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jPolyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jCorie Pollock
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 
Bren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsBren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsCarly Strasser
 
DubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataDubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataScott Sosna
 
GraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4jGraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4jNeo4j
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Rahul Jain
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisCrowdFlower
 
GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?Neo4j
 

Similar to Graph databases and the #panamapapers (20)

Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2
 
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to Graphs
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j Webinar
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right Technology
 
Enterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in productionEnterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in production
 
The Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data StoryThe Connected Data Imperative: The Shifting Enterprise Data Story
The Connected Data Imperative: The Shifting Enterprise Data Story
 
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
 
Neo4j GraphDay Seattle- Sept19- Connected data imperative
Neo4j GraphDay Seattle- Sept19- Connected data imperativeNeo4j GraphDay Seattle- Sept19- Connected data imperative
Neo4j GraphDay Seattle- Sept19- Connected data imperative
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
 
Polyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jPolyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4j
 
GraphDB
GraphDBGraphDB
GraphDB
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Bren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsBren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheets
 
DubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataDubJug: Neo4J and Open Data
DubJug: Neo4J and Open Data
 
GraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4jGraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4j
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?GraphTour 2020 - Neo4j: What's New?
GraphTour 2020 - Neo4j: What's New?
 

Recently uploaded

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Recently uploaded (20)

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Graph databases and the #panamapapers

  • 1. Graph Databases and the #PanamaPapers Meetup JUG Hessen 2016-01-26 Stefan Armbruster
  • 2. (Stefan)-[:WORKS_FOR]->(Neo4j) stefan@neo4j.com | @darthvader42 github.com/sarmbruster | blog.armbruster-it.de Stefan Armbruster - Field Engineer @Neo4j
  • 3.
  • 4. Source Material Kudos to Michael Hunger @mesiiri source material taken from • the ICIJ presentation • the Reddit AMA • online publications (SZ, Guardian, TNW et.al.) • the ICIJ website • https://panamapapers.icij.org/ • The Power Players • Key Numbers & Figures
  • 5.
  • 6.
  • 7. +190 journalists in more than 65 countries 12 staff members (USA, Costa Rica, Venezuela, Germany, France, Spain) 50% of the team = Data & Research Unit
  • 9. 3 million files x 10 seconds per file = 1 yr / 35 servers = 1.5 weeks Investigators used Nuix’s optical character recognition to make millions of scanned documents text-searchable. They used Nuix’s named entity extraction and other analytical tools to identify and cross-reference the names of Mossack Fonseca clients through millions of
  • 11. Unstructured data extraction ● Nuix professional OCR service ● ICIJ Extract (open source, Java: https://github.com/ICIJ/extract), leverages Apache Tika, Tesseract OCR and JBIG2-ImageIO. Structured data extraction ● A bunch of Python Database ● Apache Solr (open source, Java) ● Redis (open source, C) ● Neo4j (open source, Java) App ● Blacklight (open source, Rails) ● Linkurious (closed source, JS) Stack
  • 12.
  • 14. Context is King name: “John” last: „Miller“ role: „Negotiator“ name: "Maria" last: "Osara"name: “Some Media Ltd” value: “$70M” PERSON PERSON$ @ PERSON PERSON name: ”Jose" last: “Pereia“ position: “Governor“ name: “Alice” last: „Smith“ role: „Advisor“
  • 15. Context is King DETAILS CONTAINS SON SENT OW NS SUPPORTS CREATED MENTIONS name: “John” last: „Miller“ role: „Negotiator“ name: "Maria" last: "Osara" since: Jan 10, 2011 name: “Some Media Ltd” value: “$70M” PERSON PERSON$ @ WROTE PERSON PERSON name: ”Jose" last: “Pereia“ position: “Governor“ name: “Alice” last: „Smith“ role: „Advisor“
  • 16.
  • 17. NODE REL. key: “value” properties Property Graph Model Nodes • The entities in the graph • Can have name-value properties • Can be labeled Relationships • Relate nodes by type and direction • Can have name-value properties RELATIONSHIP REL. NODE NODE key: “value” properties key: “value” properties key: “value” properties$
  • 18. Value from Data Relationships Common Use Cases Internal Applications Master Data Management Network and IT Operations Fraud Detection Customer-Facing Applications Real-Time Recommendations Graph-Based Search Identity and Access Management http://neo4j.com/use-cases
  • 19. Your friend Neo4j An open-source graph database • Manage and store your connected data as a graph • Query relationships easily and quickly • Evolve model and applications to support new requirements and insights • Built to solve relational pains
  • 20. Your friend Neo4j An open-source graph database • Built for Connected Data • Easy to use • Optional Schema • Highly Scalable Performance • Transactional ACID-Database
  • 22. Neo4j: All about Patterns (:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"}) KNOWS Dan Ann NODE NODE LABEL PROPERTY http://neo4j.com/developer/cypher LABEL PROPERTY
  • 23. Cypher: Find Patterns MATCH (:Person { name:"Dan"} ) -[:KNOWS]-> (who:Person) RETURN who KNOWS Dan ??? LABEL NODE NODE LABEL PROPERTY ALIAS ALIAS http://neo4j.com/developer/cypher
  • 24. Cypher: Clauses CREATE pattern MERGE pattern SET DELETE REMOVE
  • 25. Cypher: Clauses MATCH pattern WHERE predicate ORDER BY expression SKIP ... LIMIT ... RETURN expression AS alias ...
  • 26. Cypher: Clauses WITH expression AS alias, ... UNWIND list AS item LOAD CSV FROM „url“ AS row
  • 27. Getting Data into Neo4j Cypher-Based “LOAD CSV” • Transactional (ACID) writes • Initial and incremental loads of up to 10 million nodes and relationships , ,, LOAD CSV WITH HEADERS FROM "url" AS row MERGE (:Person {name:row.name, age:toInt(row.age)});
  • 28. Getting Data into Neo4j Load JSON with Cypher • Load JSON via procedure • Deconstruct the document • Into a non-duplicated graph model {} {}{} CALL apoc.load.json("url") yield value as doc UNWIND doc.items as item MERGE (:Contract {title:item.title, amount:toFloat(item.amount)});
  • 29. Getting Data into Neo4j CSV Bulk Loader neo4j-import • For initial database population • For loads with 10B+ records • Up to 1M records per second ,,, ,,,,,, bin/neo4j-import –-into people.db --nodes:Person people.csv --nodes:Company companies.csv --relationship:STAKEHOLDER stakeholders.csv
  • 30.
  • 31. The Steps Involved in the Document Analysis 1. Acquire documents 2. Classify documents • Scan / OCR • Extract document metadata 3. Whiteboard domain and questions, determine • entities and their relationships • potential entity and relationship properties • sources for those entities and their properties
  • 32. The Steps Involved in the Document Analysis 4. Develop analyzers, rules, parsers and named entity recognition 5. Parse and store metadata, document and entity relationships • Parse by author, named entities, dates, sources and classifications 6. Infer entity relationships 7. Compute similarities, transitive cover and triangles 8. Analyze data using graph queries and visualizations
  • 33. We need a Data Model Meta Data Entities • Document, Email, Contract, DB-Record • Meta: Author, Date, Source, Keywords • Conversation: Sender, Receiver, Topic • Money Flows Actual Entities • Person • Representative (Officer) • Address • Client • Company • Account Either based on our use cases & questions On the entities present in our meta-data and data.
  • 34. Data Model – Relationships Meta-Data • sent, received, cc‘ed • mentioned, topic-of • created, signed • attached • roles • family relationships Activities • open account • manage • has shares • registered address • money flow
  • 35. The ICIJ Data Model
  • 36. The ICIJ Data Model • Simplistic Datamodel with 4 Entities and 5 Relationships • We only know the published model • Missing • Documents, Metadata • Family Relationships • Connections to Public Record Databases • Contains Duplicates • Relationship information stored on entities • Could use richer labeling
  • 37. Example Dataset - Azerbaijan’s President Ilham Aliyev • was already previously investigated • whole family involved • different shell companies & involvements http://neo4j.com/graphgist/ec65c2fa-9d83-4894-bc1e-98c475c7b57a
  • 38. Demo Time Based On: http://neo4j.com/blog/analyzing-panama-papers-neo4j/
  • 39. Visual Graph Search For Non-Developers
  • 40. Background Business problem Solution & Benefits © All Rights Reserved 2014 | Neo Technology, Inc. •JS library based on sigma.js •Integrates with Neo4j using Cypher Linkurious.js https://github.com/Linkurious/linkurious.js/
  • 41. Background Business problem Solution & Benefits © All Rights Reserved 2014 | Neo Technology, Inc. •JS library based on d3.js •Uses Graph Metadata to offer visual search •Categories to filter Instances •Component based extensions •Zero Config with Popoto.js http://www.popotojs.com/
  • 42. Background Business problem Solution & Benefits © All Rights Reserved 2014 | Neo Technology, Inc. Visual Search Bar •Based on visualsearch.js •Uses graph metadata for parametrization •Limit suggestions by selected items maxdemarzi.com/2013/07/03/the-last-mile/
  • 43. Background Business problem Solution & Benefits © All Rights Reserved 2014 | Neo Technology, Inc. •Natural Language to Cypher •Ruby TreeTop Gem for NLP •Convert phrases to Cypher Fragments Facebook Graph Search maxdemarzi.com/2013/01/28/facebook-graph-search-with-cypher-and-neo4j/
  • 44. Users Love Neo4j “We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.” Volker Pacher Senior Developer Performance "The Neo4j graph database gives us drastically improved performance and a simple language to query our connected data” - Sebastian Verheugher, Telenor Scale and Availability "As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.” - Marcos Wada, Walmart
  • 45. Discrete Data Minimally connected data Graph Databases are designed for data relationships Summary - Use the Right Database for the Job Other NoSQL Relational DBMS Graph DBMS Connected Data Focused on Data Relationships Development Benefits Easy model maintenance Easy query Deployment Benefits Ultra high performance Minimal resource usage
  • 46. Real-Time Query Performance Graph Versus Relational and Other NoSQL Databases Connectedness and Size of Data Set ResponseTime 0 to 2 hops 0 to 3 degrees Thousands of connections Tens to hundreds of hops Thousands of degrees Billions of connections Relational and Other NoSQL Databases Neo4j Neo4j is 1000x faster “Minutes to milliseconds”
  • 47. ICIJ editor Mar Cabra presenting at GraphConnect Mar Cabra is the Editor of the Data and Research Unit at the International Consortium of Investigative Journalists (ICIJ), the organization responsible for breaking the Panama Papers story. Mar has over 11 years of experience working in data journalism, including the BBC, CNN and the Miami Herald. At GraphConnect, Mar will be presenting on “How the ICIJ Used Neo4j to Unravel the Panama Papers.” neo4j.com/blog/top-10-graphconnect-europe-speakers/
  • 48. More Insight • Neo4j Blog • http://neo4j.com/blog/panama-papers/ • http://neo4j.com/blog/analyzing-panama-papers-neo4j/ • ICIJ • https://panamapapers.icij.org/ • https://panamapapers.icij.org/the_power_players/ • https://panamapapers.icij.org/graphs/ • SZ • http://panamapapers.sueddeutsche.de/en/ • Guardian • http://www.theguardian.com/news/series/panama-papers
  • 49. Users Love Neo4j – Will you too?
  • 50. Get started with Neo4j today – Discover Value in Your Relationships neo4j.com/developer
  • 53. Why should I care? Because Relationships Matter
  • 54. What is it with Relationships? •World is full of connected people, events, things •There is “Value in Relationships” ! •What about Data Relationships? •How do you store your object model? •How do you explain JOIN tables to your boss?
  • 55. Neo4j – allows you to connect the dots • Was built to efficiently • store, • query and • manage highly connected data •Transactional, ACID •Real-time OLTP •Open source •Highly scalable already on few machines
  • 56. Relational DBs Can’t Handle Data Relationships Well • Cannot model or store data and relationships without complexity • Performance degrades with number and levels of relationships, and database size • Query complexity grows with need for JOINs • Adding new types of data and relationships requires schema redesign, increasing time to market … making traditional databases inappropriate when data relationships are valuable in real-time Slow development Poor performance Low scalability Hard to maintain
  • 57. NoSQL Databases Don’t Handle Data Relationships • No data structures to model or store relationships • No query constructs to support data relationships • Relating data requires “JOIN logic” in the application • Additionally, no ACID support for transactions … making NoSQL databases inappropriate when data relationships are valuable in real-time
  • 58. Largest Ecosystem of Graph Enthusiasts • 1,000,000+ downloads • 27,000+ education registrants • 25,000+ Meetup members in 29 countries • 100+ technology and service partners • 170+ enterprise subscription customers including 50+ Global 2000 companies
  • 59. Value from Data Relationships Common Use Cases Internal Applications Master Data Management Network and IT Operations Fraud Detection Customer-Facing Applications Real-Time Recommendations Graph-Based Search Identity and Access Management
  • 61. MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE boss.name = “John Doe” RETURN sub.name AS Subordinate, count(report) AS Total Express Complex Relationship Queries Easily Find all reports and how many people they manage, up to 3 levels down Cypher Query SQL Query