Neo4j - London User Group Meetup - 28th March, 2018
If your data ingestion requirements have grown beyond importing occasional CSV files then this talk is for you. Neo4j-Databridge from GraphAware is a comprehensive ETL tool specifically built for Neo4j. It has been designed for usability, expressive power and high performance to address the most common issues faced when importing data into Neo4j: multiple data sources and types, very large data sets, bespoke data conversions, non-tabular formats, filtering, merging and de-duplication, as well as bulk imports and incremental updates.
In this talk, we'll take a quick tour of some of the main features, loading data from Kafka, Redis, JDBC and various other data sources along the way, to understand how Neo4j-Databridge solves these problems and how it can help you import your data quickly and easily into Neo4j.
Vince Bickers is a Principal Consultant at GraphAware and the main author of Spring Data Neo4j (v4). He has been writing software and leading software development teams for over 30 years at organisations like Vodafone, Deutsche Bank, HSBC, Network Rail, UBS, VMware, ConocoPhillips, Aviva and British Gas.
3. Who is it for?
GraphAware®
You’ve outgrown LOAD CSV
You’re doing continuous ingestion
You want to do “What if” snapshotting
You are doing data aggregation into Neo4j
You have complex and/or time-consuming imports
4. Core features
Easy to use and configure
Fast and scalable
Strategies for merging and managing duplicates
Supports multiple data formats and data providers
Automatic conversion from JDBC -> Graph
Expressions for filtering and data conversions
Supports full and incremental imports
Offline and online endpoints
User-extensible via simple APIs
Can integrate easily into an existing Neo4j deployment
6. Overview
• A Databridge import Task consists of one or more user-defined specifications, defined using JSON:
  • Resource Definitions
  • Schema Mappings
• The Databridge import engine uses built-in Adapters to perform the import
• No need to pre-process data beforehand
• No need to know Cypher
• No need to write code
• If you can write a JSON file, you’re good to go.
14. Custom Adapters
The Adapter API allows you to build and deploy your own adapters:
• Extend AbstractAdapter
• Compile and deploy to the /lib folder
• Declare in the Resource Definition
{
  "name" : "my-custom-resource",
  "adapter" : "com.mycompany.MyCustomAdapter",
  "resource" : "resources/custom.resource"
}
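The slides do not show the AbstractAdapter API, so the following Java sketch uses a hypothetical stand-in base class (SketchAdapter, with an assumed rows method) purely to illustrate the idea: a custom adapter reads a source format the engine does not know and presumably hands back rows of column name to value that a schema mapping can refer to. The key=value;key=value input format and all class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A custom adapter for an invented "key=value;key=value" line format.
// Hypothetical example: it extends the stand-in SketchAdapter below,
// NOT the real Databridge AbstractAdapter.
class KeyValueAdapter extends SketchAdapter {
    @Override
    List<Map<String, String>> rows(List<String> lines) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (String pair : line.split(";")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2) {
                    row.put(kv[0].trim(), kv[1].trim());
                }
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new KeyValueAdapter()
                .rows(List.of("name=Hubble;Alt=LEO", "name=SOHO;Alt=L1"));
        System.out.println(rows);
    }
}

// Assumed shape of an adapter base class, for illustration only.
abstract class SketchAdapter {
    // Turn the raw lines of a resource into rows of column name -> value.
    abstract List<Map<String, String>> rows(List<String> lines);
}
```

Whatever the real base class looks like, the deployment steps stay as listed on the slide: compile the class, place it in /lib, and name it in the "adapter" field of the Resource Definition.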
19. Custom data conversions
• Write your own data converters in Groovy and drop them into the plugins folder
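package plugins

import com.graphaware.neo4j.databridge.plugins.*

// Lookup table: orbit code -> descriptive location name
map = [
    'LEO': 'Low-Earth Orbit',
    'MEO': 'Mid-Earth Orbit',
    'HEO': 'High-Earth Orbit',
    'L1': 'Lissajous 1',
    'L2': 'Lissajous 2'
]

// Returns a closure, coerced to Converter, that looks up its first argument in the map
def bind() {
    { args ->
        map.get(args[0])
    } as Converter
}

this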
• Then use them in your schema mappings
{
  "nodes": [
    {
      "type": "Location",
      "properties": [ { "name": "location", "column": "Alt",
                        "convert": "plugins.orbit_location" } ],
      "identity": [ "location" ]
    }
  ]
}
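To make the conversion concrete, the sketch below restates the plugin's lookup in plain Java. It is only an illustration: OrbitLocation and convert are hypothetical names, and the code does not use the Databridge Converter type, whose interface is not shown in the slides. With the mapping above, a row whose Alt column holds LEO would presumably produce a Location node with its location property set to the looked-up name.

```java
import java.util.Map;

// Illustrative stand-in only: mirrors the Groovy plugin's lookup in plain Java.
// It is not a Databridge plugin and does not implement the real Converter type.
class OrbitLocation {
    // Same codes and names as the map in the Groovy plugin.
    static final Map<String, String> ORBITS = Map.of(
            "LEO", "Low-Earth Orbit",
            "MEO", "Mid-Earth Orbit",
            "HEO", "High-Earth Orbit",
            "L1", "Lissajous 1",
            "L2", "Lissajous 2");

    // Like map.get(args[0]) in the plugin: unknown codes yield null.
    static String convert(String code) {
        return ORBITS.get(code);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"LEO", "L2", "GEO"}) {
            System.out.println(code + " -> " + convert(code));
        }
    }
}
```

Because unknown codes map to null, a real import would need to decide how to treat values outside the table, for example by filtering those rows or supplying a default.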
20. Road map
More adapters (JMS, Elastic, …)
Developer version
Integration with the Neo4j Graph Platform
Web-based UI (under development)
Want to know more?
email: databridge@graphaware.com
Ask for an evaluation copy
21. Demo time!
csv - load NASA satellite data from a CSV file
excel - load UK Charity data from an Excel spreadsheet
jdbc - auto-import a MySQL database
itn - load the Isle of Man TT route into Neo4j
gpx - import all UK Aviation waypoints (navaid)
kafka - real-time RSS news feeds into Neo4j
redis - :-(
hawkeye - load 15m graph objects in < 1 minute