17. Basic Query: Who do people report to?
MATCH
  (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
  e.employeeID AS managerID,
  e.firstName AS managerName,
  sub.employeeID AS employeeID,
  sub.firstName AS employeeName;
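To make the pattern concrete, here is a minimal Python sketch of what the query above asks for: one row per REPORTS_TO relationship, pairing each manager with a direct report. The sample rows and the `direct_reports` helper are hypothetical and only illustrate the shape of the result; they are not Northwind data and not how Neo4j evaluates the pattern internally.

```python
# Hypothetical sample data: (employee_id, first_name, reports_to_id).
# These rows are illustrative only, not taken from the Northwind CSV files.
EMPLOYEES = [
    (1, "Ada", None),
    (2, "Ben", 1),
    (3, "Cy", 1),
    (4, "Di", 2),
]


def direct_reports(employees):
    """Return (managerID, managerName, employeeID, employeeName) rows,
    one per REPORTS_TO relationship, mirroring the query's RETURN clause."""
    names = {emp_id: name for emp_id, name, _ in employees}
    rows = []
    for emp_id, name, manager_id in employees:
        # An employee with no (known) manager has no REPORTS_TO
        # relationship, so it produces no row.
        if manager_id in names:
            rows.append((manager_id, names[manager_id], emp_id, name))
    return rows


if __name__ == "__main__":
    for row in direct_reports(EMPLOYEES):
        print(row)
```

Note that the person at the top of the hierarchy never appears in the employee columns, because there is no relationship to match.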
28. Step-by-step Creating the Graph
1. Create the core data
2. Create indexes on the data for performance
3. Create the relationships
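The three steps above can be sketched in plain Python to show why the order matters: the relationship step looks nodes up by ID, so having an index in place first keeps each lookup cheap. The rows, the dict-based "index", and the `build_graph` function are assumptions made for illustration; Neo4j's real indexes work differently under the hood.

```python
# Hypothetical rows standing in for the CSV files; values are illustrative only.
PRODUCT_ROWS = [
    {"ProductID": "1", "ProductName": "Tea", "CategoryID": "10"},
    {"ProductID": "2", "ProductName": "Jam", "CategoryID": "20"},
]
CATEGORY_ROWS = [
    {"CategoryID": "10", "CategoryName": "Beverages"},
    {"CategoryID": "20", "CategoryName": "Condiments"},
]


def build_graph(product_rows, category_rows):
    # Step 1: create the core data (nodes as plain dicts).
    products = [
        {"productID": r["ProductID"], "productName": r["ProductName"]}
        for r in product_rows
    ]
    categories = [
        {"categoryID": r["CategoryID"], "categoryName": r["CategoryName"]}
        for r in category_rows
    ]

    # Step 2: create indexes (here, dicts keyed by ID) so lookups avoid a scan.
    product_by_id = {p["productID"]: p for p in products}
    category_by_id = {c["categoryID"]: c for c in categories}

    # Step 3: create the relationships by looking up both endpoints.
    part_of = []
    for r in product_rows:
        product = product_by_id.get(r["ProductID"])
        category = category_by_id.get(r["CategoryID"])
        if product is not None and category is not None:
            part_of.append(
                (product["productName"], "PART_OF", category["categoryName"])
            )
    return products, categories, part_of


if __name__ == "__main__":
    print(build_graph(PRODUCT_ROWS, CATEGORY_ROWS)[2])
```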
29. LOADing the Data
// Create customers
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID,
  fax: row.Fax, phone: row.Phone});
// Create products
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID,
  unitPrice: toFloat(row.UnitPrice)});
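For anyone unfamiliar with LOAD CSV, a rough Python analogue may help. LOAD CSV WITH HEADERS hands each row to the query as a map of column name to string, which is why the product query converts UnitPrice with toFloat while the IDs stay strings. The sketch below mimics that with `csv.DictReader`; the sample CSV text, the `to_float` helper (which returns None for unparseable values, as I understand toFloat to return null), and the `batches` helper standing in for the idea behind USING PERIODIC COMMIT 1000 are all assumptions for illustration.

```python
import csv
import io

# Hypothetical CSV content using the same column names as the products query.
SAMPLE_CSV = """ProductID,ProductName,UnitPrice
1,Tea,18.5
2,Jam,
"""


def to_float(value):
    """Return a float, or None when the value cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def rows_to_products(text):
    """Map each CSV row (a dict of strings) to a product property map."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "productName": row["ProductName"],
            "productID": row["ProductID"],  # stays a string, as in the slide
            "unitPrice": to_float(row["UnitPrice"]),
        }
        for row in reader
    ]


def batches(items, size=1000):
    """Yield fixed-size chunks, similar in spirit to committing every N rows."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    for batch in batches(rows_to_products(SAMPLE_CSV), size=1000):
        print(batch)
```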
30. LOADing the Data
// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});
// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
CREATE (:Employee {employeeID: row.EmployeeID, firstName: row.FirstName,
  lastName: row.LastName, title: row.Title});
31. LOADing the Data
// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName,
  description: row.Description});
// Create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName;
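Orders are loaded with MERGE rather than CREATE. The later slides read ProductID from the same orders.csv, which suggests that an OrderID can appear on more than one row, so a get-or-create is the safer choice. Here is a minimal Python sketch of the MERGE ... ON CREATE SET behaviour, assuming a dict keyed by orderID; the `merge_order` helper and the sample rows are hypothetical.

```python
def merge_order(orders, order_id, ship_name):
    """Get-or-create an order keyed by orderID.

    shipName is set only when the order is first created, which is the
    behaviour ON CREATE SET describes; later rows leave it untouched.
    """
    order = orders.get(order_id)
    if order is None:
        order = {"orderID": order_id, "shipName": ship_name}
        orders[order_id] = order
    return order


def load_orders(rows):
    """Apply merge_order to every (orderID, shipName) row."""
    orders = {}
    for order_id, ship_name in rows:
        merge_order(orders, order_id, ship_name)
    return orders


if __name__ == "__main__":
    # Hypothetical rows: the first order ID repeats, as it would when a
    # file holds one row per order line.
    sample = [("A1", "First Ship"), ("A1", "Ignored"), ("B2", "Second Ship")]
    print(load_orders(sample))
```

Running the same load twice leaves the result unchanged, which is the property that makes MERGE convenient for re-runnable imports.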
32. Creating the Indexes
CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(companyName);
33. Creating the Relationships
// Link customers to the orders they purchased
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);
// Link suppliers to the products they supply
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);
34. Creating the Relationships
// Link products to their categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);
// Link employees to their managers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
35. Creating the Relationships
// Link orders to the products they contain; price and quantity live on the relationship
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[relation:PRODUCT]->(product)
ON CREATE SET relation.unitPrice = toFloat(row.UnitPrice),
  relation.quantity = toFloat(row.Quantity);
// Link employees to the orders they sold
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);
So, I’ll be honest: before joining Neo, I don’t think I ever really cared about or considered databases, their architecture, and so on. All I cared about was “does it work” and “how much does it cost.” The average developer just wants a database that is easy to use, reliable, and high-performance. Brilliant developers want a sandbox in which they can do the impossible. I know, databases are an inherently unsexy topic. If the database is doing its job, you (or at least the end user) will forget it is there. That’s really why I love Neo4j: it gets your data management issues out of the way as quickly as possible and allows you to be as creative and bold as you wish.
The goal of this presentation is to illustrate how easy it is to model and query your data in a graph. This is a brief overview of what we’ll be discussing today.
First, we’re going to talk about our data and its domain, and then think through identifying the relationships inside that data.
Next we’ll discuss methods for loading and transforming that data into a graph and persisting it inside Neo4j.
Finally, we’ll get to the important part: querying that data. I’ll first show you Neo4j’s built-in query browser. In production, however, it’s rare that a user will be probing the data via the browser a la data science; instead, most of us will have an application sitting atop our database. We’ll discuss how we can query Neo4j via an application.
The Northwind Traders sample database contains the sales data for a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world. It’s the sample dataset that many of us spun up our first SQL server against.
Here we see a database diagram that you’re probably familiar with: arrows drawn between the important common items shared between tables.
That’s cool and all…but
Taking a step back, let’s consider how we think of people, places, things, and actions. We rarely think of ourselves as a single trait. I am not emailAddress: “kevin@neotechnolgy.com”; rather, I am an entire litany of things, properties that encapsulate who or what I am. In Neo4j, if you so choose, you can create nodes and relationships that mirror the way you think about and consider your world.
Here is an employee; they have a name, an ID, a startDate, a birthday, and so on. Here is an order, and so on. Finally, we have the important part: their relationship. We understand this order based on its context: it was sold by this employee.
Here we see that we can enrich this diagram by naming the relationships between products and categories, orders and who sold them, and who supplied a given product. Suddenly we see that an RDBMS diagram can easily become a graph. Which is, again, pretty cool, but the power will come from what we can do via these relationships.
This is the final result. We have given the nodes types (we call them labels inside Neo4j), and we have relationship types. Not shown here, but touched upon later: we can also store key-value pairs on both nodes and relationships. We call these key-value pairs “properties.”
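If it helps to see that model as a data structure, here is a minimal Python sketch of a property graph: nodes carry a label and properties, relationships carry a type and their own properties. The `Node` and `Relationship` classes and the `order_total` helper are hypothetical names for illustration, not Neo4j's API. The example puts unitPrice and quantity on the PRODUCT relationship, as the load script does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # e.g. "Order", "Product"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str                   # e.g. "PRODUCT", "SOLD"
    end: Node
    properties: dict = field(default_factory=dict)


def order_total(order, relationships):
    """Sum unitPrice * quantity over the order's outgoing PRODUCT relationships."""
    return sum(
        rel.properties["unitPrice"] * rel.properties["quantity"]
        for rel in relationships
        if rel.start is order and rel.rel_type == "PRODUCT"
    )


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic easy to follow.
    order = Node("Order", {"orderID": "A1"})
    tea = Node("Product", {"productName": "Tea"})
    jam = Node("Product", {"productName": "Jam"})
    rels = [
        Relationship(order, "PRODUCT", tea, {"unitPrice": 2.0, "quantity": 3.0}),
        Relationship(order, "PRODUCT", jam, {"unitPrice": 1.5, "quantity": 2.0}),
    ]
    print(order_total(order, rels))
```

Keeping price and quantity on the relationship means the same product can appear in many orders at different prices without duplicating the product node.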
Normally we would discuss ETL here, but we’ll address it later; it’s not the most exciting thing to discuss. Let’s jump into the meaty stuff.
# Install py2neo first (shell):
#   sudo easy_install pip
#   sudo pip install py2neo
import py2neo
from py2neo import Graph

# Register the auth token for the local server, then connect over REST.
py2neo.set_auth_token('localhost:7474', '2169deeda8ddcf7b38236ddead96d914')
graph = Graph("http://localhost:7474/db/data/")

# Follow REPORTS_TO chains of any length; "via" holds the people in between.
cyres = graph.cypher.execute("""MATCH
  path = (e:Employee)<-[:REPORTS_TO*]-(sub)
WITH
  e, sub,
  [person in NODES(path) | person.firstName][1..-1] AS path
RETURN
  e.employeeID AS managerID,
  e.firstName AS managerName,
  sub.employeeID AS employeeID,
  sub.firstName AS employeeName,
  CASE
    WHEN LENGTH(path) = 0
    THEN "Direct Report"
    ELSE path END AS via
ORDER BY LENGTH(path)""")

for record in cyres:
    print(record['via'])
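The interesting part of that query is the `*` in `[:REPORTS_TO*]`, which follows the relationship across any number of hops. As a rough illustration of what comes back, here is a pure-Python sketch that walks a hypothetical employee-to-manager mapping and emits one row per (manager, employee) pair, with `via` listing the people strictly in between, ordered from the manager's side as NODES(path) would be. The data and the `reporting_chains` function are assumptions, and the sketch assumes the hierarchy has no cycles.

```python
# Hypothetical reporting structure: employee -> manager.
# Ada has no entry, so she sits at the top of the tree.
REPORTS_TO = {"Ben": "Ada", "Cy": "Ada", "Di": "Ben", "Ed": "Di"}


def reporting_chains(reports_to):
    """Return (manager, employee, via) rows for every direct or indirect
    reporting line, sorted so the shortest chains come first."""
    rows = []
    for employee in reports_to:
        via = []  # managers passed so far, nearest to the employee first
        manager = reports_to.get(employee)
        while manager is not None:
            # Reverse so the list reads from the manager down to the employee.
            rows.append((manager, employee, via[::-1]))
            via.append(manager)
            manager = reports_to.get(manager)
    rows.sort(key=lambda row: len(row[2]))
    return rows


if __name__ == "__main__":
    for manager, employee, via in reporting_chains(REPORTS_TO):
        print(employee, "reports to", manager, "via", via or "Direct Report")
```

An empty `via` corresponds to the "Direct Report" case in the Cypher CASE expression.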
import py2neo
from py2neo import Graph

try:
    # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    # Python 2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

HTTP_PORT = 8080
NEO_PROTOCOL = 'http'
NEO_HOST = 'localhost'
NEO_PORT = 7474
NEO_USER = 'neo4j'
NEO_PASSWORD = 'neo4j2'


class GetHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        # Static page header; the employee list is appended below.
        response_parts = [
            "<html><head>",
            "<style type='text/css'>",
            "td,li,h1 { font-family: 'Open Sans','Helvetica Neue','Helvetica','Arial' } ",
            "</style>",
            "</head>",
            "<body>",
            "<table border='0'><tr><td>",
            "<img src='http://neo4j.com/wp-content/themes/neo4jweb/assets/images/neo4j-blue-logo.png' />",
            "</td></tr></table>",
            "<h1>Employee Relationships</h1>",
            "<ul>"
        ]
        # Alternative: token-based auth instead of user:password in the URL.
        # py2neo.set_auth_token('%s:%s' % (NEO_HOST, NEO_PORT), NEO_AUTH_TOKEN)
        graph = Graph('%s://%s:%s@%s:%s/db/data/' %
                      (NEO_PROTOCOL, NEO_USER, NEO_PASSWORD, NEO_HOST, NEO_PORT))
        cyres = graph.cypher.execute("""MATCH
          path = (e:Employee)<-[:REPORTS_TO*]-(sub)
        WITH
          e, sub,
          [person in NODES(path) | person.firstName][1..-1] AS path
        RETURN
          e.employeeID AS managerID,
          e.firstName AS managerName,
          sub.employeeID AS employeeID,
          sub.firstName AS employeeName,
          CASE
            WHEN LENGTH(path) = 0
            THEN null
            ELSE path END AS via
        ORDER BY LENGTH(path)""")
        for record in cyres:
            response_parts.append("<li>")
            response_parts.append("<b>%s</b> reports to <b>%s</b>" %
                                  (record['employeeName'], record['managerName']))
            # "via" is null for direct reports and a list of names otherwise.
            if isinstance(record['via'], list):
                response_parts.append("via")
                response_parts.append("<b>")
                response_parts.append(', '.join(record['via']))
                response_parts.append("</b>")
            response_parts.append("</li>")
        # Close the list opened in the header before closing the document.
        response_parts.append("</ul></body></html>")
        message = '\r\n'.join(response_parts)
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.end_headers()
        self.wfile.write(message.encode('utf-8'))


if __name__ == '__main__':
    server = HTTPServer(('', HTTP_PORT), GetHandler)
    print('Starting server, use <Ctrl-C> to stop')
    server.serve_forever()
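One way to make the handler above easier to test is to keep the HTML rendering separate from the database call. The sketch below is a hypothetical refactoring, not part of the original demo: `render_rows` takes plain dicts with the same keys the query returns (`employeeName`, `managerName`, `via`) and escapes the values, so it can be exercised without a running Neo4j server. It assumes Python 3 for `html.escape`.

```python
from html import escape


def render_rows(rows):
    """Render reporting rows as an HTML list.

    Each row is a dict with 'employeeName', 'managerName' and an optional
    'via' list of intermediate managers (None or missing for direct reports).
    """
    parts = ["<ul>"]
    for row in rows:
        item = "<b>%s</b> reports to <b>%s</b>" % (
            escape(row["employeeName"]),
            escape(row["managerName"]),
        )
        via = row.get("via")
        if isinstance(via, list) and via:
            item += " via <b>%s</b>" % escape(", ".join(via))
        parts.append("<li>%s</li>" % item)
    parts.append("</ul>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical rows shaped like the query's result records.
    print(render_rows([
        {"employeeName": "Ben", "managerName": "Ada", "via": None},
        {"employeeName": "Di", "managerName": "Ada", "via": ["Ben"]},
    ]))
```

Escaping matters here because the names come straight from the database and end up inside markup.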
We could stop here; we’ve essentially created a document database. We have a vast sea of island nodes. Some islands are products, some islands are… but if we wanted to know how any one of them was related to another, we’d have to manually create some sort of join logic.
Creativity: what if we wanted to understand our customers based on different criteria? We can change our data model iteratively to reflect changing business inputs.