More Related Content Similar to Deep Dive on Amazon Neptune - AWS Online Tech Talks (20) More from Amazon Web Services (20) Deep Dive on Amazon Neptune - AWS Online Tech Talks1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Brad Bebee, AWS, Principal Product Manager, Amazon Neptune
January 22, 2018
Deep Dive on Amazon Neptune
Fast, reliable graph database built for the cloud
B r a d B e b e e , A W S , P r i n c i p a l P M , A m a z o n N e p t u n e
2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Building Applications on Highly Connected Data
• Types of Graphs and How to Query Them
• Property Graph and Apache TinkerPop Friend Recommendation
Example
• RDF Knowledge Graph Example
• Behind Amazon Neptune’s Fully managed, Enterprise-ready
Features
3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
HIGHLY CONNECTED DATA
Retail Fraud DetectionRestaurant RecommendationsSocial Networks
4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
U SE C ASES FOR H IGH LY C ON N EC TED D ATA
Social Networking
Life
Sciences
Network & IT OperationsFraud Detection
Recommendations Knowledge Graphs
5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RECOMMENDATIONS BASED ON RELATIONSHIPS
6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
KNOWLEDGE GRAPH APPLICATIONS
What museums should Alice
visit while in Paris?
Who painted the Mona Lisa?
What artists have paintings
in The Louvre?
7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
N AVIGATE A W EB OF GLOBAL TAX POLIC IES
“Our customers are increasingly required to navigate a complex web of global tax policies and
regulations. We need an approach to model the sophisticated corporate structures of our
largest clients and deliver an end-to-end tax solution. We use a microservices architecture
approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based
system to quickly create links within the data.”
said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting
8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Challenges Building Apps with Highly Connected DataRELATIONAL DATABASE CHALLENGES
BUILDING APPS WITH HIGHLY CONNECTED
DATA
Unnatural for
querying graph
Inefficient
graph processing
Rigid schema inflexible
for changing data
9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DIFFERENT APPROACHES FOR HIGHLY
CONNECTED DATA
Purpose-built for a business process
Purpose-built to answer questions about
relationships
10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A GR APH D ATABASE IS OPTIMIZED FOR EFFIC IEN T
STOR AGE AN D R ETR IEVAL OF H IGH LY C ON N EC TED
D ATA.
11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Open Source Apache TinkerPop™
Gremlin Traversal Language
W3C Standard
SPARQL Query Language
R E S O U R C E
D E S C R I P T I O N
F R A M E W O R K ( R D F )
P R O P E RT Y
G R A P H
LEADING GRAPH MODELS AND
FRAMEWORKS
12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
CHALLENGES OF EXISTING GRAPH
DATABASES
Difficult to maintain
high availability
Difficult to scale
Limited support for
open standards
Too expensive
13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AMAZON NEPTUNE
F u l l y m a n a g e d g r a p h d a t a b a s e
FAST RELIABLE OPEN
Query billions of
relationships with
millisecond latency
6 replicas of your data
across 3 AZs with full
backup and restore
Build powerful queries
easily with Gremlin and
SPARQL
Supports Apache
TinkerPop & W3C RDF
graph models
EASY
14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AMAZON NEPTUNE HIGH LEVEL
ARCHITECTURE
Bulk load
from S3
Database
Mgmt.
15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Types of Graphs and How to
Query Them
16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
PROPERTY GRAPH
A property graph is a set of vertices and edges with
respective properties (i.e. key/value pairs)
• Vertex represents entities/domains
• Edge represents directional relationship
between vertices.
• Each edge has a label that denotes the
type of relationship
• Each vertex & edge has a unique identifier
• Vertex and edges can have properties
• Properties express non-relational information about the
vertices and edges
FRIENDname:
Bill
name:
Sarah
UserUser
Since 11/29/16
17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
CREATING A TINKERPOP GRAPH
//Connect to Neptune and receive a remote graph, g.
user1 = g.addVertex (id, 1, label, "User", "name", "Bill");
user2 = g.addVertex (id, 2, label, "User", "name", "Sarah");
...
user1.addEdge("FRIEND", user2, id, 21);
Gremlin (Apache TinkerPop 3.3)
FRIEND
name:
Bill
name:
Sarah
User
User
18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RDF GRAPHS
• RDF Graphs are described as a collection of triples: subject, predicate, and object.
• Internationalized Resource Identifiers (IRIs) uniquely identify subjects.
• The Object can be an IRI or Literal.
• A Literal in RDF is like a property and RDF supports the XML data types.
• When the Object is an IRI, it forms an “Edge” in the graph.
<http://www.socialnetwork.com/person#1>
rdf:type contacts:User;
contact:name: ”Bill” .
subject
predicate
Object (literal)
name:
Bill
User
<http://www.socialnetwork.com/person#1>IRI
<http://www.socialnetwork.com/person#1>
contacts:friend
<http://www.socialnetwork.com/person#2> .
subject
predicate
Object (IRI)
FRIEND
#1 2#2
19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“THERE’S NO TROUBLE WITH TRIPLES”:
RDF EXAMPLE
@prefix contacts: <http://www.socialnetwork.com/people#>.
<http://www.socialnetwork.com/person#1>
rdf:type contacts:User;
contact:name: ”Bill” .
<http://www.socialnetwork.com/person#1>
contacts:friend <http://www.socialnetwork.com/person#2> .
<http://www.socialnetwork.com/person#2>
rdf:type contacts:User;
contact:name: ”Sarah” .
RDF
(Turtle Serialization)
FRIEND
name:
Bill
name:
Sarah
User
User
20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
GRAPH VS. RELATIONAL DATABASE
MODELING.
* Source : http://www.playnexacro.com/index.html#show:article
Relational model Graph model subset
CompanyName:
Acme
…
Customers
OrderDate:
8/1/2017
…
Order
PURCHASED
HAS_DETAILS
UnitPrice:
$179.99
…
Order
DetailsProductName:
“Echo”
…
Product
HAS_PRODUCT
CompanyName:
“Amazon”
…
SupplierSUPPLIES
21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
SQL RELATIONAL DATABASE QUERY
SELECT distinct c.CompanyName
FROM customers AS c
JOIN orders AS o ON /* Join the customer from the order */
(c.CustomerID = o.CustomerID)
JOIN order_details AS od /* Join the order details from the order
*/
ON (o.OrderID = od.OrderID)
JOIN products as p /* Join the products from the order details
*/
ON (od.ProductID = p.ProductID)
WHERE p.ProductName = ’Echo'; /* Find the product named ‘Echo’ */
Find the name of companies that purchased the ‘Echo’.
22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
SPARQL DECLARATIVE GRAPH QUERY
PREFIX sales_db: <http://sales.widget.com/>
SELECT distinct ?comp_name WHERE {
?customer sales_db:HAS_ORDER ?order ; #customer graph pattern
sales_db:CompanyName ?comp_name .
?order sales_db:HAS_DETAILS ?order_d . #orders graph pattern
?order_d sales_db:HAS_PRODUCT ?product . #order details graph
pattern
?product sales_db:ProductName “Echo” . #products graph pattern
}
* Source : http://www.playnexacro.com/index.html#show:article
Find the name of companies that purchased the ‘Echo’.
23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
GREMLIN IMPERATIVE GRAPH TRAVERSAL
/* All products named ”Echo” */
g.V().hasLabel(‘Product’).has('name',’Echo')
.in(’HAS_PRODUCT') /* Traverse to order details */
.in(‘HAS_DETAILS’) /* Traverse to order */
.in(’HAS_ORDER’) /* Traverse to Customer */
.values(’CompanyName’).dedup() /* Unique Company Name */
Find the name of companies that purchased the ‘Echo’.
24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Property Graph and Apache
TinkerPop Social Network Friend
Recommendation Example
25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
TRIADIC CLOSURE – CLOSING TRIANGLES
FRIEND
FRIEND
Terry
Bill
Sarah
FRIEND
26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RECOMMENDING NEW CONNECTIONS
Terry
27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
IMMEDIATE FRIENDSHIPS
FRIEND
Terry
Bill
28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MEANS AND MOTIVE
FRIEND
FRIEND
Terry
Bill
Sarah
29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RECOMMENDATION
FRIEND
FRIEND
Terry
Bill
Sarah
30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommend New Connections
g = graph.traversal()
g.V().has('name','Terry').as('user').
both('FRIEND').aggregate('friends').
both('FRIEND').
where(neq('user')).where(neq('friends')).
groupCount().by('name').
order(local).by(values, decr)
31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
FIND TERRY
g = graph.traversal()
g.V().has('name','Terry').as('user').
both('FRIEND').aggregate('friends').
both('FRIEND').
where(neq('user')).where(neq('friends')).
groupCount().by('name').
order(local).by(values, decr)
32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
FIND TERRY’S FRIENDS
g = graph.traversal()
g.V().has('name','Terry').as('user').
both('FRIEND').aggregate('friends').
both('FRIEND').
where(neq('user')).where(neq('friends')).
groupCount().by('name').
order(local).by(values, decr)
33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AND THE FRIENDS OF THOSE FRIENDS
g = graph.traversal()
g.V().has('name','Terry').as('user').
both('FRIEND').aggregate('friends').
both('FRIEND').
where(neq('user')).where(neq('friends')).
groupCount().by('name').
order(local).by(values, decr)
user
friend
fof
FRIEND
FRIEND
34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
...WHO AREN’T TERRY AND AREN’T
FRIENDS WITH TERRYg = graph.traversal()
g.V().has('name','Terry').as('user').
both('FRIEND').aggregate('friends').
both('FRIEND').
where(neq('user')).where(neq('friends')).
groupCount().by('name').
order(local).by(values, decr)
user
friend
fof
X
FRIEND
FRIEND
35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RDF Knowledge Graph
Example
36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
URIs AS GLOBALLY UNIQUE
IDENTIFIERS
URIs to identify nodes and edge labels
<https://permid.org/1-4295902158>
=> identifies the company “Netflix Inc”
organization:isIncorporatedIn1
=> identifies the relationship “is incorporated in”
<http://sws.geonames.org/6252001/>
=> identifies country “USA”
1 This is a shortcut for
<http://permid.org/ontology/organization/isIncorporatedIn>.
RDF uses XML prefix notation, where the prefix organization is a
shortcut for <http://permid.org/ontology/organization/>.
37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RDF GRAPH: COLLECTION OF TRIPLES
(1)
# Every edge in the RDF graph is represented as
# a (subject, predicate, object) triple
<https://permid.org/1-4295902158>
organization:isIncorporatedIn
<http://sws.geonames.org/6252001/> .
subject
predicate
object
38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RDF GRAPH: COLLECTION OF TRIPLES
(2)
# Every edge in the RDF graph is represented as
# a (subject, predicate, object) triple
<https://permid.org/1-4295902158>
organization:isIncorporatedIn
<http://sws.geonames.org/6252001/> .
<https://permid.org/1-4295902158>
vcard:organization-name
"Netflix Inc" .
subject
predicate
object
(URI)
subject
predicate
object
(literal)
Literals are “sinks” in the graph. They do not
have any outgoing edges.
RDF supports strings and other XML datatypes
(bool, integer, dates, floats, doubles, …)
39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
RDF GRAPH: COLLECTION OF TRIPLES
(3)
# Every edge in the RDF graph is represented as
# a (subject, predicate, object) triple
<https://permid.org/1-4295902158>
organization:isIncorporatedIn
<http://sws.geonames.org/6252001/> .
<https://permid.org/1-4295902158>
vcard:organization-name
"Netflix Inc" .
<https://permid.org/1-4295902158>
organization:hasRegisteredPhoneNumber
"13026587581" .
<http://sws.geonames.org/6252001/>
iso:countryCode
”US” .
Same URI used as both subject and object, depending on
whether we represent outgoing vs. incoming RDF edges.
40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
THE BENEFIT OF URIs: LINKED DATA
Linking across datasets by referencing globally unique URIs
GeoNames
Wikidata
PermID
Example: PermID (re)uses <http://sws.geonames.org/6252001/>
as a global Identifier for the USA, which is an identifier rooted in GeoNames.
41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
QUERYING RDF USING SPARQL (1)
?name
Variables are
prefixed with “?”.
42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
QUERYING RDF USING SPARQL (2)
?property
?property
?property
?node ?node
?node
43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
QUERYING RDF USING SPARQL (3)
?org
?phone
44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Behind Amazon Neptune’s Fully
Managed, Enterprise Ready
Features
45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Fully Managed Service
Easily configurable via the Console
Multi-AZ High Availability, ACID
Support for up to 15 read replicas
Supports Encryption at rest
Supports Encryption in transit (TLS)
Backup and Restore, Point-in-time
Recovery
B E N E F I T S
46. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Secure deployment in a VPC
• Increased availability through
deployment in two subnets in two
different Availability Zones (AZs)
• Cluster volume always spans three
AZ to provide durable storage
• See the Amazon Neptune
Documentation for VPC setup details
AMAZON NEPTUNE: VPC DEPLOYMENT
47. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AMAZON NEPTUNE READ REPLICAS
Availability
• Failing database nodes are
automatically detected and replaced
• Failing database processes are
automatically detected and recycled
• Replicas are automatically promoted to
primary if needed (failover)
• Customer specifiable fail-over order
AZ 1 AZ 3AZ 2
Primary
Node
Primary
Node
Primary
Master
Node
Primary
Node
Primary
Node
Read
Replica
Primary
Node
Primary
Node
Read
Replica
Cluster
and
Instance
Monitoring
Performance
• Customer applications can scale out read
traffic across read replicas
• Read balancing across read replicas
48. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AMAZON NEPTUNE FAILOVER TIMES ARE
TYPICALLY < 30 SECONDS
Replica-Aware App Running
Failure Detection DNS Propagation
Recovery
Database
Failure
1 5 - 2 0 s e c 3 - 1 0 s e c
App
Running
49. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Questions
50. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AMAZON NEPTUNE IS AVAILABLE FOR
PREVIEW
• Preview Sign-up
• https://pages.awscloud.com/NeptunePreview.html
• Amazon Neptune Web Page
• https://aws.amazon.com/neptune/
• Amazon Neptune Documentation
• https://docs.aws.amazon.com/neptune/latest/userguide
51. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!