Extensible RESTful
Applications
with Apache Tinkerpop
Graph Day SF 2018
About Us
LIKES
{
"first_name": "Varun",
"last_name": "Ganesh",
}
{
"first_name": "Harshvardhan",
"last_name": "Joshi",
}
LIKES
About
[Diagram: connecting to business stacks; visualisation with custom-built infographics; natural-language-generated insights; export and share stories by email, PowerPoint, TV, web and embedded SDK]
[Image: clients]
• Automating the process of data storytelling
• For more information, visit www.nugit.co
Agenda
• Use Cases
• The Slack APIs
• Defining the Entities
• Graph Design and Considerations
• Making the Graph RESTful
• Building a DSL
• Testing the Application
• Scaling the Graph
Use Cases - Communities
• View contribution to
communication
• Participation across
channels
• Identify collaborative
groups
• Users connected by
mentions and reactions
• Identify influential users
per channel
• Highlight engaging conversations
• Top videos, GIFs, links
• Get insights across channels
Use Cases – Top Posts
Defining the
Entities
Top Post:
• Files shared
• Messages with attachments
• Posts without replies or reactions
are not considered
Defining the
Entities
Notable Message:
• Messages with reactions or replies
• Replies and Comments that have
reactions
• Other alerts that gather reactions
Defining the
Entities
Mention:
• Replies and Comments can have
mentions too
• Ignore mentions that are
unnecessary or already captured in
a relationship
Defining the Entities
• Narrows down data required for the use case
• Helps “whiteboarding” process for graph design
• Allows defining schema for payloads
• Requires understanding the nuances of the platform
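The entity rules above can be sketched as simple predicates over a Slack message payload. This is only an illustration: it assumes the payload keys seen in Slack API responses (`file`, `attachments`, `reactions`, `reply_count`), and the function names are hypothetical rather than taken from the talk's codebase.

```python
def has_engagement(message):
    """True if the message gathered at least one reaction or reply."""
    return bool(message.get("reactions")) or message.get("reply_count", 0) > 0


def is_top_post(message):
    """A shared file or a message with attachments, provided it has engagement."""
    shares_content = bool(message.get("file")) or bool(message.get("attachments"))
    return shares_content and has_engagement(message)


def is_notable_message(message):
    """Any message (including replies and comments) with reactions or replies."""
    return has_engagement(message)


# Two sample payloads: one with replies, one that nobody responded to.
post = {"type": "message", "attachments": [{"id": 1}], "reply_count": 3}
quiet_post = {"type": "message", "attachments": [{"id": 1}]}
print(is_top_post(post), is_top_post(quiet_post))
```

Writing the definitions down as predicates like this is one way to check them against real payloads before committing to a graph schema.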
Graph Design and Considerations
• Team node acts as root node
• Allows maintaining separate graphs
for different organisations
Graph Design and Considerations
• Top posts, notable messages are
both message nodes
• Differentiated using edge labels
• Edge traversals favoured over
property lookup
Graph Design and Considerations
• Any user can comment on, react to
or be mentioned in any message
• Reaction type modelled as edge
property
• Efficient as use-case does not need
filtering by reaction type
Graph Design and Considerations
• Same file shared across channels
shares common pool of reactions
• Schema respects Slack specific
behaviour
• Handles idempotency based on
unique ID maintained by Slack
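Idempotency keyed on a Slack-maintained unique ID can be illustrated with a toy in-memory store. `TinyGraph` and `upsert_node` are hypothetical names used only for this sketch; they are not TinkerPop APIs, and the real application writes to a graph database instead of a dict.

```python
class TinyGraph:
    """Toy in-memory stand-in for a graph store; not a TinkerPop API."""

    def __init__(self):
        # Slack unique ID -> {"label": ..., "properties": {...}}
        self.nodes = {}

    def upsert_node(self, uid, label, **properties):
        """Create the node if its Slack ID is new, otherwise merge properties."""
        node = self.nodes.setdefault(uid, {"label": label, "properties": {}})
        node["properties"].update(properties)
        return node


graph = TinyGraph()
graph.upsert_node("C8KMHCN5D", "channel", name="arandomchannel")
# Replaying the same request leaves a single node behind.
graph.upsert_node("C8KMHCN5D", "channel", name="arandomchannel")
print(len(graph.nodes))
```

Because the node key is the ID that Slack already guarantees to be unique, replayed ingestion requests converge on the same node instead of creating duplicates.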
Graph Design and Considerations
{
"type": "message",
"user": "U2FQG2G9F",
"text": "next time you want cereal: \n<https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock>",
"attachments": [
{
"service_name": "Instagram",
"title": "Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC",
"title_link": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock",
"text": "346.3k Likes, 2,167 Comments - @therock on Instagram:”……”",
"fallback": "Instagram: Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC",
"image_url": "https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/24178_n.jpg",
"from_url": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock",
"image_width": 334,
"image_height": 250,
"image_bytes": 178559,
"service_icon": "https://www.instagram.com/static/images/ico/appl.png/932e4d9af891.png",
"id": 1
}
],
"thread_ts": "1511936426.000178",
"reply_count": 3,
"replies": [
{
"user": "U193XDML7",
"ts": "1511953167.000138"
},
{
"user": "U2FQG2G9F",
"ts": "1511953180.000044"
},
{
"user": "U193XDML7",
"ts": "1511953192.000230"
}
],
"ts": "1511936426.000178",
"reactions": [
{
"name": "smile",
"users": [
"U193XDML7"
],
"count": 1
},
{
"name": "obesecat",
"users": [
"U193XDML7"
],
"count": 1
}
]
}
The Slack APIs
Endpoint:
https://slack.com/api/conversations.history
[
{
"type": "message",
"user": "U4BPQR94L",
"text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>",
"attachments": [
{
...
}
],
"ts": "1536057373.000100",
"reactions": [
{
"name": "flag-se",
"users": [
"U58LYK8Q6"
],
"count": 1
}
]
}
]
[
{
"user": "U2Q2U37SA",
"inviter": "U0LPSJQR0",
"text": "<@U2Q2U57SA> has joined the channel",
"type": "message",
"subtype": "channel_join",
"ts": "1536138265.000200"
}
]
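The payloads above embed mentions in the message text as `<@USER_ID>`. A minimal sketch of pulling them out follows; the regular expression and the decision to skip `channel_join` messages are assumptions made for illustration (the join message mentions only the user who joined), not the talk's actual parsing code.

```python
import re

# Assumed pattern: user IDs start with "U" followed by upper-case letters and digits.
MENTION_PATTERN = re.compile(r"<@(U[A-Z0-9]+)>")


def extract_mentions(message):
    """Return the user IDs mentioned in a message's text."""
    # Assumption: the mention in a join message carries no extra information.
    if message.get("subtype") == "channel_join":
        return []
    return MENTION_PATTERN.findall(message.get("text", ""))


message = {
    "type": "message",
    "user": "U4BPQR94L",
    "text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>",
}
join_message = {
    "subtype": "channel_join",
    "text": "<@U2Q2U57SA> has joined the channel",
}
print(extract_mentions(message), extract_mentions(join_message))
```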
The Slack APIs
[
{ "id": "U4C0FDU2J",
"team_id": "T028ZLMQN",
"name": "friendlybotdev",
"deleted": true,
"profile": {
"title": "",
"phone": "",
"skype": "",
"real_name": "Friendly Bot",
"real_name_normalized": "Friendly Bot",
"display_name": "friendlybotdev",
"display_name_normalized": "friendlybotdev",
"status_text": "",
"status_emoji": "",
"status_expiration": 0,
"avatar_hash": "123456",
"bot_id": "B4B47T0G3",
"api_app_id": "A4B92ZEER",
"always_active": true,
"image_original": "https://slack-edge.com/2017-06-21/123456_original.png",
"first_name": "Friendly",
"last_name": "Bot",
"image_24": "https://slack-edge.com/2017-06-21/123456_24.png",
"image_32": "https://slack-edge.com/2017-06-21/123456_32.png",
"image_48": "https://slack-edge.com/2017-06-21/123456_48.png",
"image_72": "https://slack-edge.com/2017-06-21/123456_72.png",
"image_192": "https://slack-edge.com/2017-06-21/123456_192.png",
"image_512": "https://slack-edge.com/2017-06-21/123456_512.png",
"image_1024": "https://slack-edge.com/2017-06-21/123456_1024.png",
"status_text_canonical": "",
"team": "T028Z5MQN"
},
"is_bot": true,
"is_app_user": false,
"updated": 1517305013
}
]
[
{ "id": "C8KMHCN5D",
"name": "arandomchannel",
"is_channel": true,
"created": 1507613685,
"creator": "U5BG5XU6T",
"is_shared": false,
"is_member": true,
"is_private": false,
"last_read": "1533892238.000324",
"latest": {
"type": "message",
"user": "U84K3ZTF9",
"text": "let's meetup tomorrow",
"ts": "1536139470.000100"
},
"unread_count": 7,
"unread_count_display": 7,
"members": [
"U08ED90CD",
"U0LPSJQR0",
"U193XDML7",
"U9LKWV9C1",
"UBJ4CHV5L" ],
"topic": {
"value": "place for people who are interested in sharing and learning",
"creator": "U5BGLXU6T",
"last_set": 1507613720
},
"purpose": {
"value": "",
"creator": "",
"last_set": 0
},
"previous_names": []
}
]
Endpoint:
https://slack.com/api/users.list
Endpoint:
https://slack.com/api/channels.info
The Slack APIs
The Journey So Far
• Defining entities and modelling them into Graph
• Iterative feedback-driven process
• Understanding the data available from the API
• Identifying unique IDs
• Filtering out required fields
Data Ingestion and Extraction
• Apache Flink cluster retrieves, parses and filters Slack data
• GraphQL service requests data for visualization
• Flask REST service ingests/queries data to/from Tinkerpop
[Diagram labels: POST, PUT, GET; Gremlin-Python; Gremlin Bytecode]
Why Tinkerpop?
• Abstraction that lets us avoid vendor lock-in
• Reduces rework when switching data stores
• Gremlin query language
• Hadoop and SparkGraphComputer
Making the Graph RESTful
• Defining REST Endpoints
• Defining the Resources
• Remote Traversals
Defining REST Endpoints
• Write endpoints for seeding
  • POST /teams/<team_uid>/channels
  • POST /teams/<team_uid>/channels/<channel_uid>/messages
• Read endpoints for queries
  • GET /teams/<team_uid>/top_posts
• Handling Idempotency
  • Replace default strategy with "ElementIdStrategy"
  • Enables creation of nodes with Slack specific unique IDs
// scripts/empty-sample.groovy
globals << [g : graph.traversal(), sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]
Making the Graph RESTful
• Setting up REST Endpoints
• Defining the Resources
• Remote Traversals
Defining the Resources
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    ts = fields.Float(required=True)
    text = fields.Str()
    comment = fields.Str()
    subtype = fields.Str()
    bot_id = fields.Str(validate=is_bot_uid)
    user = fields.Str(validate=is_user_uid)
    thread_ts = fields.Str()
    file_share = fields.Nested(FileShareSchema, load_from="file")
    attachments = fields.Nested(AttachmentSchema, many=True)
    reactions = fields.Nested(ReactionSchema, many=True)
    comments = fields.Nested(CommentSchema, many=True, load_from="replies")
    mentions = fields.List(fields.Str(validate=is_user_uid))
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
...
• Organized code with single point of
reference
Defining the Resources
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    @validates_schema
    def validate_message(self, data):
        """ Validate if the message contains any of comments, mentions or reactions. """
        if not any([f(data) for f in (has_comments, has_mentions, has_reactions)]):
            raise ValidationError("The message must contain comments, mentions or reactions")
    ts = fields.Float(required=True)
    text = fields.Str()
    comment = fields.Str()
    subtype = fields.Str()
    bot_id = fields.Str(validate=is_bot_uid)
    user = fields.Str(validate=is_user_uid)
    thread_ts = fields.Str()
    file_share = fields.Nested(FileShareSchema, load_from="file")
    attachments = fields.Nested(AttachmentSchema, many=True)
    reactions = fields.Nested(ReactionSchema, many=True)
    comments = fields.Nested(CommentSchema, many=True, load_from="replies")
    mentions = fields.List(fields.Str(validate=is_user_uid))
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
...
• Organized code with single point of
reference
• Validate data before ingestion
• Enforce types and required fields
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
    title = fields.Str()
    fallback = fields.Str()
    text = fields.Str()
    thumb_url = fields.Str()
    image_url = fields.Str()
    title_link = fields.Str()
    @post_load
    def reshape_attachment(self, data):
        """ Apply required transformations on the Attachment object. """
        # Create a post_title field
        collapse_keys(data, "post_title", *("fallback", "title", "text"))
        # Create a post_thumbnail field
        collapse_keys(data, "post_thumbnail", *("thumb_url", "image_url",
                                                "title_link"))
        # Set post_type to URL
        data["post_type"] = "URL"
        # Return the reshaped data so the hook also works where a return value is required
        return data
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
class FileShareSchema(Schema):
    """ Holds all the required fields for a File Share object."""
class UserSchema(Schema):
    """ Holds all the required fields for a User object."""
...
• Organized code with single point of
reference
• Validate data before ingestion
• Enforce types and required fields
• Normalize fields with post-processing
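The slides call a `collapse_keys` helper without showing its body. One plausible reading, sketched below purely as an assumption, is that it folds several candidate keys into a single normalized key by taking the first non-empty value and dropping the originals. The real helper may behave differently.

```python
def collapse_keys(data, target, *candidates):
    """Move the first non-empty candidate value into `target`, dropping the candidates.

    Hypothetical implementation: the talk only shows the call sites.
    """
    values = [data.pop(key, None) for key in candidates]
    data[target] = next((value for value in values if value), None)
    return data


attachment = {"fallback": "", "title": "Instagram post", "text": "346.3k Likes"}
collapse_keys(attachment, "post_title", "fallback", "title", "text")
print(attachment)
```

Under this reading, every attachment ends up with the same small set of fields (`post_title`, `post_thumbnail`, `post_type`) regardless of which keys the source service populated.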
Defining the Resources
Making the Graph RESTful
• Schema enforcement and validation
• Handling Idempotency of endpoints
• Custom Traversal Source
Remote Traversals
• Bytecode sent over network instead of string
• Allows using custom traversal source for a Domain Specific Language (DSL)
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
...
conn = DriverRemoteConnection(GREMLIN_SERVER_HOST, 'sg')
slack = Graph().traversal(SlackTraversalSource).withRemote(conn)
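To see why bytecode suits a DSL, it helps to picture a traversal as an ordered list of steps rather than a query string. The classes below are a toy model written for this illustration; they only mimic the idea and are not gremlin_python's own `Bytecode` or `GraphTraversal` classes.

```python
class ToyBytecode:
    """Toy model of traversal bytecode: an ordered list of (step, args) pairs."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, *args):
        self.steps.append((name, args))
        return self


class ToyTraversal:
    """Each method call records a step instead of building a query string."""

    def __init__(self):
        self.bytecode = ToyBytecode()

    def V(self):
        self.bytecode.add_step("V")
        return self

    def hasLabel(self, label):
        self.bytecode.add_step("hasLabel", label)
        return self

    def channels(self):
        """DSL-style shorthand that expands to primitive steps."""
        return self.V().hasLabel("channel")


traversal = ToyTraversal().channels()
print(traversal.bytecode.steps)
```

Since the custom step expands into primitive steps on the client, the server only ever sees standard instructions, which is what lets a DSL live entirely in the application code.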
Building a DSL
• Motivations
• Custom Workflows
Building a DSL - Motivations
class SlackTraversalSource(BaseTraversalSource):
    """ Module to initialise a Graph with the methods listed under SlackTraversal. """
    def __init__(self, *args, **kwargs):
        super(SlackTraversalSource, self).__init__(*args, **kwargs)
        self.graph_traversal = SlackTraversal
    def channels(self, *channel_ids):
        """ Shorthand to identify all channel nodes"""
        traversal = self.get_graph_traversal()
        traversal.bytecode.add_step("V")
        traversal.bytecode.add_step("hasLabel", NODES.channel)
        if channel_ids:
            traversal.bytecode.add_step("has", "__id", P.within(channel_ids))
        return traversal
• Custom traversal source can also specify useful shorthands
• E.g. Traversing to all the Channel nodes
Building a DSL - Motivations
class SlackTraversal(BaseTraversal):
    def addPartOfChannelEdges(self, channel_uid, *user_uids, **kwargs):
        """ Add an edge to a channel from the users who were/are a part of the channel. """
        for user_uid in user_uids:
            edge_uid = construct_uid(user_uid, channel_uid, EDGES.part_of.name, delim="|")
            # Parentheses allow the chained call to span several lines
            (self.getOrAddEdgeFrom(edge_label=EDGES.part_of, edge_uid=edge_uid,
                                   node_label=NODES.user, node_uid=user_uid)
             .upsertProperties(kwargs.get("properties")).inV())
        return self
• Custom traversal source specifies business logic behind traversals
• E.g. Connecting a User node to a Channel node
Building a DSL - Motivations
• BaseTraversal handles creation of nodes and edges
• These methods should guarantee idempotency
• E.g. Creation of edges between two nodes…
• ...checks for an existing edge
Building a DSL - Motivations
from gremlin_python.process.graph_traversal import GraphTraversal
from gremlin_python.process.graph_traversal import GraphTraversalSource, __
from gremlin_python.process.traversal import T
class BaseTraversal(GraphTraversal):
    def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
        """
        Adds an edge from the node with the given label and uid only if the edge doesn’t exist.
        """
        return self.coalesce(
            # First option: reuse the edge if it already exists
            __.inE(edge_label).hasId(edge_uid).and_(
                __.outV().hasId(node_uid), __.outV().hasLabel(node_label)),
            # Fallback: create the edge from the source node
            __.addE(edge_label).property(T.id, edge_uid).from_(
                __.V().getNode(node_label, node_uid)))
• The edge is created only if it doesn’t already exist
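The get-or-create behaviour of `coalesce` can be mimicked in plain Python: try each alternative in order and keep the first one that yields a result. The sketch below is a toy model over a dict; `coalesce` and `get_or_add_edge` here are hypothetical helpers, not Gremlin steps.

```python
def coalesce(*alternatives):
    """Return the result of the first alternative that yields anything."""
    for alternative in alternatives:
        result = alternative()
        if result:
            return result
    return []


# edge_uid -> (from_uid, label, to_uid)
edges = {}


def get_or_add_edge(edge_uid, from_uid, label, to_uid):
    """Look the edge up first; create it only when the lookup comes back empty."""

    def existing():
        return [edges[edge_uid]] if edge_uid in edges else []

    def create():
        edges[edge_uid] = (from_uid, label, to_uid)
        return [edges[edge_uid]]

    return coalesce(existing, create)[0]


first = get_or_add_edge("U193XDML7|C8KMHCN5D|part_of", "U193XDML7", "part_of", "C8KMHCN5D")
second = get_or_add_edge("U193XDML7|C8KMHCN5D|part_of", "U193XDML7", "part_of", "C8KMHCN5D")
print(len(edges), first == second)
```

The order of the alternatives matters: the existence check must come before the creation branch, otherwise every call would add a new edge.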
def build_visualization(self, traversal_source, **kwargs):
    """ The below are standardized steps that are
    required to generate data for any visualization."""
    return (self.start(traversal_source)
            .filterByDate(self.date_dimension,
                          kwargs.get("start_time"),
                          kwargs.get("end_time"))
            .filterByFields(self.filters_map,
                            kwargs.get("filters"))
            .sortByFields(self.sorting_map,
                          kwargs.get("sort_field"),
                          kwargs.get("sort_direction"))
            .buildObject(self.object_map).toList())
Building a DSL – Custom Workflows
• Standardized steps for generating a visualization are defined in the BaseTraversal
• Custom maps define traversal paths for fields that vary across visualizations
Building a DSL – Custom Workflows
# Sample filter from frontend
filter_obj = {'_and': [{"field": 'reactions', '_gte': 100},
                       {"field": 'post_creator',
                        '_in': ['bob', 'chloe']
                        }]}
filter_map = {"post_creator": lambda pred:
                  __.in_(EDGES.created_post).has(USER.display_name, pred),
              "reactions": lambda pred:
                  __.inE(EDGES.reacted_to).count().is_(pred)
              }
object_map = {
    "post_creator": {"uid": [__.in_(EDGES.created_post).values("__id"),
                             __.constant("")],
                     "image": ...,  # define similar path here
                     },
    "reactions":
        __.inE(EDGES.reacted_to).groupCount().by(__.values(REACTION.name))
}
start = lambda traversal_source: traversal_source.posts()
# DSL generates the required lower level base traversals
(slack.posts()
 .where(__.and_(
     __.inE(EDGES.reacted_to).count().is_(P.gte(100)),
     __.in_(EDGES.created_post).has(USER.display_name,
                                    P.within(['bob', 'chloe']))))
 .project("post_creator", "reactions")
 # One by() modulator per projected key
 .by(__.project("image", "display_name", "uid")
     .by(__.in_(EDGES.created_post).values(USER.image))
     .by(__.in_(EDGES.created_post).values(USER.display_name))
     .by(__.in_(EDGES.created_post).values("__id")))
 .by(__.inE(EDGES.reacted_to).groupCount())
 .toList())
# Inject maps into DSL methods
(start(slack)
 .filterByFields(self.filters_map, kwargs.get("filters"))
 .buildObject(self.object_map)
 .toList())
• The DSL takes in functions/paths that map fields to their traversals
• Maps customized based on the visualization that is needed
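The idea of mapping a frontend filter object onto executable predicates can be sketched without a graph at all. The example below evaluates the same filter shape against plain dicts; the `OPERATORS` table and `build_predicate` are assumptions for illustration (only `_gte` and `_in` appear in the sample filter), whereas the real DSL translates each field into a Gremlin traversal.

```python
import operator

# Assumed operator vocabulary; only "_gte" and "_in" appear in the sample filter.
OPERATORS = {
    "_gte": operator.ge,
    "_in": lambda value, options: value in options,
}


def build_predicate(filter_obj):
    """Translate a frontend filter object into a function over a post dict."""
    if "_and" in filter_obj:
        clauses = [build_predicate(clause) for clause in filter_obj["_and"]]
        return lambda post: all(clause(post) for clause in clauses)
    field = filter_obj["field"]
    # Every leaf clause carries exactly one operator key besides "field".
    (op_name, operand), = [(k, v) for k, v in filter_obj.items() if k != "field"]
    compare = OPERATORS[op_name]
    return lambda post: compare(post[field], operand)


filter_obj = {"_and": [{"field": "reactions", "_gte": 100},
                       {"field": "post_creator", "_in": ["bob", "chloe"]}]}
posts = [
    {"post_creator": "chloe", "reactions": 120},
    {"post_creator": "bob", "reactions": 12},
    {"post_creator": "dave", "reactions": 300},
]
predicate = build_predicate(filter_obj)
matching = [post for post in posts if predicate(post)]
print(matching)
```

Keeping the field-to-logic mapping in a lookup table is what allows a new visualization to reuse the same workflow with a different map.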
Building a DSL – Custom Workflows
{
"reactions": {
"palm_tree": 82,
"robot_face": 18
},
"post_creator": {
"image": "https://url_of_image.jpg",
"display_name": "chloe",
"uid": "U024ZH7HL"
}
}
• The traversals generated churn out the final response objects
• Objects rendered into visualizations by the client
Testing the Application
• Unit Tests
• Validating traversals on Gremlin Server
Test cycle (diagram): Write a failing test → Write code to make the test pass → Start Gremlin Server → Use Fixtures → Check if test passes
class TestNodeMethods(object):
    """ Test methods that help in retrieval and creation of Nodes. """
    def test_node_retrieval(self, graph):
        """ Test if getNode retrieves an existing node. """
        assert (graph.V().getNode(label="person", uid=100)
                .count().next() == 1)
        assert (graph.V().getNode(label="person", uid=101)
                .count().next() == 1)
Testing Our Application – Unit Testing
def getNode(self, label, uid):
    """
    Returns the node with the given label and uid.
    Args: label (string): The label of the node to return
          uid (string): Unique ID of the node
    Raises: StopIteration: Node with the given label and uid does not exist
    """
    return self.and_(__.hasLabel(label), __.has(T.id, uid))
Testing Our Application – Unit Testing
$ bin/gremlin-server.sh conf/gremlin-server-neo4j-python.yaml
class TestBasicTraversal(object):
    """
    Tests for methods that help create edges or nodes
    and methods that help populate the properties of these objects.
    """
    @pytest.fixture(scope="module")
    def graph(self):
        """ Graph with two nodes and one edge connecting them. """
        graph = (Graph().traversal(CerebroTraversalSource)
                 .withRemote(
                     DriverRemoteConnection(GREMLIN_SERVER_HOST,
                                            GREMLIN_SERVER_TRAVERSER)))
        graph.V().clear()
        from_node = graph.addV("person").property(T.id, 100).next()
        to_node = graph.addV("person").property(T.id, 101).next()
        (graph.addE("knows").from_(from_node).to(to_node)
         .property("__id", "1")
         .next())
        yield graph
        graph.V().clear()
Testing Our Application – Unit Testing
[
{
"reactions": [
{
"name": "joy",
"users": [
"U5K7JUATE"
]
}
],
"attachments": [
{
...
}
],
"text": "<https://www.youtube.com/watch?v=4iEh1ykb13w>",
"ts": "1465895473.000050",
"user": "U37BF9457",
"type": "message"
}
]
Testing Our Application – Unit Testing
class MessageSchema(Schema):
""" Holds all the required fields for a message object."""
. . .
• Fixture used to test if the
MessageSchema class is
implemented correctly
[
{
"reactions": [
{
"name": "joy",
"users": [
"U5K7JUATE"
]
}
],
"attachments": [
{...}
],
"text": "<@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>",
"mentions": [
"U123456"
],
"ts": "1465895473.000050",
"type": "message"
}
]
Testing Our Application – Unit Testing
class MessageSchema(Schema):
""" Holds all the required fields for a message object."""
mentions = fields.List(fields.Str(validate=is_user_uid))
• MessageSchema needs
to include mentions
• Update the fixture to
be able to test that the
schema includes
mentions
• Need to validate if
traversals pick up
mentions
gremlin> graph.io(graphson()).writeGraph("graph_name.json")
Testing Our Application – Unit Testing
Test cycle (diagram), extended with a new step: Update JSON & Generate GraphSON
@pytest.fixture(scope="module")
def slack_graph():
    """ Open a subgraph on localhost for testing. """
    slack.V().clear()
    slack_client = Client(GREMLIN_SERVER_HOST, SLACK_TRAVERSER)
    path_to_fixture = str(Path.cwd().joinpath(
        "tests/fixtures/slack_graph.json"))
    # Load the GraphSON fixture on the server side
    graphson_statement = 'graph.io(graphson()).readGraph("{}")'.format(
        path_to_fixture)
    slack_client.submit(graphson_statement).all().result()
    yield slack
    slack.V().clear()
Testing Our Application – Unit Testing
Testing the Application – CI/CD
• Automated tests using CircleCI
• Custom Configuration for Gremlin Server
• Caching Dependencies for Faster Tests
steps: #CircleCI 2.0
  ...
  - run:
      command: |
        if [ ! -d ./apache-tinkerpop-gremlin-server-3.3.3 ]; then
          curl -O https://archive.apache.org/dist/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip
          unzip -q apache-tinkerpop-gremlin-server-3.3.3-bin.zip
          # Install gremlin-python
          cd ./apache-tinkerpop-gremlin-server-3.3.3 && \
            ./bin/gremlin-server.sh install org.apache.tinkerpop gremlin-python 3.3.3
          # Change max content length and traversal strategy
          sed -i -- 's/.*maxContentLength:.*/maxContentLength: 2621440/g' conf/gremlin-server.yaml
          sed -i -- 's/graph.traversal()]/graph.traversal(),sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]/g' \
            ./scripts/empty-sample.groovy
        fi
  ...
Testing the Application – CI/CD
steps: #CircleCI 2.0
  - checkout
  - restore_cache:
      keys:
        - v1-dependencies-{{ .Branch }}
        - v1-dependencies-master
  - run:
      # Download and install Gremlin server
      ...
  # Cache the installation
  - save_cache:
      key: v1-dependencies-{{ .Branch }}
      paths:
        - ~/src/app_name/apache-tinkerpop-gremlin-server-3.3.3
  # Test
  - run:
      # Starting Gremlin Server
      command: |
        cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh \
          ./conf/gremlin-server.yaml
      background: true
  # Sleep to give the gremlin server enough time to start
  - run: sleep 10
  - run: pycodestyle app_name
  - run: coverage run --source=app_name -m pytest tests --capture=no --strict
  - run: coverage report -m --fail-under=95
Testing the Application – CI/CD
Scaling Our Graph
• Async Traversals
• HA Cluster and Load Balancing
Async Traversals
• Seed individual entities using “next”
• Each call to “next” is blocking
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        slack.V().addChannel(channel_uid, properties=channel_data).next()
        slack.teams(team_uid).addTeamHasChannelEdge(team_uid, channel_uid).next()
        slack.users(creator).addCreatedChannelEdge(creator, channel_uid).next()
        slack.channels(channel_uid).addPartOfChannelEdges(channel_uid, *members).next()
• Seed subgraph using “next”
• Reduce number of blocking calls to one per channel
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        (slack.V().addChannel(channel_uid, properties=channel_data)
         .addTeamHasChannelEdge(team_uid, channel_uid).inV()
         .addCreatedChannelEdge(creator, channel_uid).inV()
         .addPartOfChannelEdges(channel_uid, *members).next())
• Seed subgraph using “promise”
• Make seeding asynchronous, no blocking calls
• Verify that the returned futures were successful
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        (slack.V().addChannel(channel_uid, properties=channel_data)
         .addTeamHasChannelEdge(team_uid, channel_uid).inV()
         .addCreatedChannelEdge(creator, channel_uid).inV()
         .addPartOfChannelEdges(channel_uid, *members).promise())
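The advice to verify the returned futures can be sketched with the standard library. Here `seed_channel` is a hypothetical placeholder for a single channel's seeding traversal and a thread pool stands in for the asynchronous driver; the point is only the pattern of submitting everything first and checking each future afterwards.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def seed_channel(channel_uid):
    """Placeholder for one channel's seeding traversal (hypothetical)."""
    if not channel_uid:
        raise ValueError("channel uid is required")
    return channel_uid


def seed_all(channel_uids):
    """Submit every channel without blocking, then check each future."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(seed_channel, uid): uid for uid in channel_uids}
        wait(futures)
    # A future that raised keeps its exception; collect those channels for a retry.
    return [uid for future, uid in futures.items() if future.exception() is not None]


# "C_EXAMPLE" is a made-up ID; the empty string simulates a bad record.
print(seed_all(["C8KMHCN5D", "", "C_EXAMPLE"]))
```

Without the final check, a failed seeding call would pass silently, because nothing ever asks the future for its result.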
HA Cluster and Load Balancing
• Preparing for high availability with Neo4J and Gremlin
• Configuring Gremlin Server and Neo4J
• Understanding the Neo4J HA Architecture
• Advantages
• Data replication
• Spread writes across instances
• Handle greater read loads
• HA cluster is fronted by a load balancer like HAProxy
• Reference:
• https://neo4j.com/docs/operations-manual/current/ha-cluster/architecture/
• http://tinkerpop.apache.org/docs/3.3.3/reference/#_high_availability_configuration
HA Cluster and Load Balancing
• Tuning parameters for the cluster
• Frequency of pulling updates from other members of the cluster
• gremlin.neo4j.conf.ha.pull_interval
• Number of slaves a transaction should be committed to
• gremlin.neo4j.conf.ha.tx_push_factor
• Tuning parameters for the Load Balancer
• Routing requests across the cluster
• balance
• Checking if the members in the cluster are responsive
• option httpchk
// gremlin-server-neo4j-ha-{1..3}.yaml
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
> curl "http://localhost:8182?gremlin=100-1"
Thank You
Graph Day SF 2018

F8 tech talk_pinterest_v4
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity Stream
 
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
 
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディングXitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
 

Dernier

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 

Dernier (17)

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 

Extensible RESTful Applications with Apache Tinkerpop Graph

  • 1. Extensible RESTful Applications with Apache Tinkerpop Graph Day SF 2018
  • 2. About Us LIKES { "first_name": "Varun", "last_name": "Ganesh", } { "first_name": "Harshvardhan", "last_name": "Joshi", } LIKES
  • 3. CONNECTING TO BUSINESS STACKS VISUALISATION CUSTOM BUILT INFOGRAPHICS NATURAL LANGUAGE GENERATED INSIGHTS EXPORT & SHARE STORIES EMAIL POWERPOINT, TV WEB Embedded SDK About CLIENTS • Automating the process of data storytelling • For more information, visit www.nugit.co
  • 4. Agenda • Use Cases • The Slack APIs • Defining the Entities • Graph Design and Considerations • Making the Graph RESTful • Building a DSL • Testing the Application • Scaling the Graph
  • 5. Use Cases - Communities • View contribution to communication • Participation across channels • Identify collaborative groups • Users connected by mentions and reactions • Identify influential users per channel
  • 6. • Highlight engaging conversations • Top videos, GIFs, links • Get insights across channels Use Cases – Top Posts
  • 7. Defining the Entities Top Post: • Files shared • Messages with attachments • Posts without replies or reactions are not considered
  • 8. Defining the Entities Notable Message: • Messages with reactions or replies • Replies and Comments that have reactions • Other alerts that gather reactions
  • 9. Defining the Entities Mention: • Replies and Comments can have mentions too • Ignore mentions that are unnecessary or already captured in a relationship
  • 10. Defining the Entities • Narrows down data required for the use case • Helps “whiteboarding” process for graph design • Allows defining schema for payloads • Requires understanding the nuances of the platform
  • 11. Graph Design and Considerations • Team node acts as root node • Allows maintaining separate graphs for different organisations
  • 12. Graph Design and Considerations • Top posts, notable messages are both message nodes • Differentiated using edge labels • Edge traversals favoured over property lookup
  • 13. Graph Design and Considerations • Any user can comment on, react to or be mentioned in any message • Reaction type modelled as edge property • Efficient as use-case does not need filtering by reaction type
  • 14. Graph Design and Considerations • Same file shared across channels shares common pool of reactions • Schema respects Slack specific behaviour • Handles idempotency based on unique ID maintained by Slack
  • 15. Graph Design and Considerations
  • 16. The Slack APIs Endpoint: https://slack.com/api/conversations.history { "type": "message", "user": "U2FQG2G9F", "text": "next time you want cereal: \n<https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock>", "attachments": [ { "service_name": "Instagram", "title": "Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC", "title_link": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock", "text": "346.3k Likes, 2,167 Comments - @therock on Instagram: “……”", "fallback": "Instagram: Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC", "image_url": "https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/24178_n.jpg", "from_url": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock", "image_width": 334, "image_height": 250, "image_bytes": 178559, "service_icon": "https://www.instagram.com/static/images/ico/appl.png/932e4d9af891.png", "id": 1 } ], "thread_ts": "1511936426.000178", "reply_count": 3, "replies": [ { "user": "U193XDML7", "ts": "1511953167.000138" }, { "user": "U2FQG2G9F", "ts": "1511953180.000044" }, { "user": "U193XDML7", "ts": "1511953192.000230" } ], "ts": "1511936426.000178", "reactions": [ { "name": "smile", "users": [ "U193XDML7" ], "count": 1 }, { "name": "obesecat", "users": [ "U193XDML7" ], "count": 1 } ] }
  • 17. The Slack APIs Endpoint: https://slack.com/api/conversations.history [ { "type": "message", "user": "U4BPQR94L", "text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>", "attachments": [ { ... } ], "ts": "1536057373.000100", "reactions": [ { "name": "flag-se", "users": [ "U58LYK8Q6" ], "count": 1 } ] } ] [ { "user": "U2Q2U37SA", "inviter": "U0LPSJQR0", "text": "<@U2Q2U57SA> has joined the channel", "type": "message", "subtype": "channel_join", "ts": "1536138265.000200" } ]
  • 18. The Slack APIs Endpoint: https://slack.com/api/users.list [ { "id": "U4C0FDU2J", "team_id": "T028ZLMQN", "name": "friendlybotdev", "deleted": true, "profile": { "title": "", "phone": "", "skype": "", "real_name": "Friendly Bot", "real_name_normalized": "Friendly Bot", "display_name": "friendlybotdev", "display_name_normalized": "friendlybotdev", "status_text": "", "status_emoji": "", "status_expiration": 0, "avatar_hash": "123456", "bot_id": "B4B47T0G3", "api_app_id": "A4B92ZEER", "always_active": true, "image_original": "https://slack-edge.com/2017-06-21/123456_original.png", "first_name": "Friendly", "last_name": "Bot", "image_24": "https://slack-edge.com/2017-06-21/123456_24.png", "image_32": "https://slack-edge.com/2017-06-21/123456_32.png", "image_48": "https://slack-edge.com/2017-06-21/123456_48.png", "image_72": "https://slack-edge.com/2017-06-21/123456_72.png", "image_192": "https://slack-edge.com/2017-06-21/123456_192.png", "image_512": "https://slack-edge.com/2017-06-21/123456_512.png", "image_1024": "https://slack-edge.com/2017-06-21/123456_1024.png", "status_text_canonical": "", "team": "T028Z5MQN" }, "is_bot": true, "is_app_user": false, "updated": 1517305013 } ] Endpoint: https://slack.com/api/channels.info [ { "id": "C8KMHCN5D", "name": "arandomchannel", "is_channel": true, "created": 1507613685, "creator": "U5BG5XU6T", "is_shared": false, "is_member": true, "is_private": false, "last_read": "1533892238.000324", "latest": { "type": "message", "user": "U84K3ZTF9", "text": "let's meetup tomorrow", "ts": "1536139470.000100" }, "unread_count": 7, "unread_count_display": 7, "members": [ "U08ED90CD", "U0LPSJQR0", "U193XDML7", "U9LKWV9C1", "UBJ4CHV5L" ], "topic": { "value": "place for people who are interested in sharing and learning", "creator": "U5BGLXU6T", "last_set": 1507613720 }, "purpose": { "value": "", "creator": "", "last_set": 0 }, "previous_names": [] } ]
  • 19. The Journey So Far • Defining entities and modelling them into Graph • Iterative feedback-driven process • Understanding the data available from the API • Identifying unique IDs • Filtering out required fields
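The "filtering out required fields" step can be pictured with a small sketch. The field whitelist, the mention pattern, and the choice to skip `channel_join` messages are all assumptions made for illustration; the slides do not give the actual rules, and `slim_message` is a hypothetical name.

```python
import re

# Assumed shape of a Slack user mention, e.g. <@U2FQG2G9F> or <@U2FQG2G9F|name>.
MENTION_RE = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")

# Hypothetical whitelist; the real list of required fields is not in the slides.
REQUIRED_FIELDS = ("ts", "user", "text", "thread_ts",
                   "attachments", "reactions", "replies")


def slim_message(raw):
    """Return a reduced copy of a conversations.history message, or None to skip it."""
    # Assumption: join notices carry no useful relationship, so they are dropped.
    if raw.get("subtype") == "channel_join":
        return None
    slim = {key: raw[key] for key in REQUIRED_FIELDS if key in raw}
    # Pull mentioned user IDs out of the message text.
    mentions = MENTION_RE.findall(raw.get("text", ""))
    if mentions:
        slim["mentions"] = mentions
    return slim
```

Keeping the whitelist in one tuple means a new use case only has to extend that tuple, not the parsing logic.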
  • 20. Data Ingestion and Extraction • Apache Flink cluster retrieves, parses and filters Slack data • GraphQL service requests data for visualization • Flask REST service ingests/queries data to/from Tinkerpop POST PUT GET Gremlin-Python Gremlin Bytecode
  • 21. Why Tinkerpop? • Abstraction that lets us avoid vendor lock-in • Reduces rework when switching data stores • Gremlin query language • Hadoop and SparkComputer
  • 22. Making the Graph RESTful • Defining REST Endpoints • Defining the Resources • Remote Traversals
  • 23. Defining REST Endpoints • Write endpoints for seeding • POST /teams/<team_uid>/channels • POST /teams/<team_uid>/channels/<channel_uid>/messages • Read endpoints for queries • GET /teams/<team_uid>/top_posts • Handling Idempotency • Replace default strategy with "ElementIdStrategy" • Enables creation of nodes with Slack-specific unique IDs // scripts/empty-sample.groovy globals << [g : graph.traversal(), sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]
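Why caller-supplied IDs make the seeding endpoints idempotent can be seen in a toy model. This is only a sketch with an in-memory dictionary standing in for the graph; `InMemoryGraph` and `seed_channels` are hypothetical names, not part of the talk's code base or of TinkerPop.

```python
class InMemoryGraph:
    """Toy stand-in for a graph store that accepts caller-supplied vertex IDs."""

    def __init__(self):
        self.vertices = {}

    def upsert_vertex(self, uid, label, **properties):
        """Create the vertex if uid is new; otherwise merge properties into it."""
        vertex = self.vertices.setdefault(uid, {"label": label, "properties": {}})
        vertex["properties"].update(properties)
        return vertex


def seed_channels(graph, team_uid, channels):
    """Hypothetical handler body for POST /teams/<team_uid>/channels."""
    graph.upsert_vertex(team_uid, "team")
    for channel in channels:
        # The Slack channel ID is the vertex ID, so a replayed request
        # updates the existing vertex instead of creating a duplicate.
        graph.upsert_vertex(channel["id"], "channel", name=channel["name"])
    return len(graph.vertices)
```

Posting the same payload twice leaves the vertex count unchanged, which is the property the write endpoints rely on.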
  • 24. Making the Graph RESTful • Setting up REST Endpoints • Defining the Resources • Remote Traversals
  • 25. Defining the Resources • Organized code with single point of reference
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """ Holds all the required fields for a message object."""
          ts = fields.Float(required=True)
          text = fields.Str()
          comment = fields.Str()
          subtype = fields.Str()
          bot_id = fields.Str(validate=is_bot_uid)
          user = fields.Str(validate=is_user_uid)
          thread_ts = fields.Str()
          file_share = fields.Nested(FileShareSchema, load_from="file")
          attachments = fields.Nested(AttachmentSchema, many=True)
          reactions = fields.Nested(ReactionSchema, many=True)
          comments = fields.Nested(CommentSchema, many=True, load_from="replies")
          mentions = fields.List(fields.Str(validate=is_user_uid))
      class AttachmentSchema(Schema):
          """ Holds all the required fields for an Attachment object."""
      class ReactionSchema(Schema):
          """ Holds all the required fields for a reaction object."""
      class CommentSchema(Schema):
          """ Holds all the required fields for a comment object."""
      ...
  • 26. Defining the Resources • Organized code with single point of reference • Validate data before ingestion • Enforce types and required fields
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """ Holds all the required fields for a message object."""
          @validates_schema
          def validate_message(self, data):
              """ Validate if the message contains any of comments, mentions or reactions. """
              if not any([f(data) for f in (has_comments, has_mentions, has_reactions)]):
                  raise ValidationError("The message must contain comments, mentions or reactions")
          ts = fields.Float(required=True)
          text = fields.Str()
          comment = fields.Str()
          subtype = fields.Str()
          bot_id = fields.Str(validate=is_bot_uid)
          user = fields.Str(validate=is_user_uid)
          thread_ts = fields.Str()
          file_share = fields.Nested(FileShareSchema, load_from="file")
          attachments = fields.Nested(AttachmentSchema, many=True)
          reactions = fields.Nested(ReactionSchema, many=True)
          comments = fields.Nested(CommentSchema, many=True, load_from="replies")
          mentions = fields.List(fields.Str(validate=is_user_uid))
      class AttachmentSchema(Schema):
          """ Holds all the required fields for an Attachment object."""
      class ReactionSchema(Schema):
          """ Holds all the required fields for a reaction object."""
      class CommentSchema(Schema):
          """ Holds all the required fields for a comment object."""
      ...
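The schema refers to helpers (`is_user_uid`, `is_bot_uid`, `has_comments`, `has_mentions`, `has_reactions`) whose bodies are not shown on the slides. The following is a guess at what such helpers might look like, assuming Slack user IDs start with `U` or `W` and bot IDs with `B`; the `is_notable` wrapper is a hypothetical name added only to show the combined check.

```python
import re

# Assumed ID shapes, inferred from the sample payloads (e.g. U2FQG2G9F, B4B47T0G3).
USER_UID_RE = re.compile(r"^[UW][A-Z0-9]+$")
BOT_UID_RE = re.compile(r"^B[A-Z0-9]+$")


def is_user_uid(value):
    """True if value looks like a Slack user ID."""
    return bool(USER_UID_RE.match(value))


def is_bot_uid(value):
    """True if value looks like a Slack bot ID."""
    return bool(BOT_UID_RE.match(value))


def has_comments(data):
    """True if the message carries replies under either key."""
    return bool(data.get("comments") or data.get("replies"))


def has_mentions(data):
    return bool(data.get("mentions"))


def has_reactions(data):
    return bool(data.get("reactions"))


def is_notable(data):
    """Same rule as validate_message: at least one kind of engagement is present."""
    return any(check(data) for check in (has_comments, has_mentions, has_reactions))
```

Keeping the predicates as plain functions lets them be unit tested without constructing a schema.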
  • 27. Defining the Resources • Organized code with single point of reference • Validate data before ingestion • Enforce types and required fields • Normalize fields with post-processing
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """ Holds all the required fields for a message object."""
      class AttachmentSchema(Schema):
          """ Holds all the required fields for an Attachment object."""
          title = fields.Str()
          fallback = fields.Str()
          text = fields.Str()
          thumb_url = fields.Str()
          image_url = fields.Str()
          title_link = fields.Str()
          @post_load
          def reshape_attachment(self, data):
              """ Apply required transformations on the Attachment object. """
              # Create a post_title field
              collapse_keys(data, "post_title", *("fallback", "title", "text"))
              # Create a post_thumbnail field
              collapse_keys(data, "post_thumbnail", *("thumb_url", "image_url", "title_link"))
              # Set post_type to URL
              data["post_type"] = "URL"
              # Return the reshaped dict so the loader receives the transformed data
              return data
      class ReactionSchema(Schema):
          """ Holds all the required fields for a reaction object."""
      class CommentSchema(Schema):
          """ Holds all the required fields for a comment object."""
      class FileShareSchema(Schema):
          """ Holds all the required fields for a File Share object."""
      class UserSchema(Schema):
          """ Holds all the required fields for a User object."""
      ...
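The slides call `collapse_keys` without showing it. One plausible reading, offered purely as an assumption, is that it keeps the first non-empty value among the listed source keys, stores it under the target key, and drops the sources:

```python
def collapse_keys(data, target, *sources):
    """Store the first non-empty value among `sources` under `target`.

    Hypothetical behaviour: every source key is removed from `data`,
    and `target` is only set when at least one source had a value.
    """
    chosen = None
    for key in sources:
        value = data.pop(key, None)
        if chosen is None and value:
            chosen = value
    if chosen is not None:
        data[target] = chosen
    return data
```

Under this reading the order of the source keys expresses priority, so `("fallback", "title", "text")` prefers `fallback` whenever it is present and non-empty.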
  • 28. Making the Graph RESTful • Schema enforcement and validation • Handling Idempotency of endpoints • Custom Traversal Source
  • 29. Remote Traversals • Bytecode sent over network instead of string • Allows using custom traversal source for a Domain Specific Language (DSL)
      from gremlin_python.structure.graph import Graph
      from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
      ...
      conn = DriverRemoteConnection(GREMLIN_SERVER_HOST, 'sg')
      slack = Graph().traversal(SlackTraversalSource).withRemote(conn)
  • 30. Building a DSL • Motivations • Custom Workflows
  • 31. Building a DSL - Motivations • Custom traversal source can also specify useful shorthands • E.g. Traversing to all the Channel nodes
      class SlackTraversalSource(BaseTraversalSource):
          """ Module to initialise a Graph with the methods listed under SlackTraversal. """
          def __init__(self, *args, **kwargs):
              super(SlackTraversalSource, self).__init__(*args, **kwargs)
              self.graph_traversal = SlackTraversal
          def channels(self, *channel_ids):
              """ Shorthand to identify all channel nodes"""
              traversal = self.get_graph_traversal()
              traversal.bytecode.add_step("V")
              traversal.bytecode.add_step("hasLabel", NODES.channel)
              if channel_ids:
                  traversal.bytecode.add_step("has", "__id", P.within(channel_ids))
              return traversal
  • 32. Building a DSL - Motivations • Custom traversal source specifies business logic behind traversals • E.g. Connecting a User node to a Channel node
      class SlackTraversal(BaseTraversal):
          def addPartOfChannelEdges(self, channel_uid, *user_uids, **kwargs):
              """ Add an edge to a channel from the users who were/are a part of the channel. """
              for user_uid in user_uids:
                  edge_uid = construct_uid(user_uid, channel_uid, EDGES.part_of.name, delim="|")
                  (self.getOrAddEdgeFrom(edge_label=EDGES.part_of, edge_uid=edge_uid,
                                         node_label=NODES.user, node_uid=user_uid)
                       .upsertProperties(kwargs.get("properties")).inV())
              return self
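`construct_uid` is used above but not defined on the slides. A minimal sketch, assuming it simply joins its parts with the delimiter, shows why the edge ID is stable across repeated ingestions of the same membership:

```python
def construct_uid(*parts, delim="|"):
    """Join the parts into one deterministic identifier (assumed behaviour)."""
    return delim.join(str(part) for part in parts)


def part_of_edge_uids(channel_uid, user_uids, edge_name="part_of"):
    """Hypothetical helper: the set of edge IDs a channel membership list maps to."""
    return {construct_uid(user_uid, channel_uid, edge_name) for user_uid in user_uids}
```

Because the ID depends only on the two endpoints and the edge label, a get-or-add lookup by edge ID is enough to detect an edge that already exists.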
  • 33. Building a DSL - Motivations • BaseTraversal handles creation of nodes and edges • These methods should guarantee idempotency • E.g. Creation of edges between two nodes… • ...checks for an existing edge
      from gremlin_python.process.graph_traversal import GraphTraversal
      from gremlin_python.process.graph_traversal import GraphTraversalSource, __
      from gremlin_python.process.traversal import T
      class BaseTraversal(GraphTraversal):
          def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
              """ Adds an edge from the node with the given label and uid only if the edge doesn’t exist. """
              return self.coalesce(
                  # First branch: look for an existing incoming edge with this ID and source node
                  __.inE(edge_label).hasId(edge_uid).and_(
                      __.outV().hasId(node_uid), __.outV().hasLabel(node_label)),
                  # Second branch: create the edge
                  __.addE(edge_label).property(T.id, edge_uid).from_(
                      __.V().getNode(node_label, node_uid)))
  • 34. Building a DSL - Motivations • The edge is created only if it doesn’t already exist
      from gremlin_python.process.graph_traversal import GraphTraversal
      from gremlin_python.process.graph_traversal import GraphTraversalSource, __
      from gremlin_python.process.traversal import T
      class BaseTraversal(GraphTraversal):
          def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
              """ Adds an edge from the node with the given label and uid only if the edge doesn’t exist. """
              return self.coalesce(
                  __.inE(edge_label).hasId(edge_uid).and_(
                      __.outV().hasId(node_uid), __.outV().hasLabel(node_label)),
                  __.addE(edge_label).property(T.id, edge_uid).from_(
                      __.V().getNode(node_label, node_uid)))
  • 35. Building a DSL – Custom Workflows • Standardized steps for generating a visualization are defined in the BaseTraversal • Custom maps define traversal paths for fields that vary across visualizations
      def build_visualization(self, traversal_source, **kwargs):
          """ The below are standardized steps that are required to generate data for any visualization."""
          return (self.start(traversal_source)
                  .filterByDate(self.date_dimension, kwargs.get("start_time"), kwargs.get("end_time"))
                  .filterByFields(self.filters_map, kwargs.get("filters"))
                  .sortByFields(self.sorting_map, kwargs.get("sort_field"), kwargs.get("sort_direction"))
                  .buildObject(self.object_map).toList())
  • 36. Building a DSL – Custom Workflows • The DSL takes in functions/paths that map fields to their traversals • Maps customized based on the visualization that is needed
      # Sample filter from frontend
      filter_obj = {'_and': [{"field": 'reactions', '_gte': 100},
                             {"field": 'post_creator', '_in': ['bob', 'chloe']}]}
      filter_map = {
          "post_creator": lambda pred: __.in_(EDGES.created_post).has(USER.display_name, pred),
          "reactions": lambda pred: __.inE(EDGES.reacted_to).count().is_(pred),
      }
      object_map = {
          "post_creator": {
              "uid": [__.in_(EDGES.created_post).values("__id"), __.constant("")],
              "image": ...,  # define similar path here
          },
          "reactions": __.inE(EDGES.reacted_to).groupCount().by(__.values(REACTION.name)),
      }
      start = lambda traversal_source: traversal_source.posts()
      # Inject maps into DSL methods
      (start(slack)
          .filterByFields(self.filters_map, kwargs.get("filters"))
          .buildObject(self.object_map)
          .toList())
      # DSL generates the required lower level base traversals
      (slack.posts()
          .where(__.and_(
              __.inE(EDGES.reacted_to).count().is_(P.gte(100)),
              __.in_(EDGES.created_post).has(USER.display_name, P.within(['bob', 'chloe']))))
          .project("post_creator", "reactions")
          .by(__.project("image", "display_name", "uid")
                .by(__.in_(EDGES.created_post).values(USER.image))
                .by(__.in_(EDGES.created_post).values(USER.display_name))
                .by(__.in_(EDGES.created_post).values("__id")))
          .by(__.inE(EDGES.reacted_to).groupCount())
          .toList())
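The dispatch from a frontend filter object to per-field logic can be illustrated without a graph server. In this plain-Python analogue, dictionary lookups stand in for the Gremlin traversal paths of `filter_map`; `OPERATORS`, `parse_clause`, `build_filter`, and the getter functions are hypothetical names, and only the `_and`, `_gte`, and `_in` keys come from the sample filter on the slide.

```python
# Each operator name maps to a factory that builds a predicate over a value.
OPERATORS = {
    "_gte": lambda expected: (lambda actual: actual >= expected),
    "_in": lambda expected: (lambda actual: actual in expected),
}


def parse_clause(clause):
    """Split {"field": name, "<op>": value} into (field, predicate)."""
    field = clause["field"]
    # Exactly one operator key is expected besides "field".
    [(op, value)] = [(k, v) for k, v in clause.items() if k != "field"]
    return field, OPERATORS[op](value)


def build_filter(filter_obj, field_getters):
    """Compile an {"_and": [...]} filter into one callable over a post dict."""
    checks = []
    for clause in filter_obj["_and"]:
        field, predicate = parse_clause(clause)
        getter = field_getters[field]
        # Bind getter and predicate as defaults so each check keeps its own pair.
        checks.append(lambda post, g=getter, p=predicate: p(g(post)))
    return lambda post: all(check(post) for check in checks)
```

The per-field getters play the role of the custom maps: a new visualization supplies a different map while the compilation step stays the same.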
  • 37. Building a DSL – Custom Workflows • The traversals generated churn out the final response objects • Objects rendered into visualizations by the client { "reactions": { "palm_tree": 82, "robot_face": 18 }, "post_creator": { "image": "https://url_of_image.jpg", "display_name": "chloe", "uid": "U024ZH7HL" } }
  • 38. Testing the Application • Unit Tests • Validating traversals on Gremlin Server
  • 39. Testing Our Application – Unit Testing • Workflow (diagram labels): Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      class TestNodeMethods(object):
          """ Test methods that help in retrieval and creation of Nodes. """
          def test_node_retrieval(self, graph):
              """ Test if getNode retrieves an existing node. """
              assert graph.V().getNode(label="person", uid=100).count().next() == 1
              assert graph.V().getNode(label="person", uid=101).count().next() == 1
  • 40. Testing Our Application – Unit Testing • Workflow (diagram labels): Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      def getNode(self, label, uid):
          """ Returns the node with the given label and uid.
          Args:
              label (string): The label of the node to return
              uid (string): Unique ID of the node
          Raises:
              StopIteration: Node with the given label and uid does not exist
          """
          return self.and_(__.hasLabel(label), __.has(T.id, uid))
  • 41. Testing Our Application – Unit Testing • Workflow (diagram labels): Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      $ bin/gremlin-server.sh conf/gremlin-server-neo4j-python.yaml
      class TestBasicTraversal(object):
          """ Tests for methods that help create edges or nodes and methods that help populate the properties of these objects. """
          @pytest.fixture(scope="module")
          def graph(self):
              """ Graph with two nodes and one edge connecting them. """
              graph = (Graph().traversal(CerebroTraversalSource)
                       .withRemote(DriverRemoteConnection(GREMLIN_SERVER_HOST, GREMLIN_SERVER_TRAVERSER)))
              graph.V().clear()
              from_node = graph.addV("person").property(T.id, 100).next()
              to_node = graph.addV("person").property(T.id, 101).next()
              graph.addE("knows").from_(from_node).to(to_node).property("__id", "1").next()
              yield graph
              graph.V().clear()
  • 42. Testing Our Application – Unit Testing
    Workflow: Start Gremlin Server • Use Fixtures • Write a failing test • Write code to make the test pass • Check if test passes

    class TestNodeMethods(object):
        """ Test methods that help in retrieval and creation of Nodes. """

        def test_node_retrieval(self, graph):
            """ Test if getNode retrieves an existing node. """
            assert graph.V().getNode(label="person", uid=100).count().next() == 1
            assert graph.V().getNode(label="person", uid=101).count().next() == 1
  • 43. Testing Our Application – Unit Testing
    • Fixture used to test if the MessageSchema class is implemented correctly

    [
      {
        "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ],
        "attachments": [ { ... } ],
        "text": "<https://www.youtube.com/watch?v=4iEh1ykb13w>",
        "ts": "1465895473.000050",
        "user": "U37BF9457",
        "type": "message"
      }
    ]

    class MessageSchema(Schema):
        """ Holds all the required fields for a message object. """
        ...
  • 44. Testing Our Application – Unit Testing
    • MessageSchema needs to include mentions
    • Update the fixture so it can test that the schema includes mentions
    • Need to validate that traversals pick up mentions

    [
      {
        "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ],
        "attachments": [ { ... } ],
        "text": "<@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>",
        "mentions": [ "U123456" ],
        "ts": "...",
        "type": "message"
      }
    ]

    class MessageSchema(Schema):
        """ Holds all the required fields for a message object. """
        mentions = fields.List(fields.Str(validate=is_user_uid))
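The slide uses `is_user_uid` as the validator for each mention but does not show its body. The following is a minimal sketch of what such a validator and a matching mention extractor might look like. It assumes that Slack user IDs are a `U` or `W` followed by uppercase letters and digits, and that mentions appear in message text as `<@U123456>` or `<@U123456|name>`; `extract_mentions` is a hypothetical helper, not part of the presented code.

```python
import re

# Assumption: a Slack user ID is "U" or "W" followed by uppercase letters/digits.
USER_UID = re.compile(r"[UW][A-Z0-9]+")
# Assumption: mentions are written as <@U123456> or <@U123456|name> in the text.
MENTION = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")


def is_user_uid(value):
    """Return True if value looks like a Slack user ID."""
    return isinstance(value, str) and USER_UID.fullmatch(value) is not None


def extract_mentions(text):
    """Return mentioned user IDs in order of first appearance, without duplicates."""
    mentions = []
    for uid in MENTION.findall(text):
        if uid not in mentions:
            mentions.append(uid)
    return mentions
```

A validator that returns a boolean fits the `validate=` argument shown on the slide, since marshmallow treats a `False` return value as a validation failure.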
  • 45. Testing Our Application – Unit Testing
    Workflow: Start Gremlin Server • Update JSON & Generate GraphSON • Use Fixtures • Write a failing test • Write code to make the test pass • Check if test passes

    [
      {
        "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ],
        "attachments": [ { ... } ],
        "text": "<@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>",
        "mentions": [ "U123456" ],
        "ts": "...",
        "type": "message"
      }
    ]

    gremlin> graph.io(graphson()).writeGraph("graph_name.json")
  • 46. Testing Our Application – Unit Testing
    Workflow: Start Gremlin Server • Update JSON & Generate GraphSON • Use Fixtures • Write a failing test • Write code to make the test pass • Check if test passes

    @pytest.fixture(scope="module")
    def slack_graph():
        """ Open a subgraph on localhost for testing. """
        slack.V().clear()
        slack_client = Client(GREMLIN_SERVER_HOST, SLACK_TRAVERSER)
        path_to_fixture = str(Path.cwd().joinpath("tests/fixtures/slack_graph.json"))
        graphson_statement = 'graph.io(graphson()).readGraph("{}")'.format(path_to_fixture)
        slack_client.submit(graphson_statement).all().result()
        yield slack
        slack.V().clear()
  • 47. Testing the Application – CI/CD • Automated tests using CircleCI • Custom Configuration for Gremlin Server • Caching Dependencies for Faster Tests
  • 48. Testing the Application – CI/CD

    steps: # CircleCI 2.0
      ...
      - run:
          command: |
            if [ ! -d ./apache-tinkerpop-gremlin-server-3.3.3 ]; then
              curl -O https://archive.apache.org/dist/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip
              unzip -q apache-tinkerpop-gremlin-server-3.3.3-bin.zip
              # Install gremlin-python
              cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh install org.apache.tinkerpop gremlin-python 3.3.3
              # Change max content length and traversal strategy
              sed -i -- 's/.*maxContentLength:.*/maxContentLength: 2621440/g' conf/gremlin-server.yaml
              sed -i -- 's/graph.traversal()]/graph.traversal(),sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]/g' ./scripts/empty-sample.groovy
            fi
      ...
  • 49. Testing the Application – CI/CD

    steps: # CircleCI 2.0
      - checkout
      - restore_cache:
          keys:
            - v1-dependencies-{{ .Branch }}
            - v1-dependencies-master
      - run:
          # Download and install Gremlin server
          ...
      # Cache the installation
      - save_cache:
          key: v1-dependencies-{{ .Branch }}
          paths:
            - ~/src/app_name/apache-tinkerpop-gremlin-server-3.3.3
  • 50. Testing the Application – CI/CD

      # Test
      - run:
          # Starting Gremlin Server
          command: |
            cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh ./conf/gremlin-server.yaml
          background: true
      # Sleep to give the gremlin server enough time to start
      - run: sleep 10
      - run: pycodestyle app_name
      - run: coverage run --source=app_name -m pytest tests --capture=no --strict
      - run: coverage report -m --fail-under=95
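The config waits a fixed 10 seconds for Gremlin Server to start. One possible alternative, sketched here with the standard library only, is to poll the server's port until it accepts a TCP connection. `wait_for_port` is a hypothetical helper that is not part of the presented pipeline, and an open port only indicates that the server is listening, not that every graph has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

A CI step could call `wait_for_port("localhost", 8182)` from a short script and fail the build when it returns `False`, assuming the server listens on port 8182 as in the `curl` example later in the deck.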
  • 51. Scaling Our Graph • Async Traversals • HA Cluster and Load Balancing
  • 52. Async Traversals
    • Seed individual entities using “next”
    • Each call to “next” is blocking

    def seed_channels(data, team_uid):
        for channel_data in data:
            channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
            slack.V().addChannel(channel_uid, properties=channel_data).next()
            slack.teams(team_uid).addTeamHasChannelEdge(team_uid, channel_uid).next()
            slack.users(creator).addCreatedChannelEdge(creator, channel_uid).next()
            slack.channels(channel_uid).addPartOfChannelEdges(channel_uid, *members).next()

    • Seed subgraph using “next”
    • Reduce number of blocking calls to one per channel

    def seed_channels(data, team_uid):
        for channel_data in data:
            channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
            (slack.V().addChannel(channel_uid, properties=channel_data)
                .addTeamHasChannelEdge(team_uid, channel_uid).inV()
                .addCreatedChannelEdge(creator, channel_uid).inV()
                .addPartOfChannelEdges(channel_uid, *members).next())

    • Seed subgraph using “promise”
    • Make seeding asynchronous, no blocking calls
    • Verify that the returned futures were successful

    def seed_channels(data, team_uid):
        for channel_data in data:
            channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
            (slack.V().addChannel(channel_uid, properties=channel_data)
                .addTeamHasChannelEdge(team_uid, channel_uid).inV()
                .addCreatedChannelEdge(creator, channel_uid).inV()
                .addPartOfChannelEdges(channel_uid, *members).promise())
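The slide asks to verify that the returned futures were successful but does not show how. The sketch below illustrates the general pattern with the standard library: start every seeding call without waiting, then collect the failures afterwards. A `ThreadPoolExecutor` stands in for the remote traversal, and `seed_one` and `seed_all` are hypothetical names; this is not the presented implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def seed_all(items, seed_one, max_workers=4):
    """Run seed_one(item) for every item concurrently; return (item, error) pairs.

    seed_one is a hypothetical stand-in for a function that issues the
    chained traversal for one channel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting does not block; each call returns a future immediately.
        submitted = [(item, pool.submit(seed_one, item)) for item in items]
    # Leaving the "with" block waits for all submitted calls to finish.
    failures = []
    for item, future in submitted:
        error = future.exception()
        if error is not None:
            failures.append((item, error))
    return failures
```

An empty result means every call succeeded; otherwise the failed items can be logged or retried.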
  • 53. HA Cluster and Load Balancing
    • Preparing for high availability with Neo4j and Gremlin
      • Configuring Gremlin Server and Neo4j
      • Understanding the Neo4j HA architecture
    • Advantages
      • Data replication
      • Spread writes across instances
      • Handle greater read loads
    • HA cluster is fronted by a load balancer like HAProxy
    • Reference:
      • https://neo4j.com/docs/operations-manual/current/ha-cluster/architecture/
      • http://tinkerpop.apache.org/docs/3.3.3/reference/#_high_availability_configuration
  • 54. HA Cluster and Load Balancing
    • Tuning parameters for the cluster
      • Frequency of pulling updates from other members of the cluster
        • gremlin.neo4j.conf.ha.pull_interval
      • Number of slaves a transaction should be committed to
        • gremlin.neo4j.conf.ha.tx_push_factor
    • Tuning parameters for the load balancer
      • Routing requests across the cluster
        • balance
      • Checking if the members in the cluster are responsive
        • option httpchk

    // gremlin-server-neo4j-ha-{1..3}.yaml
    channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer

    > curl "http://localhost:8182?gremlin=100-1"
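The `curl` call above sends a trivial Gremlin script to the HTTP endpoint that `WsAndHttpChannelizer` exposes, which is the kind of request a load balancer health check can issue. A minimal Python sketch of the same probe follows; `is_responsive` is a hypothetical helper, and treating only HTTP 200 as healthy is an assumption rather than the presenters' configuration.

```python
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener


def is_responsive(base_url, script="100-1", timeout=2.0):
    """Return True if base_url answers the Gremlin script with HTTP 200."""
    url = "{}/?{}".format(base_url.rstrip("/"), urlencode({"gremlin": script}))
    # Bypass any configured proxy so the probe reaches the node directly.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

Calling `is_responsive("http://localhost:8182")` against each cluster member would mirror the check on the slide.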