B 4 gravty
1.
2.
1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
4 Future Plans
4. A Graph Database Is
“A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.” (Wikipedia)
• Stores objects (vertices) and relationships (edges)
• Provides graph search capabilities
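To make the definition concrete, here is a minimal in-memory sketch of a property graph with a breadth-first graph search. It is an illustration only, not how Gravty is implemented; the `Graph` class, its method names, and the sample data are all hypothetical.

```python
from collections import deque


class Graph:
    """Toy property graph: vertices, labeled edges, and properties (hypothetical)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties dict
        self.out_edges = {}  # vertex id -> list of (label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def neighbors(self, vid, label):
        # Targets of outgoing edges that carry the given label.
        return [dst for (lbl, dst, _) in self.out_edges[vid] if lbl == label]

    def within_hops(self, start, label, max_hops):
        """Breadth-first search: vertices reachable in at most max_hops edges."""
        seen = {start}
        frontier = deque([(start, 0)])
        found = []
        while frontier:
            vid, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self.neighbors(vid, label):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    frontier.append((nxt, depth + 1))
        return found


g = Graph()
for name in ("brown", "cony", "moon", "sally"):
    g.add_vertex(name, kind="user")
g.add_edge("brown", "friends", "cony")
g.add_edge("cony", "friends", "moon")
g.add_edge("moon", "friends", "sally")

# Friends and friends-of-friends of brown (at most two hops away).
friends_of_friends = g.within_hops("brown", "friends", 2)
```

A search like "friends within two hops" is the kind of relational query that is awkward in a plain key-value store and natural in a graph model.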
6. Use Cases of a Graph Database
• Facebook: Social Graph (social networks)
• Google: PageRank (ranking websites)
• Walmart and eBay: Product recommendation
7. Need for a Large Graph Database System
[Diagram: Gravty providing Social Graph, Ranking, and Recommendation for LINE Timeline, LINE Talk, LINE Friends Shop, and LINE News]
• 7 billion vertices
• 100 billion edges
• 200 billion indexes
• 5 billion writes a day (create / update / delete)
9. Gravty Is
A scalable graph database that efficiently finds relational information in a large pool of data using graph search techniques.
10. Requirements for Gravty
Easy to scale out
• To support ever-increasing data
Easy to develop
• Add, modify, and remove features as necessary
• Tailored to the LINE development environment
• Not dependent on LINE-specific components
Full control over everything!
Easy to use
• Graph query language
• REST API
11.
1 What Is Gravty?
2 The Internals of Gravty
• Technology Stack and Architecture
• Data Model
3 Fine-Tuning Gravty
4 Future Plans
12. Technology Stack and Architecture
[Layer diagram, top to bottom]
Application, TinkerPop3 Gremlin-Console
Gravty: TinkerPop3 Graph API, Graph Processing Layer, Storage Layer
MySQL (config, meta), HBase, Kafka
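18. Flat-Wide vs Tall-Narrow
Flat-Wide Model
• One row per source vertex: the row "brown" holds one column per edge, under labels such as ‘likes’ and ‘friends’
• Reading brown's friends takes 2 operations: (1) row scan, then (2) column scan
• Result: [cony, moon, sally]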
19. Flat-Wide vs Tall-Narrow
Tall-Narrow Model (Gravty)
• One row per edge: brown-friends-cony, brown-friends-moon, brown-friends-sally
• Reading brown's friends takes 1 operation: (1) row scan
• Result: [cony, moon, sally]
• Can split by rows (region)
• Can isolate hotspot rows
• Can scan in parallel
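The difference between the two layouts can be sketched with plain Python containers. This is an assumption-laden illustration, not Gravty's storage code: a dict of dicts stands in for a flat-wide table, a sorted list of row keys stands in for a tall-narrow HBase table, and the key format `src-label-target` simply mirrors the row keys shown on the slide.

```python
from bisect import bisect_left

# Flat-wide: one row per source vertex, one column per edge.
flat_wide = {
    "brown": {
        "friends:cony": 1, "friends:moon": 1, "friends:sally": 1,
        "likes:post1": 1,
    },
}


def friends_flat_wide(table, src):
    row = table[src]                                   # (1) fetch the row
    return sorted(col.split(":", 1)[1]                 # (2) scan its columns
                  for col in row if col.startswith("friends:"))


# Tall-narrow: one row per edge, row keys kept in sorted order.
tall_narrow = sorted([
    "brown-friends-cony", "brown-friends-moon", "brown-friends-sally",
    "brown-likes-post1", "cony-friends-brown",
])


def friends_tall_narrow(rows, src):
    prefix = f"{src}-friends-"
    start = bisect_left(rows, prefix)   # seek to the first key with this prefix
    result = []
    for key in rows[start:]:            # (1) a single contiguous row scan
        if not key.startswith(prefix):
            break
        result.append(key[len(prefix):])
    return result


wide_result = friends_flat_wide(flat_wide, "brown")
narrow_result = friends_tall_narrow(tall_narrow, "brown")
```

Because every edge is its own row in the tall-narrow layout, a vertex with very many edges spreads over many rows, which is what allows splitting by region and scanning in parallel.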
24. Avoiding Hot-Spotting
Hot-spotting problem with HBase RegionServer
Row keys in sequential order may cause RegionServers to suffer from:
• Heavy loads of writes or reads
• Inefficient region splitting
EDGE TABLE
SrcVertexId  Label  TgtVertexId
u000001      1      u000002
u000001      1      u000003
u000002      1      u000001
u000003      1      u000001
u000004      2      u000009
25. Avoiding Hot-Spotting
Solutions to the hot-spotting problem
- Pre-splitting regions
- Salting row keys with a hashed prefix (Salting tables by Apache Phoenix)
But, there is a scan performance issue with the LIMIT clause
SELECT * FROM index … LIMIT 100;
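A minimal sketch of salting, assuming a one-byte prefix derived from a hash of the row key modulo the bucket count. The hash function (`zlib.crc32`), the bucket count, and the function names are assumptions made for illustration; Apache Phoenix computes its salt byte with its own hashing.

```python
import zlib

SALT_BUCKETS = 4  # hypothetical bucket count


def salted_key(row_key: str, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the row key with one salt byte derived from a hash of the key."""
    raw = row_key.encode("utf-8")
    salt = zlib.crc32(raw) % buckets
    return bytes([salt]) + raw


def bucket_of(salted: bytes) -> int:
    """The salt byte identifies the bucket the row lands in."""
    return salted[0]


# Sequential vertex ids like those in the edge table example.
keys = [f"u{n:06d}" for n in range(1, 1001)]
counts = [0] * SALT_BUCKETS
for k in keys:
    counts[bucket_of(salted_key(k))] += 1
```

Sequential ids that would otherwise sit next to each other in one region are spread across the buckets, at the cost that a range scan must now visit every bucket.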
26. Avoiding Hot-Spotting
Phoenix Salted Table (four salt buckets shown)
• The Phoenix client scans 100 rows from each salt bucket
• A client-side merge sort produces the result
• A maximum of 400 rows is scanned to return 100
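The cost of LIMIT on a salted table can be sketched as follows. This is a simplified model rather than Phoenix client code: each salt bucket is a sorted Python list, and `scan_with_limit` is a hypothetical helper.

```python
import heapq
from itertools import islice


def scan_with_limit(buckets, limit):
    """Return the first `limit` rows in key order, plus the number of rows scanned."""
    rows_scanned = 0
    partials = []
    for bucket in buckets:
        # Any bucket may hold the smallest keys, so each one must
        # return up to `limit` rows.
        part = bucket[:limit]
        rows_scanned += len(part)
        partials.append(part)
    merged = heapq.merge(*partials)  # client-side merge of sorted partial results
    return list(islice(merged, limit)), rows_scanned


# Four salt buckets, keys spread round-robin for the example.
buckets = [list(range(b, 2000, 4)) for b in range(4)]
result, scanned = scan_with_limit(buckets, 100)
```

In this model the scanned row count grows with the number of buckets while the result size stays at the limit.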
31. Default Phoenix IndexCommitter
1. The Phoenix client sends an UPSERT, which becomes a PUT on the Primary Table (Region Server with Phoenix Coprocessor)
2. HBase mutations (PUT / DELETE) for INDEX 1 and INDEX 2 are requested in parallel (each index on a Region Server with Phoenix Coprocessor)
3. After the index mutations return, the Phoenix client returns
32. Gravty IndexCommitter
1. PUT on the Primary Table (Region Server with Phoenix Coprocessor)
2. HBase mutations for INDEX 1 and INDEX 2 are handed to Kafka
3. RETURN to the client
4. The Index Consumer consumes the mutations from Kafka
5. The Index Consumer applies PUT / DELETE to INDEX 1 and INDEX 2 (each on a Region Server with Phoenix Coprocessor)
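The asynchronous flow can be sketched in a single process, with an in-memory `queue.Queue` standing in for Kafka. The class and method names are hypothetical, and real HBase and Kafka clients are deliberately left out; the point is only that the write returns before the index is updated.

```python
import queue


class AsyncIndexCommitter:
    """Toy model: index mutations are queued, not applied, on the write path."""

    def __init__(self):
        self.primary = {}         # row key -> value
        self.index = {}           # value -> set of row keys (secondary index)
        self.log = queue.Queue()  # stand-in for a Kafka topic

    def put(self, row_key, value):
        old = self.primary.get(row_key)
        self.primary[row_key] = value                # 1. PUT to the primary table
        if old is not None and old != value:
            self.log.put(("DELETE", old, row_key))   # 2. queue index mutations
        self.log.put(("PUT", value, row_key))
        return True                                  # 3. return before indexes change

    def consume(self):
        """Drain the queue and apply index mutations; returns how many were applied."""
        applied = 0
        while True:
            try:
                op, value, row_key = self.log.get_nowait()   # 4. consume
            except queue.Empty:
                return applied
            if op == "PUT":                                  # 5. PUT / DELETE on the index
                self.index.setdefault(value, set()).add(row_key)
            else:
                self.index.get(value, set()).discard(row_key)
            applied += 1


committer = AsyncIndexCommitter()
committer.put("brown", "tokyo")
index_before_consume = dict(committer.index)  # still empty at this point
committer.consume()
```

The trade-off in this model is that the index lags the primary table until the consumer catches up.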
34. Best-Effort Failover
Fail fast, fix later
• Reentrant event processing: every row is versioned in HBase (timestamp)
• Logging failures and replaying failed requests
• Time machine to resume at a certain runtime: resetting the runtime offset of Kafka consumers
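Why timestamp versioning makes replay safe can be sketched as follows. This is a simplified model, not Gravty's code: `VersionedStore` keeps only the newest version per row, which approximates what a read of the latest HBase version would see, and `replay` with an `offset` argument stands in for resetting a Kafka consumer offset. All names are hypothetical.

```python
class VersionedStore:
    """Keeps, per row, the value with the highest timestamp seen so far."""

    def __init__(self):
        self.rows = {}  # row key -> (timestamp, value)

    def apply(self, row_key, timestamp, value):
        current = self.rows.get(row_key)
        # An older event never overwrites a newer version, and re-applying
        # the same event leaves the row unchanged.
        if current is None or timestamp >= current[0]:
            self.rows[row_key] = (timestamp, value)


def replay(store, events, offset):
    """Re-process the event log starting at the given offset."""
    for row_key, ts, value in events[offset:]:
        store.apply(row_key, ts, value)


events = [
    ("brown", 1, "a"),
    ("cony", 2, "b"),
    ("brown", 3, "c"),
]

store = VersionedStore()
replay(store, events, 0)
snapshot = dict(store.rows)

# Rewind the offset and process part of the log a second time.
replay(store, events, 1)
state_after_rewind = dict(store.rows)
```

Since replaying any suffix of the log leaves the state unchanged, a failed consumer can be rewound to an earlier offset without first working out exactly which events were already applied.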