Guide to SQL to NoSQL migration

GUIDE TO
SQL - NOSQL MIGRATION
Anton Yazovskiy

Solution Architect, Thumbtack Technology

AGENDA
• Why would you want to migrate to NoSQL

• Conceptual difference between RDBMS and
NoSQL

• Data modeling and architectural best practices

• Practical migration steps / questions you have to ask

WHY?
scalability

performance

developer productivity

CONCEPTUAL DIFFERENCE
BETWEEN RDBMS AND NOSQL
• relational schema allows you to query data in many different ways in different contexts

• accessible for many types of applications and separate dev teams

• schema helps to control rules common for everybody

• always remember that in most cases you run queries across the cluster

• NoSQL is about focusing on particular need and goal

• model your data for specific use case

• define what you are willing to sacrifice to achieve better results

DATA MODELING AND
ARCHITECTURAL BEST
PRACTICES

POLYGLOT PERSISTENCE
• different solutions are designed to solve different problems

• session & fast transactions

• cache

• aggregations

• analytical ad-hoc queries

• graph traversal

• the requirements for OLTP and OLAP storages are very different

NOSQL DATA STRUCTURES
• Key-Value: Riak, Redis, MemcacheDB, Aerospike
and Amazon DynamoDB (Cloud).

• Key-Document: MongoDB and Couchbase.

• Column-Family: Cassandra, HBase

• Graph Databases: Neo4j and OrientDB.

PRACTICAL
MIGRATION
STEPS
• what would you like to achieve

• learn your traffic

• learn your data set

• what are you willing to sacrifice

• apply polyglot persistence

• model your data

• synchronization

WHAT WOULD YOU LIKE TO
ACHIEVE
• better performance

• scale current solution

• process more and/or different data

• speed up development

• I heard of it

LEARN YOUR TRAFFIC
• what the workload looks like:

• OLTP (simple lookups, short transactions)

• OLAP (aggregations, analytical queries, ad hoc scans, etc.)

• heavy-read, heavy-write

• what kinds of queries you run to answer the application's
questions:

• simple lookups, uncertain search, inner requests, traversal, BI/Analysis
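
One rough way to start learning the traffic is to count reads versus writes in a query log. The sketch below assumes the log is available as a list of SQL strings; the helper names (`classify`, `workload_profile`) and the sample queries are hypothetical, and the first-keyword check is a coarse heuristic rather than a SQL parser.

```python
from collections import Counter

# Statement verbs treated as writes. Only SELECT counts as a read;
# anything else (DDL, SET, empty lines) is reported as "other".
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


def classify(query: str) -> str:
    """Return 'read', 'write', or 'other' based on the first keyword."""
    stripped = query.strip()
    verb = stripped.split(None, 1)[0].upper() if stripped else ""
    if verb == "SELECT":
        return "read"
    if verb in WRITE_VERBS:
        return "write"
    return "other"


def workload_profile(queries):
    """Count reads, writes, and other statements in a query log."""
    return Counter(classify(q) for q in queries)


# Hypothetical sample log.
log = [
    "SELECT * FROM users WHERE id = 1",
    "select name from users where id = 2",
    "UPDATE users SET name = 'a' WHERE id = 1",
    "INSERT INTO sessions (id) VALUES (42)",
    "SELECT COUNT(*) FROM orders GROUP BY day",
]
profile = workload_profile(log)
```

The resulting counts only hint at whether the workload is read-heavy or write-heavy; telling OLTP lookups from OLAP scans would need a closer look at the queries themselves.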

LEARN YOUR DATA SET
• what kinds of data you work with

• simple key-value

• structured, semi-structured

• nested/hierarchical

• graph-oriented

• how much data of each type you have

WHAT ARE YOU WILLING TO
SACRIFICE
• what data doesn't require strong consistency

• where transactional guarantees aren't required

• what data you are willing to lose in case of
hardware failure

• where you are willing to sacrifice joins

APPLY POLYGLOT
PERSISTENCE
• Based on the answers you discovered, define the most obvious types of storage that
you may need

• fast & simple storage for lookups, non-critical data and short transactions

• RDBMS for data that fits on a single server

• document-oriented storage for nested/hierarchical data and aggregate-oriented
reads & writes

• graph-oriented storage for traversal queries, social relations, etc.

• highly-scalable storage for BigData background processing

DATA MODELING: BEFORE
YOU START
• from “what data do I have” to “what questions do I
have”

• denormalization & duplication are your best
friends

• hierarchical and embedded structures make your
life easier, but they are your worst enemy

REFERENCES
• in-application joins

• nothing to be ashamed of

• apply carefully
{
  user_name: ayazovskiy,
  contact: {..},
  access: {
    level: 523,
    group: dev
  }
}
{
  access_level: 523,
  rules: [...]
}
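
An in-application join over the two documents above could look roughly like this. It is a minimal sketch, assuming two in-memory dicts stand in for the collections; the function name `load_user_with_rules` and the rule values `"read"` and `"write"` are hypothetical, since the slide elides the actual rules.

```python
# Two hypothetical in-memory "collections" standing in for a NoSQL store.
users = {
    "ayazovskiy": {
        "user_name": "ayazovskiy",
        "access": {"level": 523, "group": "dev"},
    },
}
access_levels = {
    # The rule values are placeholders; the slide does not list them.
    523: {"access_level": 523, "rules": ["read", "write"]},
}


def load_user_with_rules(user_name):
    """Join a user to its access rules in application code (two lookups)."""
    user = users[user_name]                    # first lookup: the user
    level = user["access"]["level"]            # the stored reference
    rules_doc = access_levels.get(level, {})   # second lookup: the target
    # Return a merged copy so the stored documents stay untouched.
    return {**user, "rules": rules_doc.get("rules", [])}


joined = load_user_with_rules("ayazovskiy")
```

Each reference costs an extra round trip to the store, which is why the slide advises applying references carefully.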

DUPLICATION
• Duplication is a technique of copying pieces of data between
structures in order to either optimize query processing time or
convert data into particular business model.

• The main advantages of denormalization are the ability to:

1. reduce the number of I/O operations and query time

2. reduce complexity of query processing in distributed systems
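
The trade-off can be sketched as follows, assuming posts that carry a copy of the author's name. The collections, field names, and functions here are hypothetical and in-memory only.

```python
# Hypothetical source-of-truth documents.
users = {"u1": {"user_name": "ayazovskiy", "group": "dev"}}

# Each post carries a copy of the author's name, so rendering a post
# needs one lookup instead of two.
posts = {
    "p1": {"author_id": "u1", "author_name": "ayazovskiy", "text": "hello"},
    "p2": {"author_id": "u1", "author_name": "ayazovskiy", "text": "again"},
}


def render_post(post_id):
    """Read path: a single lookup, no join."""
    post = posts[post_id]
    return f'{post["author_name"]}: {post["text"]}'


def rename_user(user_id, new_name):
    """Write path: the copy in every post has to be updated as well."""
    users[user_id]["user_name"] = new_name
    touched = 0
    for post in posts.values():
        if post["author_id"] == user_id:
            post["author_name"] = new_name
            touched += 1
    return touched
```

Reads get cheaper, while a rename now touches every document that holds a copy.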

AGGREGATES
• simplify data processing logic

• optimize read/write time

• ability to distribute the data across the cluster

• reduce # of requests across the cluster

• perform atomic updates
{
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}

AGGREGATES
• updates of duplicated data are heavy and complex

• querying across aggregates is heavy and complex
{
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}

COUNTERS
• NoSQL auto-increment analog

• distributed consistent auto-increment is tricky

• counters aren't always reliable *
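
One common workaround, which the slides do not describe, is to let each node reserve a block of IDs from a shared counter and hand them out locally. The sketch below is an in-memory illustration under that assumption; `CentralCounter` and `NodeIdGenerator` are hypothetical names, and a real store would provide its own atomic increment.

```python
import threading


class CentralCounter:
    """Stand-in for a counter kept in one shared store (hypothetical)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def reserve(self, block_size):
        """Atomically reserve block_size IDs; return the first one."""
        with self._lock:
            start = self._value + 1
            self._value += block_size
            return start


class NodeIdGenerator:
    """Hands out IDs locally and contacts the counter once per block."""

    def __init__(self, counter, block_size=100):
        self._counter = counter
        self._block_size = block_size
        self._next = 0
        self._end = -1  # empty range forces a reservation on first use

    def next_id(self):
        if self._next > self._end:
            self._next = self._counter.reserve(self._block_size)
            self._end = self._next + self._block_size - 1
        value = self._next
        self._next += 1
        return value
```

IDs produced this way are unique but not ordered across nodes, and a node that stops before using up its block leaves a gap, which is one sense in which such counters are less reliable than an RDBMS auto-increment.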

COMPOSITE KEYS
{
  "ID": "chat#user_1#user_2#december_12_2014",
  "messages": [
    { "user_1": "hey" },
    { "user_1": "how is going?" },
    { "user_2": "thanks, pretty well!" }
  ]
}
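
A key in the format shown above could be built and parsed along these lines. This is a sketch with hypothetical helpers (`chat_key`, `parse_chat_key`); sorting the participants is an added assumption, and it presumes user IDs never contain the `#` separator.

```python
from datetime import date

# Month names are spelled out so the key does not depend on the locale.
MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]
SEPARATOR = "#"


def chat_key(user_a, user_b, day):
    """Build a key such as chat#user_1#user_2#december_12_2014."""
    # Sorting the participants is an assumption: it makes the key the
    # same no matter which user opens the chat.
    first, second = sorted([user_a, user_b])
    day_part = f"{MONTHS[day.month - 1]}_{day.day}_{day.year}"
    return SEPARATOR.join(["chat", first, second, day_part])


def parse_chat_key(key):
    """Split a chat key back into its parts."""
    prefix, first, second, day_part = key.split(SEPARATOR)
    return {"type": prefix, "users": (first, second), "day": day_part}


key = chat_key("user_2", "user_1", date(2014, 12, 12))
```

Because every part of the key is known to the application, a whole day of a conversation can be fetched with a single key lookup and no secondary index.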

APPEND
{
  ID: account#User_A,
  account_total: $100,
  account_total_calculation_time: ..,
  changes_since_last_calculation: [
    1399493200: +$10,
    1399892139: -$25
  ]
}
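
The append pattern in the document above can be sketched as follows: writes only append a change, reads add the pending changes to the last calculated total, and a periodic recalculation folds them in. The function names are hypothetical, amounts are whole dollars to keep the arithmetic exact, and `None` stands in for the calculation time that the slide leaves elided.

```python
def current_balance(account):
    """Last calculated total plus every change appended since then."""
    return account["account_total"] + sum(
        amount for _, amount in account["changes_since_last_calculation"]
    )


def append_change(account, timestamp, amount):
    """Write path: append only, never rewrite the total in place."""
    account["changes_since_last_calculation"].append((timestamp, amount))


def recalculate(account, now):
    """Fold the appended changes into the total and clear the list."""
    account["account_total"] = current_balance(account)
    account["account_total_calculation_time"] = now
    account["changes_since_last_calculation"] = []


account = {
    "ID": "account#User_A",
    "account_total": 100,
    # Placeholder: the slide does not give the last calculation time.
    "account_total_calculation_time": None,
    "changes_since_last_calculation": [(1399493200, 10), (1399892139, -25)],
}
```

With the values from the slide, the current balance is 100 + 10 - 25 = 85.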

THINK OF DATA
SYNCHRONIZATION
• application-level synchronization:

• e.g. update the user profile in document-oriented storage, its social network in graph storage, and
the session in a key-value cache

• regular synchronization:

• this may be an hourly/daily/weekly process that takes updated data and propagates it across the
system

• incremental background synchronization

• solutions like the Tungsten synchronizer allow you to track changes in an RDBMS via its transaction log and
apply these changes immediately to NoSQL storage

• e.g. user profiles in MySQL synchronized with Aerospike via a properly configured Tungsten
Replicator
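
The idea behind incremental synchronization can be illustrated without any particular tool. The sketch below does not use the Tungsten API; it assumes a change log given as ordered tuples and a dict standing in for the NoSQL store, and `apply_changes` is a hypothetical helper.

```python
def apply_changes(change_log, target, last_applied):
    """Apply log entries newer than last_applied; return the new position.

    change_log: list of (sequence_number, operation, key, value) tuples,
                ordered by sequence_number (a stand-in for a transaction log).
    target:     dict standing in for the NoSQL store.
    """
    for seq, operation, key, value in change_log:
        if seq <= last_applied:
            continue  # already applied in an earlier run
        if operation == "upsert":
            target[key] = value
        elif operation == "delete":
            target.pop(key, None)
        last_applied = seq
    return last_applied


# Hypothetical log of changes captured from the relational side.
log = [
    (1, "upsert", "user:1", {"name": "a"}),
    (2, "upsert", "user:2", {"name": "b"}),
    (3, "delete", "user:1", None),
]
target = {}
position = apply_changes(log, target, 0)
```

Remembering the last applied sequence number makes the process restartable: rerunning it with the same position applies nothing twice.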

– Anton Yazovskiy
“always remember that in most cases you run queries
across the cluster”

Any questions?
Thank you
@yazovsky

ayazovksiy@thumbtack.net

www.thumbtack.net

THANKS / REFERENCES
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence by Pramod J. Sadalage and Martin Fowler

• NoSQL Data Modeling Techniques (http://highlyscalable.wordpress.com)

• MongoDB documentation (http://docs.mongodb.org)

• Couchbase documentation (http://docs.couchbase.com)

• FoundationDB Blog (http://blog.foundationdb.com)
