GUIDE TO SQL - NOSQL MIGRATION

Anton Yazovskiy
Solution Architect, Thumbtack Technology
AGENDA
• Why would you want to migrate to NoSQL	

• Conceptual difference between RDBMS and NoSQL

• Data modeling and architectural best practices	

• Practical migration steps / questions you have to ask
WHY?
scalability	

performance	

developer productivity
CONCEPTUAL DIFFERENCE BETWEEN RDBMS AND NOSQL
• relational schema allows you to query data in many different ways in different contexts	

• accessible for many types of applications and separate dev teams	

• schema helps to control rules common for everybody	

• always remember that in most cases you run queries across the cluster	

• NoSQL is about focusing on particular need and goal	

• model your data for specific use case	

• define what you are willing to sacrifice to achieve better results
DATA MODELING AND
ARCHITECTURAL BEST
PRACTICES
POLYGLOT PERSISTENCE
• different solutions are designed to solve different problems	

• session & fast transactions	

• cache	

• aggregations	

• analytical ad-hoc queries	

• graph traversal	

• the requirements for OLTP and OLAP storages are very different
NOSQL DATA STRUCTURES
• Key-Value: Riak, Redis, MemcacheDB, Aerospike, and Amazon DynamoDB (cloud)

• Key-Document: MongoDB and Couchbase

• Column-Family: Cassandra and HBase

• Graph: Neo4j and OrientDB
PRACTICAL
MIGRATION
STEPS
• what would you like to achieve	

• learn your traffic	

• learn your data set

• what are you willing to sacrifice	

• apply polyglot persistence	

• model your data	

• synchronization
WHAT WOULD YOU LIKE TO ACHIEVE
• better performance	

• scale current solution	

• process more and/or different data

• speed up development

• I heard of it
LEARN YOUR TRAFFIC
• what the workload looks like:

  • OLTP (simple lookups, short transactions)

  • OLAP (aggregations, analytical queries, ad hoc scans, etc.)

  • read-heavy or write-heavy

• what kinds of queries you run to answer the application's questions:

  • simple lookups, fuzzy search, nested queries, traversal, BI/analysis
LEARN YOUR DATA SET
• what kinds of data do you work with:

  • simple key-value

  • structured, semi-structured

  • nested/hierarchical

  • graph-oriented

• how much data of each kind do you have
WHAT ARE YOU WILLING TO SACRIFICE
• what data doesn't require strong consistency

• where transactional guarantees aren't required

• what data are you willing to lose in case of hardware failure

• where are you willing to sacrifice joins
APPLY POLYGLOT PERSISTENCE
• Based on the answers you found, define the most obvious types of storage you may need:

  • fast & simple storage for lookups, non-critical data and short transactions

  • RDBMS for data that fits on a single server

  • document-oriented storage for nested/hierarchical data and aggregate-oriented reads & writes

  • graph-oriented storage for traversal queries, social relations, etc.

  • highly scalable storage for big data background processing
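
The mapping above can be sketched as a small decision function. This is only an illustration, assuming a hypothetical `DataSet` description and an arbitrary rule order; the field names and the order of the checks are not from the slides, and a real decision would weigh traffic and consistency needs together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical description of one kind of data in the application."""
    name: str
    critical: bool            # losing it or reading stale values is unacceptable
    fits_single_server: bool
    shape: str                # "key-value", "nested", "graph" or "tabular"
    batch_analytics: bool = False


def suggest_storage(ds: DataSet) -> str:
    """Return the storage category from the slide that matches first."""
    if ds.batch_analytics:
        return "highly scalable storage for background processing"
    if ds.shape == "graph":
        return "graph-oriented storage"
    if ds.shape == "nested":
        return "document-oriented storage"
    if ds.shape == "key-value" and not ds.critical:
        return "fast key-value storage"
    if ds.fits_single_server:
        return "RDBMS"
    return "needs further analysis"


if __name__ == "__main__":
    sessions = DataSet("sessions", critical=False, fits_single_server=True, shape="key-value")
    print(sessions.name, "->", suggest_storage(sessions))
```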
DEFINE A DATA MODEL
DATA MODELING: BEFORE
YOU START
• move from “what data do I have” to “what questions do I have”

• denormalization & duplication are your best friends

• hierarchical and embedded structures make your life easier, but they can also be your worst enemy
REFERENCES
• in-application joins

• nothing to be ashamed of

• apply carefully

{
  user_name: ayazovskiy,
  contact: {..},
  access: {
    level: 523,
    group: dev
  }
}

{
  access_level: 523,
  rules: [...]
}
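
An in-application join over the two documents above might look like the following minimal sketch. Plain dictionaries stand in for the two collections, and the rule values are hypothetical placeholders, since the slide elides them.

```python
# Two in-memory "collections" standing in for a document store.
users = {
    "ayazovskiy": {
        "user_name": "ayazovskiy",
        "access": {"level": 523, "group": "dev"},
    }
}
access_levels = {
    523: {"access_level": 523, "rules": ["read", "write"]},  # hypothetical rules
}


def load_user_with_rules(user_name: str) -> dict:
    """Resolve the reference in application code: two lookups instead of a JOIN."""
    user = users[user_name]                          # lookup 1: the user document
    level = user["access"]["level"]                  # the reference stored in it
    rules_doc = access_levels.get(level)             # lookup 2: the referenced document
    rules = rules_doc["rules"] if rules_doc else []
    return {**user, "rules": rules}


if __name__ == "__main__":
    print(load_user_with_rules("ayazovskiy"))
```

Each reference costs one extra round trip to the store, which is why the slide says to apply the technique carefully.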
DUPLICATION
• Duplication is a technique of copying pieces of data between structures in order to either optimize query processing time or convert data into a particular business model.

• The main advantages of denormalization are the ability to:

1. reduce the number of I/O operations and query time

2. reduce the complexity of query processing in distributed systems
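
As a rough illustration of the first advantage, the sketch below copies the author's name into each post at write time, so rendering a post needs one lookup instead of two. The `posts` collection and its field names are hypothetical and only serve the example.

```python
users = {"u1": {"user_name": "ayazovskiy", "group": "dev"}}
posts = {}


def create_post(post_id: str, author_id: str, text: str) -> None:
    """Write path: duplicate the author's name into the post document."""
    author = users[author_id]
    posts[post_id] = {
        "text": text,
        "author_id": author_id,               # reference, kept for later updates
        "author_name": author["user_name"],   # duplicated field
    }


def render_post(post_id: str) -> str:
    """Read path: a single lookup, no second request for the user document."""
    post = posts[post_id]
    return f'{post["author_name"]}: {post["text"]}'


if __name__ == "__main__":
    create_post("p1", "u1", "hello")
    print(render_post("p1"))
```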
AGGREGATES
• simplify data processing logic

• optimize read/write time

• ability to distribute the data across the cluster

• reduce # of requests across the cluster

• perform atomic updates

{
  user_name: ayazovskiy,
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}
AGGREGATES
• updates of duplicated data are heavy and complex

• querying across aggregates is heavy and complex

{
  user_name: ayazovskiy,
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}
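
The cost of updating duplicated data can be sketched as follows. The collections and field names are hypothetical; the point is only that one logical change touches every document holding a copy, and on a cluster that scan becomes a request to many nodes.

```python
users = {"u1": {"user_name": "old_name"}}
posts = {
    "p1": {"author_id": "u1", "author_name": "old_name", "text": "a"},
    "p2": {"author_id": "u1", "author_name": "old_name", "text": "b"},
    "p3": {"author_id": "u2", "author_name": "someone", "text": "c"},
}


def rename_user(user_id: str, new_name: str) -> int:
    """Rename a user and fix every duplicated copy; return documents written."""
    users[user_id]["user_name"] = new_name
    written = 1
    for post in posts.values():            # full scan over another aggregate type
        if post["author_id"] == user_id:
            post["author_name"] = new_name
            written += 1
    return written


if __name__ == "__main__":
    print(rename_user("u1", "new_name"), "documents written for one rename")
```

Note that the writes here are not atomic as a group: a reader may see the new name in the user document and the old one in a post until the loop finishes.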
COUNTERS
• the NoSQL analog of auto-increment

• a distributed, consistent auto-increment is tricky

• counters aren't always reliable
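
One common way to sidestep a shared counter, shown here as an assumption rather than something the slides prescribe, is to let every node generate IDs from its own identifier plus a local sequence. The IDs are unique without coordination, but they are not globally ordered the way an auto-increment column is. The class name and ID format are hypothetical.

```python
import itertools


class NodeLocalIdGenerator:
    """Generates IDs of the form '<node_id>-<local sequence>'."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self._sequence = itertools.count(1)   # local to this node, no network calls

    def next_id(self) -> str:
        return f"{self.node_id}-{next(self._sequence)}"


if __name__ == "__main__":
    node_a = NodeLocalIdGenerator(1)
    node_b = NodeLocalIdGenerator(2)
    print(node_a.next_id(), node_b.next_id(), node_a.next_id())
```

The sketch assumes each node has a stable, unique `node_id` and does not persist the sequence across restarts; a real deployment would have to handle both.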
COMPOSITE KEYS
{
  "ID": "chat#user_1#user_2#december_12_2014",
  "messages": [
    { "user_1": "hey" },
    { "user_1": "how is it going?" },
    { "user_2": "thanks, pretty well!" }
  ]
}
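
A key like the one above can be built by a small helper so that every part of the application derives the same key for the same conversation and day. This is a sketch under two assumptions that are not in the slide: participants are sorted so both sides produce the same key, and the date uses ISO format rather than the `december_12_2014` style shown.

```python
from datetime import date


def chat_key(user_a: str, user_b: str, day: date) -> str:
    """Build the composite key 'chat#<user>#<user>#<date>' for one day of a chat."""
    first, second = sorted([user_a, user_b])   # order-independent: (a, b) == (b, a)
    return f"chat#{first}#{second}#{day.isoformat()}"


if __name__ == "__main__":
    print(chat_key("user_2", "user_1", date(2014, 12, 12)))
```

Because the whole key is computable from values the application already has, reading a day of messages is a single lookup by key with no secondary index.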
APPEND
{
  ID: account#User_A,
  account_total: $100,
  account_total_calculation_time: ..,
  changes_since_last_calculation: [
    1399493200: +$10,
    1399892139: -$25
  ]
}
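
The append pattern above can be sketched like this: writes only append a change, the current balance is the last computed total plus the pending changes, and a periodic compaction folds the changes back into the total. The function names and the calculation timestamps are hypothetical; the amounts follow the slide.

```python
from decimal import Decimal

account = {
    "ID": "account#User_A",
    "account_total": Decimal("100"),
    "account_total_calculation_time": 1399400000,   # hypothetical value
    "changes_since_last_calculation": [
        (1399493200, Decimal("10")),
        (1399892139, Decimal("-25")),
    ],
}


def record_change(doc: dict, timestamp: int, amount: Decimal) -> None:
    """Write path: append only, never rewrite the stored total."""
    doc["changes_since_last_calculation"].append((timestamp, amount))


def current_balance(doc: dict) -> Decimal:
    """Read path: stored total plus all changes recorded since it was computed."""
    pending = sum(amount for _, amount in doc["changes_since_last_calculation"])
    return doc["account_total"] + pending


def compact(doc: dict, now: int) -> None:
    """Background step: fold pending changes into the total and clear the list."""
    doc["account_total"] = current_balance(doc)
    doc["account_total_calculation_time"] = now
    doc["changes_since_last_calculation"] = []


if __name__ == "__main__":
    print(current_balance(account))
```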
THINK OF DATA SYNCHRONIZATION
• application-level synchronization:

  • e.g. update the user profile in document-oriented storage, its social network in graph storage, and the session in a key-value cache

• regular synchronization:

  • this may be an hourly/daily/weekly process that takes updated data and propagates it across the system

• incremental background synchronization:

  • solutions like Tungsten Replicator allow you to track changes in an RDBMS via its transaction log and apply them immediately to NoSQL storage

  • e.g. user profiles in MySQL synchronized with Aerospike via a properly configured Tungsten Replicator
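
The idea behind incremental synchronization can be sketched without any particular tool: read changes from an ordered log, apply only those past the last applied position, and remember the new position. This is not the Tungsten Replicator API; the log format, operation names, and function below are hypothetical, and dictionaries stand in for the source log and the target store.

```python
# Ordered change log: (sequence number, operation, key, value).
change_log = [
    (1, "upsert", "user:1", {"name": "a"}),
    (2, "upsert", "user:2", {"name": "b"}),
    (3, "delete", "user:1", None),
]


def apply_changes(log: list, target: dict, last_applied: int) -> int:
    """Apply log entries newer than last_applied to target; return the new position."""
    for seq, op, key, value in log:
        if seq <= last_applied:
            continue                      # already applied in an earlier run
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
        last_applied = seq
    return last_applied


if __name__ == "__main__":
    nosql_store = {}
    position = apply_changes(change_log, nosql_store, last_applied=0)
    print(position, nosql_store)
```

Tracking the position makes a re-run harmless: entries at or below it are skipped, so the process can resume after a failure.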
“always remember that in most cases you run queries across the cluster”
- Anton Yazovskiy
Any questions?
Thank you
@yazovsky	

ayazovksiy@thumbtack.net	

www.thumbtack.net
THANKS / REFERENCES
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Pramod J. Sadalage and Martin Fowler

• NoSQL Data Modeling Techniques (http://highlyscalable.wordpress.com)

• MongoDB documentation (http://docs.mongodb.org)

• Couchbase documentation (http://docs.couchbase.com)

• FoundationDB Blog (http://blog.foundationdb.com)
