Is your legacy database infrastructure struggling to meet the demands of customer Service Level Agreements? If you, like many companies, are discovering that your infrastructure is not robust enough for the speed and scale that today's Internet-scale applications require, it may be time to consider a switch to NoSQL storage.
Changing storage systems can be a daunting process and, with all the buzz surrounding NoSQL, it can be difficult to know where to start. As a Solutions Architect at Thumbtack Technology, Anton Yazovskiy has helped many companies through the selection and deployment of NoSQL technologies. In this webinar, Anton will explain the main advantages of NoSQL and the common use cases in which migrating to NoSQL makes sense. You will learn the key questions to ask before a migration, as well as important differences in data modeling and architectural approaches. Finally, you will take a look at a typical application built on a Relational Database Management System (RDBMS) and migrate it to NoSQL step by step.
Key topics that will be covered:
> Why you would want to migrate to NoSQL
> Conceptual differences between RDBMS and NoSQL
> Data modeling and architectural best practices
> "I got it. But what exactly do I need to do?" - Practical migration steps
ABOUT THE PRESENTER
Anton Yazovskiy is a Software Engineer at Thumbtack Technology, where he focuses on high-performance enterprise architecture. He has presented at a variety of IT conferences and “DevDays” on topics such as NoSQL and MarkLogic.
2. AGENDA
• Why would you want to migrate to NoSQL
• Conceptual difference between RDBMS and NoSQL
• Data modeling and architectural best practices
• Practical migration steps / questions you have to ask
4. CONCEPTUAL DIFFERENCE BETWEEN RDBMS AND NOSQL
• relational schema allows you to query data in many different ways in different contexts
• accessible for many types of applications and separate dev teams
• schema helps to control rules common for everybody
• always remember that in most cases you run queries across the cluster
• NoSQL is about focusing on particular need and goal
• model your data for specific use case
• define what you are willing to sacrifice to achieve better results
6. POLYGLOT PERSISTENCE
• different solutions are designed to solve different problems:
  • sessions & fast transactions
  • cache
  • aggregations
  • analytical ad hoc queries
  • graph traversal
• the requirements for OLTP and OLAP storages are very different
8. NOSQL DATA STRUCTURES
• Key-Value: Riak, Redis, MemcacheDB, Aerospike, and Amazon DynamoDB (cloud)
• Key-Document: MongoDB and Couchbase
• Column-Family: Cassandra and HBase
• Graph: Neo4j and OrientDB
9. PRACTICAL MIGRATION STEPS
• what would you like to achieve
• learn your traffic
• learn your data set
• what are you willing to sacrifice
• apply polyglot persistence
• model your data
• synchronization
10. WHAT WOULD YOU LIKE TO ACHIEVE
• better performance
• scale current solution
• process more and/or different data
• speed up development
• I heard of it
11. LEARN YOUR TRAFFIC
• what the workload looks like:
  • OLTP (simple lookups, short transactions)
  • OLAP (aggregations, analytical queries, ad hoc scans, etc.)
  • read-heavy, write-heavy
• what kinds of queries you run to answer the application's questions:
  • simple lookups, uncertain search, inner requests, traversal, BI/analysis
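One rough way to start learning your traffic is to measure the read/write mix of the statements the current RDBMS receives. The sketch below is only an illustration: it assumes you can export executed SQL statements as plain strings, and the function name `profile_traffic` and the sample log are invented for the example.

```python
from collections import Counter


def profile_traffic(statements):
    """Return the share of reads, writes and other statements in a SQL log."""
    kinds = Counter()
    for sql in statements:
        stripped = sql.strip()
        if not stripped:
            continue  # skip blank log lines
        verb = stripped.split(None, 1)[0].upper()
        if verb == "SELECT":
            kinds["read"] += 1
        elif verb in {"INSERT", "UPDATE", "DELETE"}:
            kinds["write"] += 1
        else:
            kinds["other"] += 1
    total = sum(kinds.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in kinds.items()}


# Hypothetical sample of logged statements.
sample_log = [
    "SELECT * FROM users WHERE id = 1",
    "SELECT * FROM users WHERE id = 2",
    "UPDATE users SET email = 'x' WHERE id = 1",
    "SELECT count(*) FROM orders",
]
traffic = profile_traffic(sample_log)
print(traffic)
```

A real profile would also separate short lookups from long analytical scans, since that split is what distinguishes OLTP from OLAP workloads.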
12. LEARN YOUR DATA SET
• what kinds of data do you work with:
  • simple key-value
  • structured, semi-structured
  • nested/hierarchical
  • graph-oriented
• how much data of each type do you have
13. WHAT ARE YOU WILLING TO SACRIFICE
• which data doesn't require strong consistency
• where transactional guarantees aren't required
• which data are you willing to lose in case of hardware failure
• where are you willing to sacrifice joins
14. APPLY POLYGLOT PERSISTENCE
• Based on the answers you discovered, define the most obvious types of storage that you may need:
  • fast & simple storage for lookups, non-critical data and short transactions
  • RDBMS for data that fits on a single server
  • document-oriented storage for inner/hierarchical data and aggregate-oriented reads & writes
  • graph-oriented storage for traversal queries, social relations, etc.
  • highly scalable storage for Big Data background processing
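The mapping above can be written down as a small rule table, which makes the decision explicit and easy to review. This is a minimal sketch: the requirement labels (`simple_lookups`, `fits_single_server`, and so on) and the function `suggest_storage` are hypothetical names chosen for illustration, not part of any product.

```python
# Each rule pairs a set of requirement labels with the storage type the
# list above suggests for them. The labels are invented for this sketch.
RULES = [
    ({"simple_lookups", "non_critical_data", "short_transactions"}, "key-value store"),
    ({"fits_single_server"}, "RDBMS"),
    ({"hierarchical_data", "aggregate_reads_writes"}, "document store"),
    ({"traversal_queries", "social_relations"}, "graph store"),
    ({"big_data_background_processing"}, "highly scalable batch storage"),
]


def suggest_storage(needs):
    """Return storage types whose triggers overlap the given requirements."""
    needs = set(needs)
    return [storage for triggers, storage in RULES if triggers & needs]


print(suggest_storage({"simple_lookups", "social_relations"}))
```

More than one storage type in the result is the expected outcome, not a conflict: that is what polyglot persistence means in practice.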
16. DATA MODELING: BEFORE YOU START
• from “what data do I have” to “what questions do I have”
• denormalization & duplication are your best friends
• hierarchical and embedded structures make your life easier, but they can also be your worst enemy
17. REFERENCES
• in-application joins
• nothing to be ashamed of
• apply carefully
{
  user_name: ayazovskiy,
  contact: {..},
  access: {
    level: 523,
    group: dev
  }
}
{
  access_level: 523,
  rules: [...]
}
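An in-application join over the two documents above can be sketched as two key lookups followed by a merge in application code. The dictionaries below stand in for two collections in a document or key-value store; the rule names `"read"` and `"write"` are invented, since the slide elides the contents of `rules`.

```python
# In-memory stand-ins for two collections. A real store would be queried
# over the network, so each lookup below is one round trip.
users = {
    "ayazovskiy": {
        "user_name": "ayazovskiy",
        "access": {"level": 523, "group": "dev"},
    }
}
access_levels = {
    523: {"access_level": 523, "rules": ["read", "write"]},  # rules are hypothetical
}


def get_user_with_rules(user_name):
    """Resolve the access-level reference in application code (the "join")."""
    user = users[user_name]                            # lookup 1
    level = access_levels[user["access"]["level"]]     # lookup 2
    return {**user, "rules": level["rules"]}


print(get_user_with_rules("ayazovskiy"))
```

The cost is visible in the code: every referenced document is an extra request, which is why the slide advises applying references carefully.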
18. DUPLICATION
• Duplication is a technique of copying pieces of data between structures in order to either optimize query processing time or convert data into a particular business model.
• The main advantages of denormalization are the ability to:
1. reduce the number of I/O operations and query time
2. reduce the complexity of query processing in distributed systems
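As a small illustration of duplication, the sketch below copies the rules of an access level into the user document at write time, so that a later read needs a single lookup. The data and the helper name `denormalize_user` are assumptions made for the example.

```python
def denormalize_user(user, access_levels):
    """Return a copy of the user with the referenced rules embedded."""
    level = access_levels[user["access"]["level"]]
    copy = dict(user)
    # Duplicate the rules into the user's own document.
    copy["access"] = {**user["access"], "rules": list(level["rules"])}
    return copy


# Hypothetical source data; the rule names are invented.
access_levels = {523: {"access_level": 523, "rules": ["read", "write"]}}
user = {"user_name": "ayazovskiy", "access": {"level": 523, "group": "dev"}}

stored = denormalize_user(user, access_levels)
print(stored["access"]["rules"])  # available without a second lookup
```

The trade-off is that every copy must be rewritten when the rules of that access level change.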
19. AGGREGATES
• simplify data processing logic
• optimize read/write time
• ability to distribute the data across the cluster
• reduce the number of requests across the cluster
• perform atomic updates
{
  user_name: ayazovskiy,
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}
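The "atomic updates" point can be illustrated with a toy in-memory store in which several fields of one aggregate change together or not at all. This is a sketch of the idea only: `AggregateStore` is a hypothetical class, a lock stands in for the per-document atomicity that aggregate-oriented stores typically provide, and the new level and group values are invented.

```python
import copy
import threading


class AggregateStore:
    """Toy store: each update to a single aggregate is all-or-nothing."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def put(self, key, doc):
        with self._lock:
            self._docs[key] = copy.deepcopy(doc)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._docs[key])

    def update(self, key, mutate):
        """Apply mutate to a draft copy; swap it in only if mutate succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._docs[key])
            mutate(draft)            # an exception here leaves the original intact
            self._docs[key] = draft


store = AggregateStore()
store.put("ayazovskiy", {
    "user_name": "ayazovskiy",
    "access": {"level": 5, "group": "dev"},
})


def promote(doc):
    # Two related fields change in one step (illustrative values).
    doc["access"]["level"] = 6
    doc["access"]["group"] = "admin"


store.update("ayazovskiy", promote)
print(store.get("ayazovskiy")["access"])
```

Because level and group live in the same aggregate, no reader can observe the new level with the old group.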
20. AGGREGATES
• updates of duplicated data are heavy and complex
• querying across aggregates is heavy and complex
{
  user_name: ayazovskiy,
  contact: {
    phone: 123,
    email: @thumbtack.net
  },
  access: {
    level: 5,
    group: dev
  }
}
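To see why updates of duplicated data are heavy, consider renaming a group whose name is embedded in every user aggregate. The sketch below uses an in-memory dictionary of documents; the user names other than `ayazovskiy`, the group names, and the function `rename_group` are hypothetical.

```python
def rename_group(user_docs, old_name, new_name):
    """Rewrite every aggregate that embeds old_name; return how many changed."""
    touched = 0
    for doc in user_docs.values():        # full scan: no single place to update
        if doc["access"]["group"] == old_name:
            doc["access"]["group"] = new_name
            touched += 1
    return touched


user_docs = {
    "ayazovskiy": {"user_name": "ayazovskiy", "access": {"level": 5, "group": "dev"}},
    "user_b": {"user_name": "user_b", "access": {"level": 3, "group": "dev"}},
    "user_c": {"user_name": "user_c", "access": {"level": 1, "group": "ops"}},
}
changed = rename_group(user_docs, "dev", "engineering")
print(changed)
```

In a relational schema this would be one row in a groups table. Here it is one write per affected aggregate, spread across the cluster, and the data is inconsistent until the last write lands.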
21. COUNTERS
• the NoSQL analog of auto-increment
• distributed, consistent auto-increment is tricky
• counters aren't always reliable
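One common workaround, shown here as a sketch rather than as any particular database's counter feature, is to let each application node reserve a block of IDs from a central counter. The class names are hypothetical. IDs stay unique, but they are neither gapless nor globally ordered, which is exactly the kind of guarantee you give up.

```python
import threading


class CentralCounter:
    """Stand-in for a counter kept in one shared, consistent place."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def reserve(self, size):
        """Atomically hand out the start of a block of `size` IDs."""
        with self._lock:
            start = self._next
            self._next += size
            return start


class NodeIdGenerator:
    """Per-node generator; contacts the central counter once per block.

    Not thread-safe by itself: one instance per worker is assumed.
    """

    def __init__(self, central, block_size=100):
        self._central = central
        self._block_size = block_size
        self._next = 0
        self._end = 0   # empty block, forces a reservation on first use

    def next_id(self):
        if self._next >= self._end:
            self._next = self._central.reserve(self._block_size)
            self._end = self._next + self._block_size
        value = self._next
        self._next += 1
        return value


central = CentralCounter()
node_a = NodeIdGenerator(central, block_size=3)
node_b = NodeIdGenerator(central, block_size=3)
ids = [node_a.next_id(), node_b.next_id(), node_a.next_id(), node_b.next_id()]
print(ids)  # unique, but not in increasing order across nodes
```

If a node crashes, the unused remainder of its block is lost, so the sequence will have gaps.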
24. THINK OF DATA SYNCHRONIZATION
• application-level synchronization:
  • e.g. update a user profile in document-oriented storage, its social network in graph storage, and its session in a key-value cache
• regular synchronization:
  • this may be an hourly/daily/weekly process that takes updated data and propagates it across the system
• incremental background synchronization:
  • solutions like Tungsten Replicator allow you to track changes in the RDBMS via the transaction log and apply these changes immediately to NoSQL storage
  • e.g. user profiles in MySQL synchronized with Aerospike via a properly configured Tungsten Replicator
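The idea behind incremental background synchronization can be sketched as replaying an ordered change log against the NoSQL side while remembering the last applied position. This is not Tungsten Replicator's API: the log format `(position, op, key, row)`, the function `apply_changes`, and the sample data are assumptions made for illustration, and a dictionary stands in for the target store.

```python
def apply_changes(change_log, target, last_applied=0):
    """Apply entries after `last_applied` to `target`; return the new position."""
    for position, op, key, row in change_log:
        if position <= last_applied:
            continue                      # already applied in an earlier run
        if op == "delete":
            target.pop(key, None)
        else:
            # Inserts and updates are both upserts on the target side.
            target[key] = dict(row)
        last_applied = position
    return last_applied


# Hypothetical changes captured from the relational side.
change_log = [
    (1, "insert", "u1", {"name": "a"}),
    (2, "update", "u1", {"name": "b"}),
    (3, "insert", "u2", {"name": "c"}),
    (4, "delete", "u2", None),
]

profiles = {}                                   # stand-in for the NoSQL store
position = apply_changes(change_log[:2], profiles)
position = apply_changes(change_log, profiles, position)   # resumes at entry 3
print(position, profiles)
```

Storing the position together with the data lets the process restart after a failure without reapplying or skipping changes.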
28. THANKS / REFERENCES
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Pramod J. Sadalage and Martin Fowler
• NoSQL Data Modeling Techniques (http://highlyscalable.wordpress.com)
• MongoDB documentation (http://docs.mongodb.org)
• Couchbase documentation (http://docs.couchbase.com)
• FoundationDB Blog (http://blog.foundationdb.com)