There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability, but with ACID and relational concepts implemented with ANSI SQL.
This session will cover why NoSQL came about, why it's had its day, and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.
NewSQL - Deliverance from BASE and back to SQL and ACID
1. NewSQL - Deliverance from BASE and back to SQL and ACID
Tony Rogerson, SQL Server MVP
tonyrogerson@torver.net
@tonyrogerson
http://dataidol.com/tonyrogerson
2. Who am I?
Freelance SQL Server professional and Data Specialist
Fellow BCS, MSc in BI, PGCert in Data Science
28 years of development and database experience, 22 of them with SQL Server – starting out in 1986
with VSAM, System W, Application System, DB2 and Oracle, then crossing over to Client/Server and
SQL Server since 4.21a in 1993
Awarded SQL Server MVP yearly since 97
Founded UK SQL Server User Group back in ’99, founder member of DDD, SQL Bits, SQL Relay,
SQL Santa
Interested in commodity based distributed processing of Data (naturally!)
3. Agenda
NoSQL
◦ Why the need?
◦ What products are available?
Transactions
◦ BASE
◦ ACID
SQL
◦ What is today’s SQL capable of?
◦ SQL Server performance – NoSQL required?
NewSQL
◦ SQL -> NoSQL -> NewSQL (distributed form of where we started)
◦ Distributed Data and ACID
Discussion
5. Why the Need?
The year is 2001 and
◦ It’s that Big Data thing….
◦ Mainstream Relational Databases (that use SQL) are scale up
◦ More grunt required – buy a bigger box
◦ SAN based storage is ridiculously expensive and complicated, heavy TCO
Y2K + 1
◦ Developers twiddling their thumbs ;)
Web adoption accelerates
◦ Google, Yahoo, Amazon and the like are born
◦ MySQL does not scale – too inflexible
◦ Up front costs of kit for projects/business that may fail – need elasticity
http://www.tomshardware.co.uk/15-years-of-hard-drive-history-uk,review-1908-7.html
6. Products Available
Varied – type of NoSQL database
◦ Graph
◦ Key-Value
◦ Column store/Column Family
◦ Document Store
◦ Object
◦ Relational but without SQL
You name it and there is a product to do it
8. ACID
Atomicity
◦ The bounds of the transaction – everything within those bounds is a single unit of work
◦ All or nothing
Consistency
◦ Data must reside in the correct Domain of values
◦ Deferrable to the end of the unit of work
Isolation
◦ Changes are Isolated from other users
◦ Other connections cannot update what you have updated/updating
◦ Multi-Version Concurrency Control (MVCC) – snapshots
◦ Locking
Durability
◦ After a system failure your committed changes are still maintained – nothing is lost
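The "all or nothing" and Consistency points above can be seen in a few lines. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server; the accounts table, the names and the CHECK rule are assumptions made up for the illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the 'domain' rule
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are kept
failed = transfer(conn, "bob", "alice", 500)  # credit is undone with the debit
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to alice has already been applied when the debit is rejected, yet the rollback removes it again, so the balances stay at alice 70 and bob 80.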
9. BASE (Basically Available, Soft-state, Eventually Consistent)
BASE is a transactional model of sorts (it applies at the global level, rather than to individual transactions)
Specific to the distributed database model
Basically Available – all or some of the system is available
Node 1 Node 2 Node 3
10. BASE (Basically Available, Soft-state, Eventually Consistent)
Soft-state
Eventually Consistent
System may change over time [as replicas become up-to-date (consistent)]
Node 1 Node 2 Node 3
Insert value ‘A’
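The three-node picture above can be played out in code. This is a toy simulation, not any product's replication protocol: the Cluster class, its method names and the single replication queue are all assumptions for illustration.

```python
from collections import deque

class Cluster:
    """Toy cluster: a write lands on one node and reaches the others later."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self.pending = deque()  # replication messages not yet applied

    def write(self, node, key, value):
        self.nodes[node][key] = value
        for other in range(len(self.nodes)):
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node, key):
        return self.nodes[node].get(key)

    def replicate_one(self):
        """Apply the oldest outstanding replication message, if any."""
        if self.pending:
            target, key, value = self.pending.popleft()
            self.nodes[target][key] = value

    def is_consistent(self):
        return all(node == self.nodes[0] for node in self.nodes)

cluster = Cluster(3)
cluster.write(0, "k", "A")             # insert value 'A' on node 1
stale = cluster.read(2, "k")           # node 3 has not seen it yet
soft_state = cluster.is_consistent()   # replicas disagree for now

while cluster.pending:                 # replication catches up
    cluster.replicate_one()
fresh = cluster.read(2, "k")
converged = cluster.is_consistent()
```

Between the write and the end of the loop the system is in its soft state: every node answers reads, but not every node gives the same answer.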
11. Eventual Consistency in SQL Server
Asynchronous Availability Groups/Database Mirroring
Replication
Eventual / Causal Consistency
◦ Eventual is no good for order-specific [and important] transactions
◦ Like Merge replication
◦ Causal: deliver messages in correct order [e.g. service broker]
◦ Like Transactional Replication
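"Deliver messages in correct order" can be sketched as a receiver that holds back anything arriving early. This is a simplification: it orders messages from a single sender by sequence number, which is narrower than full causal consistency, and the OrderedInbox class is a hypothetical name, not a Service Broker API.

```python
class OrderedInbox:
    """Delivers messages strictly in sequence order, buffering any gaps."""

    def __init__(self):
        self.next_seq = 1      # sequence number we are waiting for
        self.held = {}         # early arrivals, keyed by sequence number
        self.delivered = []    # what the application has actually seen

    def receive(self, seq, payload):
        self.held[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1

inbox = OrderedInbox()
inbox.receive(2, "ship order")     # arrives first, but must wait
early = list(inbox.delivered)
inbox.receive(1, "create order")   # fills the gap, both are released
inbox.receive(3, "invoice order")
```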
12. ACID - Distributed
2PC is clunky and doesn’t scale across many nodes
PAXOS – Consensus theory – scales better
Remove the need for distributed ACID altogether
2PC Transaction (diagram): a Coordinator sends the INSERT to three Subordinates – all or nothing
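The coordinator/subordinate flow of two-phase commit can be sketched as below. It is a bare-bones illustration under stated assumptions: the Subordinate class and its can_commit flag are invented for the example, and the durable logging, timeouts and recovery that a real 2PC implementation needs are left out.

```python
class Subordinate:
    """A participant that stages work in phase 1 and applies it in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to force a 'no' vote
        self.staged = None
        self.committed = []

    def prepare(self, row):
        if not self.can_commit:
            return False              # vote no
        self.staged = row
        return True                   # vote yes

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None

def two_phase_commit(subordinates, row):
    # Phase 1: the coordinator collects a vote from every subordinate.
    votes = [s.prepare(row) for s in subordinates]
    # Phase 2: commit everywhere only if every vote was yes.
    if all(votes):
        for s in subordinates:
            s.commit()
        return True
    for s in subordinates:
        s.rollback()
    return False

healthy = [Subordinate("s1"), Subordinate("s2"), Subordinate("s3")]
committed = two_phase_commit(healthy, {"name": "Ada"})

mixed = [Subordinate("s1"), Subordinate("s2", can_commit=False), Subordinate("s3")]
aborted = two_phase_commit(mixed, {"name": "Bob"})
```

Every insert costs two round trips to every subordinate, and one slow or failed node holds up all the others, which is the scaling problem noted above.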
13. Mixing BASE and ACID
ACID applied on the local data node
BASE applied to remote nodes
14. Relational
Sets
Tables with Rows x Columns
Relational Theory dictates the row/column intersection is an Atomic value i.e. contains only a
single value from the domain modelled for that column
Chris Date:
◦ Atomicity cannot really be defined as absolute in Normal Form
◦ a column can contain “relational values” i.e. another table
Normal Form – the process used to define the schema around the data being modelled
15. OldSQL roots
Built for disk storage
Built for single machine, scale-up
Mature SQL language (decades of research) over the Relational Model
SQL extensions to deal with unstructured data (freetext)
16. OldSQL today
ACI [no Durability]
In-Memory
Modified design to work with Flash
Still scale-up
17. SQL Server
Delayed / No-Durability in SQL Server 2014
In-Memory extensions
Entity Attribute Value design combined with ColumnStore
Sparse Columns / Column sets
DEMOS
19. Describe NewSQL
NewSQL = OldSQL + Transparent_Data_Distribution + ACID
Also – add in the bells and whistles for new tech
◦ Flash
◦ RAM
◦ Processor cache improvements
◦ Better parallelisation across local processor cores
Basically -> Scale out with ACID
20. Latency in a Distributed environment
Diagram: six servers and a SQL Server connected through a switch over 1Gbit ethernet
SQL Server: FirstName, Surname, DOB
Query returns 20,000 rows, 558KiBytes of data
Data travel: Slowest, Slower, Fastest
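A quick back-of-envelope calculation puts a number on the data travel for that result set. It gives a lower bound only: it assumes the full 1Gbit/s is available and ignores protocol overhead, switch hops and query time. The function name is made up for the example.

```python
def transfer_time_ms(payload_bytes, link_bits_per_second):
    """Time to push a payload down a link at full line rate, in milliseconds."""
    return payload_bytes * 8 / link_bits_per_second * 1000

payload_bytes = 558 * 1024                 # 558 KiBytes from the slide
wire_ms = transfer_time_ms(payload_bytes, 1_000_000_000)
bytes_per_row = payload_bytes / 20_000     # average size of one returned row

print(f"{wire_ms:.2f} ms on the wire, {bytes_per_row:.1f} bytes per row")
# prints "4.57 ms on the wire, 28.6 bytes per row"
```

A few milliseconds per query is far slower than reading the same rows from local memory, which is why keeping the data next to the processing matters.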
21. Reduce Latency – Data Locality
Diagram: servers connected through a switch over 1Gbit ethernet, with SQL Server running locally on several of the servers
22. Distributed SQL with ACID
Server1 SQL Server
1Gbit
ethernet
Switch
Server2 SQL Server
• 2 Phase Commit using DTC
• High Latency
• All or nothing
BEGIN DISTRIBUTED TRAN
INSERT Server3.pres_NEWSQL.dbo.people( ….. )
INSERT Server2.pres_NEWSQL.dbo.people( ….. )
INSERT Server1.pres_NEWSQL.dbo.people( ….. )
COMMIT TRAN
Server2 SQL Server
23. Querying a Distributed Environment
Financial Trading – Global position of the book
TOP 10 customers
Not easy (at speed) in an OLTP setting
Network Switch
N1 N2 N3 N4
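One common way to answer a "TOP 10 customers" question across nodes is scatter-gather: each node computes its own top N and a coordinator merges the partial results. The sketch below assumes every customer lives on exactly one node (for example, sharded by customer); if a customer's rows are spread over several nodes, local top-N lists are no longer enough. The function name and the sample data are made up.

```python
import heapq

def top_customers(nodes, n=10):
    """Merge per-node top-n lists into a global top-n.

    `nodes` is a list of dicts mapping customer -> total, one dict per node.
    """
    partials = []
    for node in nodes:
        # Scatter: each node only ships its own best n rows over the network.
        partials.extend(heapq.nlargest(n, node.items(), key=lambda kv: kv[1]))
    # Gather: the coordinator picks the global best n from the partial lists.
    return heapq.nlargest(n, partials, key=lambda kv: kv[1])

nodes = [
    {"acme": 5, "bolt": 9},
    {"corex": 7},
    {"delta": 1, "echo": 12},
]
top2 = top_customers(nodes, n=2)
```

Only n rows per node cross the network instead of every customer total, but the coordinator still has to wait for the slowest node before it can answer.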
25. Partitioning
Chop big table up into “horizontal
partitions”
Partition key required (Hash, Modulo, Key range)
Each partition is self-contained binding rows
by the partitioning key
Access all data through logical view over all
partitions (local database)
Table by table basis
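The three partition key schemes named above can each be written as a small function from key to partition number. These are illustrative sketches, not SQL Server partition functions; the function names, the CRC32 hash and the boundary values are assumptions.

```python
import bisect
import zlib

def modulo_partition(key, partitions):
    """Integer key: the remainder picks the partition."""
    return key % partitions

def hash_partition(key, partitions):
    """String key: hash it first so the keys spread evenly, then take the remainder."""
    return zlib.crc32(key.encode("utf-8")) % partitions

def range_partition(key, upper_bounds):
    """Sorted boundaries: partition i holds keys below upper_bounds[i];
    the last partition takes everything from the final boundary upwards."""
    return bisect.bisect_right(upper_bounds, key)

by_modulo = modulo_partition(17, 4)                     # 17 % 4 -> partition 1
by_hash = hash_partition("tony@example.com", 4)         # some partition in 0..3
by_range = [range_partition(k, [1000, 2000]) for k in (500, 1000, 2500)]
```

Key ranges keep neighbouring keys together, which suits range scans, while hash and modulo spread load more evenly at the cost of scattering neighbouring keys.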
26. Shared Nothing
Partitioning+
Each Shard is self-contained and has all the
procs, meta-data and of course your partition of
data
Shard Key common to multiple tables, for
example CustomerID, Email Address.
Greater autonomy across the distributed
database
Seeing the entire database as a logical unit is
more difficult – joining is a nightmare
Node 1
Node 2
Node 3
27. Data Distribution using Hashing
Distributed Database Cluster has fixed number of data nodes
Your data is spread across the database cluster
◦ 10 node cluster; each data item may reside on 3 nodes
◦ Which 3 nodes?
Data key is Hashed to a number – hashing algorithm is deterministic
data-node = f( data-key )
◦ print ( checksum( 'All hale to the ale' ) * 1.) % 10
◦ print ( checksum( 'And a glass of wine for the ladies' ) * 1.) % 10
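The same data-node = f( data-key ) idea, extended to answer "which 3 nodes?", might look like this. The placement rule (the hashed node plus the next two in sequence) is an assumption chosen for simplicity, not how any particular product places replicas, and SHA-256 stands in for checksum().

```python
import hashlib

def replica_nodes(data_key, node_count=10, replicas=3):
    """Deterministically map a key to the nodes holding its copies."""
    digest = hashlib.sha256(data_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a non-negative integer.
    first = int.from_bytes(digest[:8], "big") % node_count
    # Assumed rule: the primary copy plus the next nodes, wrapping around.
    return [(first + i) % node_count for i in range(replicas)]

ale = replica_nodes("All hale to the ale")
wine = replica_nodes("And a glass of wine for the ladies")
```

Because the function is deterministic, any client or coordinator can work out where a key lives without a lookup table. The catch with a fixed modulo is that changing the node count remaps most keys.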
28. Sharding Sync
Diagram: Apps connect to the LOGICAL DATABASE and pick a node (Node 1, Node 2, Node 3)
Nodes hold a full copy of data or a subset of data
Replication between the nodes
29. Postgres-XC
Applications
(issue SQL to coordinators)
Coordinators
(plans, 2pc trans, knows
about data distribution)
Data Nodes
GTM
Global
Transaction
Manager
http://de.slideshare.net/PavanDeolasee/postgresxc-28475161
30. Combine Sharding + Replication
Shard your big tables based on a hash (or something) around your business key e.g. Customer,
EmailAddress etc.
Replicate static tables.
GTM keeps simple state info (not a database itself)
GXID (Global Transaction IDs) – across cluster
MVCC
One active GTM per cluster, though standbys are available
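How a global transaction ID and a snapshot combine to give MVCC visibility can be sketched as follows. This is a heavily simplified model for illustration, not Postgres-XC's actual implementation: the GTM class, its methods and the visible() rule are assumptions, and aborted transactions are ignored.

```python
import itertools

class GTM:
    """Toy global transaction manager: hands out GXIDs and snapshots."""

    def __init__(self):
        self._next_gxid = itertools.count(1)
        self.active = set()  # the only state it keeps: in-flight GXIDs

    def begin(self):
        gxid = next(self._next_gxid)
        snapshot = frozenset(self.active)  # transactions in flight right now
        self.active.add(gxid)
        return gxid, snapshot

    def commit(self, gxid):
        self.active.discard(gxid)

def visible(row_gxid, reader_gxid, snapshot):
    """A row version is visible if the reader wrote it, or if its writer
    started earlier and was no longer in flight when the snapshot was taken."""
    if row_gxid == reader_gxid:
        return True
    return row_gxid < reader_gxid and row_gxid not in snapshot

gtm = GTM()
t1, snap1 = gtm.begin()
t2, snap2 = gtm.begin()                      # t1 is still in flight
sees_uncommitted = visible(t1, t2, snap2)    # t2 cannot see t1's rows
gtm.commit(t1)
still_hidden = visible(t1, t2, snap2)        # t2's snapshot does not move
t3, snap3 = gtm.begin()
sees_committed = visible(t1, t3, snap3)      # a later transaction sees them
```

Every data node applying the same rule to the same GXID and snapshot reaches the same answer, which is what lets a cluster present one consistent view without the nodes talking to each other per row.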