Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

VoltDB presents

Stonebraker Live!
Navigating the Database Universe

Scott Jarr, Co-founder and Chief Strategy Officer

Agenda
• The (proper) design of DBMSs
– Presented by Dr. Michael Stonebraker, Co-founder

• The database universe
– Presented by Scott Jarr, Co-founder and Chief Strategy Officer

• Introducing VoltDB 3.0
– Presented by Mark Hydar, VP of Market Technology and Strategy

We Believe…

• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast

Dr. Michael Stonebraker

THE (PROPER) DESIGN OF THE DBMS

Lessons from 40 Years of Database Design
1. Get the user interaction right
– Bet on a small number of easy-to-understand constructs
– Plus standards
2. Get the implementation right
– Bet on a small number of easy-to-understand constructs
3. One size does not fit all
– At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”
-Winston Churchill
#1: Get the User Interaction Right

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)

Loser: CODASYL
• Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)
• Messy access language (sea of “cursors”; some -- but not all -- move on every command, navigation programming)
• No standards

Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation, through this sea)

Interaction Take Away − Simple is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and
made people productive (transportable skills)

#2: Get the Implementation Right
Historical Winners
• Leverage a few simple ideas: Early relational implementations
– System R storage system dropped links
– Views (protection, schema modification, performance)
– Cost-based optimizer
• Leverage a few simple ideas: Postgres
– User-defined data types and functions (adopted by most everybody)
– Rules/triggers
– No-overwrite storage
• Leverage a few simple ideas: Vertica
– Store data by column
– Compressed up the ging gong
– Parallel load without compromising ACID

#3: One Size Does NOT Fit All
• OSFA is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built will exceed by 10x to 100x
• History has not been completely written yet…but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.”
-My Top 10 Assertions About Data Warehouses, 2010

Example: VoltDB
• Get the interface right
– SQL
– ACID

• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling

• Specialization
– OLTP focus allowed for above implementation choices

Proving the Theory
• Challenge: OLTP performance
– TPC-C CPU cycles
– On the Shore DBMS prototype
– Elephants should be similar

[Chart: share of CPU cycles. Useful Work 4%, Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%.]
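As a rough arithmetic check on the breakdown above, the following sketch computes the idealized speedup from removing overhead categories. The function name is illustrative, and the assumption that a category can be removed entirely with no new cost is an idealization, not a claim from the slide.

```python
# Share of TPC-C CPU cycles per category, as listed on the slide.
CYCLE_SHARE = {
    "useful work": 0.04,
    "recovery": 0.24,
    "latching": 0.24,
    "buffer pool": 0.24,
    "locking": 0.24,
}

def speedup_if_removed(removed):
    """Idealized speedup if the named categories cost nothing at all."""
    remaining = sum(share for name, share in CYCLE_SHARE.items()
                    if name not in removed)
    return 1.0 / remaining

all_overhead = {"recovery", "latching", "buffer pool", "locking"}
print("remove all four:", round(speedup_if_removed(all_overhead), 2))
print("remove latching only:", round(speedup_if_removed({"latching"}), 2))
```

Under these assumptions, removing any single category helps only modestly; the large factor appears only when all four are gone, which is the motivation for the design choices that follow.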

Single Threaded
• Gets rid of the latching problem
• What about Multicore?
– Divide the memory on an N-core node so it looks like N single-core nodes
– Which are single threaded…
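The idea of carving an N-core node into N single-core partitions can be sketched as follows. This is a minimal illustration, not VoltDB's code: the `Node` and `Partition` classes and the CRC32 hash are assumptions made for the example, and no real threads are started.

```python
import zlib

class Partition:
    """One slice of memory, meant to be touched by exactly one thread."""
    def __init__(self):
        self.rows = {}

class Node:
    def __init__(self, n_cores):
        # One partition per core, so an N-core node looks like N single-core nodes.
        self.partitions = [Partition() for _ in range(n_cores)]

    def partition_for(self, key):
        # Stable hash: the same key always routes to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.partition_for(key)].rows[key] = value

    def get(self, key):
        return self.partitions[self.partition_for(key)].rows.get(key)

node = Node(n_cores=4)
for i in range(1000):
    node.put(f"user-{i}", i)
print([len(p.rows) for p in node.partitions])
```

Because each key has exactly one owning partition, work on that key never needs a latch shared with another core.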

Implementation Construct #1: Main Memory
• Main memory format for data
– Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
– Return to disk-buffer pool architecture (slow)
– Anti-caching
    • Main memory format for data
    • When memory fills up, then bundle together elderly tuples and write them out
    • Run a transaction in “sleuth mode”; find the required records and move to main memory (and pin)
    • Run Xact normally
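The anti-caching steps above can be sketched as a toy in-memory store. Everything here is hypothetical: the `AntiCache` class, its parameters, and the use of a plain dict in place of disk blocks are assumptions for illustration, and the pre-pass is simplified to a list of keys the caller supplies.

```python
from collections import OrderedDict

class AntiCache:
    """Toy anti-cache: hot tuples in memory, cold tuples in a dict standing in for disk."""

    def __init__(self, capacity, block_size):
        self.capacity = capacity        # max tuples kept in memory
        self.block_size = block_size    # tuples bundled per eviction
        self.hot = OrderedDict()        # least recently used first
        self.cold = {}
        self.pinned = set()

    def _evict_until_fits(self):
        while len(self.hot) > self.capacity:
            # Bundle the oldest unpinned tuples and write them out together.
            victims = [k for k in self.hot if k not in self.pinned][: self.block_size]
            if not victims:             # everything left is pinned
                break
            for k in victims:
                self.cold[k] = self.hot.pop(k)

    def put(self, key, value):
        self.cold.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict_until_fits()

    def run(self, keys, txn):
        try:
            # Pre-pass: find the required records, move them to memory, pin them.
            for k in keys:
                if k in self.cold:
                    self.hot[k] = self.cold.pop(k)
                elif k not in self.hot:
                    raise KeyError(k)
                self.hot.move_to_end(k)
                self.pinned.add(k)
            # Run the transaction normally, entirely against memory.
            return txn(self.hot)
        finally:
            self.pinned.difference_update(keys)
            self._evict_until_fits()

store = AntiCache(capacity=4, block_size=2)
for i in range(6):
    store.put(i, i * 10)
print(sorted(store.cold), store.run([0], lambda rows: rows[0] + 1))
```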

Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
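To see why one round trip per transaction matters, here is a toy cost model. The function and all numbers are illustrative assumptions, not measurements from the talk.

```python
def transaction_latency_ms(statements, round_trips, rtt_ms, exec_ms):
    """Toy cost model: network round trips plus per-statement execution time."""
    return round_trips * rtt_ms + statements * exec_ms

# Illustrative numbers only: 5 statements, a 0.5 ms round trip,
# and 0.02 ms to execute each statement in memory.
per_command = transaction_latency_ms(5, round_trips=5, rtt_ms=0.5, exec_ms=0.02)
per_transaction = transaction_latency_ms(5, round_trips=1, rtt_ms=0.5, exec_ms=0.02)
print(per_command, per_transaction)
```

When execution is in memory, the network term dominates, so a stored procedure that makes a single round trip removes most of the latency in this model.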

Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion
– No locking
• Active-active replication (HA)
– Run transaction at all replicas – in the same pre-determined order
• What about a cluster-wide power failure?
– Asynchronous checkpointing
– With a command log
– Wildly faster than data logging
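A minimal sketch of the points above, assuming hypothetical `deposit` and `transfer` procedures and a `Replica` class invented for this example. It shows transactions run to completion in a fixed order, two replicas kept identical by applying the same order, and recovery by restoring a checkpoint and replaying a command log that records only procedure names and arguments.

```python
import copy

# Procedures must be deterministic: same state and arguments, same result.
def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount

def transfer(state, src, dst, amount):
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount

PROCEDURES = {"deposit": deposit, "transfer": transfer}

class Replica:
    def __init__(self, snapshot=None):
        self.state = copy.deepcopy(snapshot) if snapshot else {}

    def apply(self, command):
        # One transaction runs to completion; nothing interleaves, so no locks.
        name, args = command
        PROCEDURES[name](self.state, *args)

# The command log stores invocations, not the rows they changed.
command_log = [
    ("deposit", ("alice", 100)),
    ("transfer", ("alice", "bob", 30)),
    ("transfer", ("bob", "carol", 50)),   # no effect: bob holds only 30
    ("deposit", ("carol", 5)),
]

# Active-active: every replica applies the same commands in the same order.
a, b = Replica(), Replica()
for cmd in command_log:
    a.apply(cmd)
    b.apply(cmd)

# Recovery: restore a checkpoint taken after two commands, then replay the rest.
checkpoint = Replica()
for cmd in command_log[:2]:
    checkpoint.apply(cmd)
recovered = Replica(snapshot=checkpoint.state)
for cmd in command_log[2:]:
    recovered.apply(cmd)

print(a.state == b.state == recovered.state)
```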

Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive
– SQL & ACID

• Leveraging a few simple implementation ideas – made
VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling

Proving the Theory

• Answer: OLTP performance
– 3 million transactions per second
– 7x Cassandra
– 15 million SQL statements per second
– 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
-The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007

Scott Jarr

THE DATABASE UNIVERSE

Technology Meets the Market
Believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast

Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases

What we need is…

Data Value Chain

Age of Data, from newest to oldest:

• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (second(s)): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match

Data Value Chain
[Chart: Data Value plotted against Age of Data, with two labeled series, Value of Individual Data Item and Aggregate Data Value, over the same five categories: Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics.]

The Database Universe
[Chart, shown in two steps.
– Horizontal axis: Transactional to Analytic, spanning Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics
– Vertical axis: Application Complexity, from Simple / Slow / Small to Fast / Complex / Large
– Overlaid Data Value series: Value of Individual Data Item, Aggregate Data Value
– Step 1 shows Traditional RDBMS only
– Step 2 adds Velocity, NoSQL, NewSQL, Data Warehouse, and Hadoop, etc.]

Closed-loop Big Data
[Diagram. Inputs: logins, trades, authorizations, clicks, sensors, orders, impressions. Stages: Interactive & Real-time Analytics; Historical Reports & Analytics; Exploratory Analytics. Label: Knowledge.]
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge

The Velocity Use Case
What’s it look like?
– High throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics present immediate visibility

What’s the big deal?
– Batch visibility converts to real time = immediate business impact
– Decisions made at time of event = higher impact decisions with immediate returns

– Ability to ingest and manage massive amounts of data = business differentiation and disruption

Introducing VoltDB 3.0

• Available now!
– Both commercial and open source offerings
– www.voltdb.com/downloads
• Key improvements
– Even faster
– Easier to build high-velocity applications
– Expanded reach across developers and applications
– Extensible to integrate with existing data infrastructure

Latency and Throughput, 50-50 Read/Write Workload
[Chart: VoltDB 3.0 vs. v2.8.4.1; key/value 50/50 read/write workload; 3 node, K=1 cluster. Latency (ms, 0 to 16) plotted against TPS (up to 300,000).]

Read/Write Workload Latency/Throughput
[Chart: VoltDB 3.0; key/value workloads at 10% read/90% write, 50% read/50% write, and 90% read/10% write; 3 node, K=1 cluster. Avg. latency (ms, 0 to 9) plotted against TPS (up to 350,000).]

Faster: Ad Hoc SQL Performance

• Conversational SQL
• Thousands to 10,000+ ad hoc SQL transactions/second
• Single or multiple (batch) SQL statement transaction

Easier Development: New SQL Support

• SQL LIKE and NOT LIKE
• UNION
• Column Functions
• Counting function (leaderboard ranking queries)
• Ability to define index using column functions

Easier Development: JSON Support

• JSON values stored in a varchar column
• Field() column function
• Indexing on JSON elements

CREATE INDEX session_site_moderator
ON user_session_table (field(json_data, 'site'),
field(json_data, 'moderator'), username);

• New JSON sample in kit
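To illustrate what an index over JSON elements buys, here is a small Python emulation of the DDL above. It is only a sketch: the `field` helper is a stand-in written for this example (it does not reproduce the database function's return type or semantics), and the sample rows are invented.

```python
import json
from collections import defaultdict

def field(json_text, name):
    """Illustrative stand-in for field(): a top-level value from a JSON string."""
    return json.loads(json_text).get(name)

# Invented sample rows; json_data plays the role of the varchar column.
user_session_table = [
    {"username": "ann", "json_data": '{"site": "forum", "moderator": true}'},
    {"username": "bob", "json_data": '{"site": "forum", "moderator": false}'},
    {"username": "cy", "json_data": '{"site": "blog", "moderator": true}'},
]

# Composite index keyed like the DDL: (site, moderator, username) -> row positions.
session_site_moderator = defaultdict(list)
for pos, row in enumerate(user_session_table):
    key = (field(row["json_data"], "site"),
           field(row["json_data"], "moderator"),
           row["username"])
    session_site_moderator[key].append(pos)

# A lookup now avoids re-parsing the JSON of every row.
hits = session_site_moderator[("forum", True, "ann")]
print([user_session_table[i]["username"] for i in hits])
```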

Easier Development: Online Operations

• Ability to re-join a failed node to cluster with no impact to existing operations
• Online schema update
• No service window

Easier Development: Streamlined Development

• Elimination of project.xml
• VoltDB-specific configuration now defined in DDL
• Defaulting of deployment.xml
• New Volt Compiler CLI:
voltdb compile

Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency
• Elimination of strict NTP configuration
• Scales to large # of nodes

Integration: High-Performance Export

• Parallelized export
• New connectors: JDBC, Netezza, Vertica

Integration: Client Library Updates

• New PHP Client
• Node.js client v1.0
• Go Client (http://golang.org)
• Coming soon: updated Erlang client

Other Notable New Features
• Explain command
• CSV loader utility
• CSV snapshots
• New Administration CLI: voltadmin
– voltadmin save
– voltadmin restore
– voltadmin pause
– voltadmin resume
– voltadmin shutdown

More Samples Available for Download

http://voltdb.com/community/volt-labs.php

Volt University
• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly
• Curriculum and supporting material range from beginner to advanced
• Three types of instruction:
– Volt University Online
– Volt University Classroom
– Volt Vanguard Certification

Summary: VoltDB v3.0 Features
• Even faster
• Easier to build high-velocity applications
• Expanded reach across developers and applications
• Extensible to integrate with existing data infrastructure
• Volt Labs
• Volt University

Imagine the Possibilities
DOWNLOAD 3.0 at www.voltdb.com

More Information?
E-mail
info@voltdb.com

Visit our forums
http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”
http://community.voltdb.com/docs/GettingStarted/index

Follow
@VoltDB on Twitter
