Webinar presentation delivered by Dr. Michael Stonebraker and Scott Jarr of VoltDB on December 11, 2012. www.voltdb.com
The design decisions you make today will have a huge performance impact down the line. Until recently, when it came to databases, the choice was easy. Essentially, you had one option: the RDBMS. Today, there's a new universe of databases being thrown into production — and not always with the greatest success. How do you make the right choice for your next application? Database pioneer Dr. Michael Stonebraker and VoltDB co-founder Scott Jarr have some thoughts.
2. About Our Presenters
Mike Stonebraker, Co-founder & CTO, VoltDB
A pioneer of database research and technology for more than a quarter of a century, and the main architect of the Ingres relational DBMS and the object-relational DBMS PostgreSQL.

Scott Jarr, Co-founder & Chief Strategy Officer, VoltDB
More than 20 years of experience building, launching and growing technology companies from inception to market leadership in the search, mobile, security, storage and virtualization markets.
3. Agenda
• The (proper) design of DBMSs
– Presented by Dr. Michael Stonebraker
• The database universe
• Where the future value comes from
4. We Believe…
• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are the answer
• Products must provide tangible customer value... Fast
6. Lessons from 40 Years of Database Design
1. Get the user interaction right
   – Bet on a small number of easy-to-understand constructs
   – Plus standards
2. Get the implementation right
   – Bet on a small number of easy-to-understand constructs
3. One size does not fit all
   – At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.” –Winston Churchill
7. #1: Get the User Interaction Right
Historical Lesson: RDBMS vs. CODASYL vs. OODB
Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)

Loser: CODASYL
• Complicated data model (records participate in “sets”; a set has one owner and, perhaps, many members, etc.)
• Messy access language (a sea of “cursors”; some -- but not all -- move on every command; navigation programming)

Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation through this sea)
• No standards
8. Interaction Take Away − Simple is Good
• ACID was easy for people to understand
• SQL provided a standard, high-level language and
made people productive (transportable skills)
9. #2: Get the Implementation Right
Historical winners each leveraged a few simple ideas:
• Early relational implementations
  – System R storage system (dropped links)
  – Views (protection, schema modification, performance)
  – Cost-based optimizer
• Postgres
  – User-defined data types and functions (adopted by most everybody)
  – Rules/triggers
  – No-overwrite storage
• Vertica
  – Store data by column
  – Compressed up the ying-yang
  – Parallel load without compromising ACID
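The column-store idea above can be sketched in a few lines: storing a table by column keeps each attribute's values together, so a simple encoding like run-length compression works very well on clustered, low-cardinality data, and some aggregates can run on the compressed form directly. A minimal illustration with hypothetical names, not Vertica's actual storage format:

```python
# Minimal column-store sketch: split rows into columns, then run-length
# encode a column. Hypothetical example data; not Vertica's real format.
from itertools import groupby

rows = [("NYC", 10), ("NYC", 12), ("NYC", 9), ("BOS", 7), ("BOS", 11)]

# A row store keeps whole tuples together; a column store splits by attribute.
city_col = [r[0] for r in rows]
qty_col = [r[1] for r in rows]

def rle(values):
    """Run-length encode one column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

encoded = rle(city_col)
assert encoded == [("NYC", 3), ("BOS", 2)]  # clustered values compress well

def rle_count(encoded_col, value):
    """Count matching rows directly on the compressed column."""
    return sum(n for v, n in encoded_col if v == value)

assert rle_count(encoded, "NYC") == 3
```

The point of the sketch: the count never decompresses the column, which is one reason column stores with aggressive compression can be so fast for analytics.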
10. #3: One Size Does NOT Fit All
• OSFA is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built will exceed by 10x to 100x
• History has not been completely written yet… but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system… A factor of 50 is nothing to sneeze at.” –My Top 10 Assertions About Data Warehouses, 2010
11. Example: VoltDB
• Get the interface right
– SQL
– ACID
• Implementation: Leverage a few simple ideas
– Main memory
– Stored procedures
– Deterministic scheduling
• Specialization
– OLTP focus allowed for above implementation choices
12. Proving the Theory
• Challenge: OLTP performance
  – TPC-C CPU cycles
  – On the Shore DBMS prototype
  – Elephants should be similar
• Where the cycles go: Useful Work 4%, Locking 24%, Latching 24%, Buffer Pool 24%, Recovery 24%
13. Implementation Construct #1: Main Memory
• Main memory format for data
  – Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
  – Return to disk-buffer pool architecture (slow)
  – Anti-caching
    • Main memory format for data
    • When memory fills up, bundle together elderly tuples and write them out
    • Run a transaction in “sleuth mode”; find the required records and move them to main memory (and pin them)
    • Run the Xact normally
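The anti-caching steps above can be sketched with a toy hot/cold store. This is a hedged illustration with hypothetical names (`AntiCache`, `sleuth` pre-pass), not VoltDB's or H-Store's actual mechanism: elderly tuples are evicted to a cold store when memory fills, and a transaction's pre-pass pulls its required records back into memory before the transaction runs normally.

```python
# Toy anti-caching sketch: hot tuples stay in an in-memory OrderedDict,
# elderly tuples are evicted to a stand-in "cold store" (a dict here,
# standing in for disk blocks). Hypothetical names throughout.
from collections import OrderedDict

class AntiCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.hot = OrderedDict()  # key -> tuple, oldest first
        self.cold = {}            # stand-in for the on-disk block store

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.capacity:
            # Memory full: bundle up the most elderly tuple and write it out.
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = old_val

    def run_transaction(self, keys, body):
        # "Sleuth mode": find required records, un-evict and pin them.
        for k in keys:
            if k not in self.hot and k in self.cold:
                self.put(k, self.cold.pop(k))
        # Now run the transaction normally, entirely against main memory.
        return body({k: self.hot[k] for k in keys})

cache = AntiCache(capacity=2)
for i in range(4):
    cache.put(i, i * 10)          # keys 0 and 1 get evicted to the cold store
result = cache.run_transaction([0, 3], lambda rows: sum(rows.values()))
assert result == 30               # key 0 was pulled back before execution
assert 0 in cache.hot
```

The design point the sketch captures: data always has a main-memory format when a transaction touches it, so there is no buffer-pool translation on the hot path.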
14. Implementation Construct #2: Stored Procedures
• Round trip to the DBMS is expensive
– Do it once per transaction
– Not once per command
– Or even once per cursor move
• Ad-hoc queries supported
– Turn them into dynamic stored procedures
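The round-trip argument can be made concrete with a toy client/server model. This is a hedged sketch with made-up names (`FakeServer`, `call_procedure`), not the VoltDB client API: issuing statements one at a time costs a network round trip each, while shipping the whole transaction as one procedure call costs exactly one.

```python
# Toy model contrasting per-statement round trips with one stored-procedure
# call per transaction. Hypothetical names; no real network involved.

class FakeServer:
    def __init__(self):
        self.round_trips = 0
        self.procs = {}

    def execute(self, stmt):
        """Conventional path: every statement is its own round trip."""
        self.round_trips += 1
        return "ok"

    def call_procedure(self, name, *args):
        """Stored-procedure path: one round trip runs the whole transaction."""
        self.round_trips += 1
        return self.procs[name](*args)

server = FakeServer()
server.procs["transfer"] = lambda src, dst, amt: ["debit ok", "credit ok", "log ok"]

# Three statements issued ad hoc: three round trips.
for stmt in ("UPDATE src ...", "UPDATE dst ...", "INSERT log ..."):
    server.execute(stmt)
assert server.round_trips == 3

# The same transaction as a single stored-procedure call: one round trip.
server.round_trips = 0
server.call_procedure("transfer", "acct1", "acct2", 100)
assert server.round_trips == 1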
15. Implementation Construct #3:
Deterministic and Non-deterministic Scheduling
• Non-deterministic (can’t tell order until commit time)
– MVCC
– Dynamic locking
• Deterministic
– Time stamp order
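Timestamp-order scheduling can be sketched in a few lines. This is a simplified single-partition illustration with hypothetical names, not VoltDB's actual scheduler: each transaction gets a timestamp on arrival, and the engine executes strictly in timestamp order, so the serial order is known before commit and no locks or MVCC machinery are needed.

```python
# Minimal deterministic scheduler sketch: transactions are ordered by a
# timestamp assigned at submission, then run serially in that order.
# Hypothetical names; a single-partition simplification.
import itertools

class DeterministicScheduler:
    def __init__(self):
        self._clock = itertools.count()  # monotonically increasing timestamps
        self._queue = []                 # (timestamp, work) pairs

    def submit(self, work):
        ts = next(self._clock)           # order is fixed here, before execution
        self._queue.append((ts, work))
        return ts

    def run_all(self, state):
        # Execute strictly in timestamp order; because order is predetermined,
        # no locking or commit-time ordering is needed on this partition.
        for ts, work in sorted(self._queue):
            work(state)
        self._queue.clear()
        return state

sched = DeterministicScheduler()
sched.submit(lambda s: s.update(balance=s["balance"] + 100))
sched.submit(lambda s: s.update(balance=s["balance"] * 2))
final = sched.run_all({"balance": 0})
assert final["balance"] == 200  # (0 + 100) * 2 — order was fixed up front
```

Contrast with the non-deterministic approaches on the slide: under dynamic locking or MVCC, the equivalent serial order only emerges at commit time.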
16. Result of Design Principles: VoltDB Example
• Good interface decisions – made developers more productive
– SQL & ACID
• Leveraging a few simple implementation ideas – made
VoltDB wicked fast
– Main memory
– Stored procedures
– Deterministic scheduling
17. Proving the Theory
• Answer: OLTP performance
  – 3 million transactions per second
  – 7x Cassandra
  – 15 million SQL statements per second
  – 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.” –The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
19. Technology Meets the Market
Believe
– “Big Data” is a rare, transformative market
– Velocity is becoming the cornerstone
– Specialized databases (working together) are the answer
– Products must provide tangible customer value… Fast
Observations
– Noisy, crowded and new – kinda like Christmas shopping at the mall
– Everyone wants to understand where the pieces fit
– Analysts build maps on technology NOT use cases
What we need is…
20. Data Value Chain
Age of Data (young to old):
• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve trans.
• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
• Record Lookup (seconds): retrieve click stream, show orders
• Historical Analytics (minutes): backtest algo, BI, daily reports
• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match
21. Data Value Chain
[Chart: two value curves overlaid on the same age-of-data timeline — the value of an individual data item, highest while the data is young, and the aggregate data value, which grows as data accumulates and ages.]
22. The Database Universe
[Chart: the database universe plotted over the data value chain, from transactional workloads (interactive, real-time analytics, record lookup) to analytic workloads (historical and exploratory analytics). The axes run fast to slow, small to large, and simple to complex in application complexity, with value shifting from the individual data item to the aggregate. The traditional RDBMS occupies the middle of the space.]
23. The Database Universe
[The same database-universe chart, with product categories placed on it: velocity databases (NewSQL) and NoSQL toward the fast, transactional end; data warehouses and Hadoop, etc. toward the analytic end; the traditional RDBMS in the middle.]
25. Closed-loop Big Data
Data feeds: logins, trades, authorizations, clicks, sensors, orders, impressions
• Interactive & Real-time Analytics: make the most informed decision every time there is an interaction
• Historical Reports & Analytics and Exploratory Analytics feed knowledge back into the loop: real-time decisions are informed by operational analytics and past knowledge
26. The Velocity Use Case
What’s it look like?
– High throughput, relentless data feeds
– Fast decisions on high-value data
– Real-time, operational analytics present immediate visibility
What’s the big deal?
– Batch converts to real time = efficiency
– Decisions made at time of event = better decisions
– Ability to micro-segment/target/personalize/etc. = conversion and satisfaction
– More data is coming at you; use it to improve your business