Database History
A tale of two papers
It's Me!
Doug Turnbull
@softwaredoug
Search & Big Data Architect
OpenSource Connections
http://o19s.com
Charlottesville VA, USA
Outline
• A Relational Model of Data for Large Shared
Data Banks -- Edgar F. Codd
• Brewer’s Conjecture and the Feasibility of
Consistent, Available, Partition-Tolerant Web
Services -- Seth Gilbert and Nancy Lynch (proving Eric A. Brewer’s conjecture)
Why?
RDBMS                     NoSQL
Declarative               Procedural
Mathematical Precision    Computational Precision
Data Purity               Computational Transparency

See the wisdom in both paths

“A foolish consistency is the hobgoblin of little
minds” – Emerson
Let’s make a database!
Each line is a record:
Username,Address,Videos Rented
andy
...
don,...
doug,1234 bagby st,"Top Gun,Terminator,The Matrix"
...
rick,9212 frontwell ave,"Godfather Part I,Top Gun"
ryan,
...

Index (where to find each user's data in the file):
...
doug:offset 512
...
rick:offset 9212
...
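
The file-plus-index idea on this slide could be sketched roughly as follows. This is only an illustration in Python: the names build_index and lookup are hypothetical, and the sample rows in the usage note are made up rather than taken from a real system.

```python
import io


def build_index(data: bytes) -> dict:
    """Map each username (the first comma-separated field) to its line's byte offset."""
    index = {}
    offset = 0
    for line in data.splitlines(keepends=True):
        key = line.split(b",", 1)[0].decode().strip()
        if key:  # skip blank lines
            index[key] = offset
        offset += len(line)
    return index


def lookup(f, index: dict, username: str) -> str:
    """Seek straight to the user's record instead of scanning the whole file."""
    f.seek(index[username])
    return f.readline().decode().rstrip("\r\n")
```

For example, with a two-line file held in io.BytesIO, build_index gives the first user offset 0, and lookup(f, index, "doug") returns doug's whole line without reading the lines before it.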
Make each movie a record?
How do I store the videos a user has rented? Aggregate them with the user record?
Username,Address,Videos Rented
…
doug,1234 bagby st, ???
...
rick,9212 frontwell ave

Store movie records the same way?
Movie Name,Price,NumInStock
Top Gun,$1.99,5
…

Index movie records:
Index:
...
doug:offset 512
...
rick:offset 9212
...
top gun:offset 15000
Network Databases
• Early databases (Codasyl/DBTG)
– Record based
• Either hierarchical or navigational

– Navigational: Records own other records by
means of a “set” construct

• How might this look in our example?
Codasyl/DBTG
• Early databases, weak abstraction over a file
Basic Unit “Record”
Record Name is USER
Location Mode is CALC Using username
Duplicates are not allowed
username Type character 25
address Type character 50
phonenumber Type character 10

Record Name is VIDEO

Records own other records
via sets
Set Name is USER-VIDEOS
ORDER is NEXT
RETENTION is MANDATORY
Owner is USER
Member is VIDEO
Users -> Videos
Username,Address,Videos Rented SET
…
doug,1234 bagby st,<Top Gun,Terminator,The Matrix>
Movie Name,Price,NumInStock
Top Gun,$1.99,5
…

Set inline with data?

User -> Video SET
Doug,Top Gun,Terminator,The Matrix
Index:
...
doug:offset 512
...
top gun:offset 15123
doug_videos:offset 17582

Or Set as its own record?
Querying for Videos
MOVE 'Doug' TO USERNAME
FIND ANY USER USING USERNAME
FIND FIRST VIDEO WITHIN USER-VIDEOS
DO WHILE (dbstatus = 0)
    GET VIDEO
    PRINT (VIDEO)
    FIND NEXT VIDEO WITHIN USER-VIDEOS
END DO
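
To make the navigational style concrete, here is a rough Python sketch of the same loop. It is not CODASYL itself: the Record and Cursor classes, the method names, and the dbstatus convention (0 = found, 1 = not found or end of set) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A stored record; `members` plays the role of a set this record owns."""
    name: str
    members: list = field(default_factory=list)


class Cursor:
    """Tracks the current position, much as a navigational DML tracks currency."""

    def __init__(self, users):
        self.users = users      # keyed lookup by username
        self.owner = None
        self.position = -1
        self.dbstatus = 0       # assumed convention: 0 = found, 1 = not found / end of set

    def find_any_user(self, username):
        self.owner = self.users.get(username)
        self.dbstatus = 0 if self.owner is not None else 1

    def find_first_video(self):
        self.position = -1
        self.find_next_video()

    def find_next_video(self):
        self.position += 1
        in_range = self.owner is not None and self.position < len(self.owner.members)
        self.dbstatus = 0 if in_range else 1

    def get_video(self):
        return self.owner.members[self.position]


def videos_for(users, username):
    """Walk the owner's set one member at a time, checking status after each step."""
    cur = Cursor(users)
    cur.find_any_user(username)
    found = []
    if cur.dbstatus == 0:
        cur.find_first_video()
        while cur.dbstatus == 0:
            found.append(cur.get_video().name)
            cur.find_next_video()
    return found
```

Note how the application code has to know that users own videos and that it must start from the user: the access path is baked into the program.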
Summing Up
• Built from the bottom up
• Makes me think of:
User
public class User {
private Video[] videos;
}

Video
Is this ownership (aggregation)?
Or is this just an association with a
video owned by another object?
Codd’s Criticisms
• Application is heavily dependent on storage
constraints
– Bottom Up
• Access Path dependencies (which record do I access
first? Users before videos? Who owns what?)
• Order Dependencies (set order is defined at index time,
iterations occur over that order)
• Indexing Dependencies (indexes referenced by name)

• Changing these things breaks applications!
Codd
A relation, a set of tuples, is a sufficient abstraction to represent the data
(Doug, 1234 Bagby St, <Top Gun, 3.99, Terminator, 12.99>)

We can introduce “Normalization”
Users
(Doug, 1234 Bagby St)
Rented Videos
(Doug, Top Gun, 3.99)
(Doug, Terminator, 12.99)

We can reason about data with mathematical certainty
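
As a small illustration of the normalization step above, the following Python sketch splits a nested user tuple into the two flat relations shown on the slide. The function name normalize and the tuple layout (username, address, list of (title, price) pairs) are assumptions for this example.

```python
def normalize(nested_users):
    """Split tuples with a nested video list into two flat relations.

    Each input tuple is assumed to look like
    (username, address, [(title, price), ...]).
    """
    users = set()
    rented_videos = set()
    for username, address, videos in nested_users:
        users.add((username, address))
        for title, price in videos:
            rented_videos.add((username, title, price))
    return users, rented_videos
```

Because both results are plain sets of tuples, neither carries an order or an owner; the username simply appears as a value in both relations.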
RDBMS Features
• Codd defines a set of operations
• Most importantly the JOIN
– Create any derived relation from stored relations
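
A join can be sketched in a few lines of Python. This is a naive nested-loop natural join over lists of dicts, written only to show the idea; the function name natural_join and the attribute names in the usage note are assumptions, and real databases use far more efficient strategies.

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on the attribute names they share."""
    left, right = list(left), list(right)
    if not left or not right:
        return []
    # Attribute names are read from the first row of each relation.
    shared = set(left[0]) & set(right[0])
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[attr] == r_row[attr] for attr in shared):
                result.append({**l_row, **r_row})
    return result
```

Joining a rented-videos relation with rows like {"user": "Doug", "title": "Top Gun"} against a videos relation with rows like {"title": "Top Gun", "price": 3.99} yields rows carrying user, title, and price, a relation that was never stored directly.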
Checking Codd’s Criticisms
• Access Dependencies – all data is normalized
into a structure optimal for asking any
question
• Order Dependencies – relations do not
guarantee any order (though the query
language can specify a sort)
• Indexing Dependencies – We don’t need to
refer to the index when querying (it’s just a
bonus)
Stop thinking about the file
Username,Address,Videos Rented
…
doug,1234 bagby st,
...
rick,9212 frontwell ave
Movie Name,Price,NumInStock
Top Gun,$1.99,5
…
Index:
...
doug:offset 512
...
rick:offset 9212
...
top gun:offset 15000
Start thinking about Normalized
Relations!

Users
(Doug, 1234 Bagby St)
Rented Videos
(Doug, Top Gun)
(Doug, Terminator)

Videos
(Top Gun, 3.99)
(Terminator, 12.99)
Retrospective?
• How do NoSQL databases do with these
issues? Access Dependencies, Indexing
Dependencies, Order Dependencies?
– Is it even a fair criticism?
– Why is it ok in NoSQL but not in SQL (is it ok?)?
– ???
Fast Forward to early 2000s
• SQL Databases have “won”; Codd’s vision
thriving
• We can always scale by beefing up our
hardware – “Vertical Scalability”
• Single-system point of view
Trouble Ahead
“The Free Lunch is Over!” – Herb Sutter
The Free Lunch Is Over
A Fundamental Turn Toward Concurrency in Software

• Per-HD size plateauing
• Hard drive throughput plateauing
Trouble Ahead
• Instead of scaling vertically, we need to find
ways to scale horizontally
– “Elastic” scalability, add more systems to get more
performance
– Scaling horizontally (more, less performant servers)
rather than vertically

How do we design databases to take advantage of the scale, and grow
The Problem
• How do we design databases to take
advantage of horizontal scalability?

• Are traditional RDBMS databases up to this
task?
Enter Brewer’s CAP Theorem
• The CAP Theorem, conjectured by Eric Brewer and
proved by Gilbert and Lynch in Brewer’s
Conjecture and the Feasibility of Consistent,
Available, Partition-Tolerant Web Services
CAP Theorem Explained
• In the presence of a partition, a system must
choose between being consistent or available
Consistent
- Will not respond to a request until consistency can be guaranteed.

Available
- Will respond to a request, even if consistency cannot be guaranteed.
CAP Theorem
• In other words, in the case of horizontal
scalability, (i.e., potential partitions) what do
we do when servers can’t communicate?
– Block? (wait till we can confirm consistency)
– Respond? (we can figure this all out later)
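
The block-or-respond choice above can be sketched as a toy two-node system in Python. Everything here is an assumption made for illustration (the Node class, the "CP"/"AP" mode flag, the Unavailable exception, and the synchronous replication); it is not how any particular database implements the trade-off.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk inconsistency."""


class Node:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True when this node cannot reach its peer

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot confirm the write with the peer")
            self.value = value    # AP: accept locally, reconcile later
            return
        self.value = value
        self.peer.value = value   # replicate synchronously while connected

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value with the peer")
        return self.value         # AP: may be stale during a partition


def make_pair(mode):
    """Create two connected nodes that replicate to each other."""
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    """Simulate the two nodes losing contact with each other."""
    a.partitioned = b.partitioned = True
```

With an "AP" pair, a write made on one side of a partition succeeds but the other side keeps serving the old value. With a "CP" pair, both reads and writes raise Unavailable until the partition heals.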
CAP Theorem in Human Organizations
• You receive an order from a customer over the
phone. Do you:
– Wait until the boss has signed off and reconciled
with the rest of the orders?
• Maybe blocking all your colleagues as your boss takes
time to respond?

– Or do you just respond saying “yes!” knowing
maybe this customer is impatient (or maybe
maintaining consistent inventory isn’t important)
What does this mean for databases?
CA: SQL, Codasyl, a big file (basically the history of databases to this point)
CP: ???
AP: ???
What does this mean for databases?
CA: SQL, Codasyl, a big file (basically the history of databases to this point)

Partition == Decision
When implementing a partitionable database, choose between consistency and availability

CP: Call the boss before completing the order?
AP: Respond quickly to guarantee the sale?
What else does this mean?
• Database designers must choose to focus on
either consistent applications or available
applications
• Thus… much of NoSQL is born
– Big focus: options for more AP systems
• Available and Partition-tolerant

• Bottom line:
– Choices choices choices, what corner of the
triangle are you on?
What else does this mean?
• Many NoSQL databases end up being
designed bottom-up for horizontal scalability
– Simpler, lower-level APIs (set, get, put)
– Hierarchical Schemas
– Sometimes distributed based on order?
Controversial Question of the Day
• Have we come full circle?

• Or are we just responding to the technical
challenges of the CAP theorem?

• Answers? (Questions OK too.)

Database History From Codd to Brewer

  • 1. Database History A tale of two papers
  • 2. It’s Me! Doug Turnbull @softwaredoug Search & Big Data Architect OpenSource Connections http://o19s.com Charlottesville VA, USA
  • 3. Outline • A Relational Model of Data for Large Shared Data Banks -- Edgar F. Codd • Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services -- Eric A. Brewer
  • 5. Let’s make a database! Username,Address,Videos Rented andy ... Each line is a record don,... doug,1234 bagby st,"Top Gun,Terminator,The Matrix" ... rick,9212 frontwell ave,"Godfather Part I,Top Gun" ryan, ... Index: ... doug:offset 512 ... rick:offset 9212 ... Where to find each user’s data in the file
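The offset index on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database does it: the sample rows, the address for andy, and the helper names `build_index` and `lookup` are assumptions made up for the example, and an in-memory buffer stands in for the file.

```python
import io

# Hypothetical sample data modeled on the slide (header row omitted for brevity).
DATA = (
    b'andy,12 elm st,"Top Gun"\n'
    b'doug,1234 bagby st,"Top Gun,Terminator,The Matrix"\n'
    b'rick,9212 frontwell ave,"Godfather Part I,Top Gun"\n'
)


def build_index(f):
    """Map each username (the first field) to the byte offset of its line."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()          # where the next record starts
        line = f.readline()
        if not line:               # end of file
            break
        username = line.split(b",", 1)[0].decode()
        index[username] = offset
    return index


def lookup(f, index, username):
    """Seek straight to the user's record instead of scanning the whole file."""
    f.seek(index[username])
    return f.readline().decode().rstrip("\n")


f = io.BytesIO(DATA)               # stands in for an open file on disk
index = build_index(f)
record = lookup(f, index, "doug")
```

The index trades extra storage for a direct seek; the application still has to know that usernames are the key and that records are lines, which is the kind of storage dependence the later slides criticize.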
  • 6. Make each movie a record? Username,Address,Videos Rented … How do I store the videos doug,1234 bagby st, ??? a user has rented? ... rick,9212 frontwell ave Aggregate them with the user record? Movie Name,Price,NumInStock Top Gun,$1.99,5 Store movie records the same way? … Index: ... doug:offset 512 ... rick:offset 9212 ... top gun:offset 15000 Index movie records
  • 7. Network Databases • Early databases (Codasyl/DBTG) – Record based • Either hierarchical or navigational – Navigational: Records own other records by means of a “set” construct • How might this look in our example?
  • 8. Codasyl/DBTG • Early databases, weak abstraction over a file Basic Unit “Record” Record Name is USER Location Mode is CALC Using username Duplicates are not allowed username Type character 25 address Type character 50 phonenumber Type character 10 Record Name is VIDEO Records own other records via sets Set Name is USER-VIDEOS ORDER is NEXT RETENTION is MANDATORY Owner is USER Member is VIDEO
  • 9. Users -> Videos Username,Address,Videos Rented SET … doug,1234 bagby st,<Top Gun,Terminator,The Matrix> Movie Name,Price,NumInStock Top Gun,$1.99,5 … Set inline with data? User -> Video SET Doug,Top Gun,Terminator,The Matrix Index: ... doug:offset 512 ... top gun:offset 15123 doug_videos:offset 17582 Or Set as its own record?
  • 10. Querying for Videos MOVE ‘Doug’ to USERNAME FIND Any User USING USERNAME FIND First VIDEO WITHIN User.Videos DO WHILE (dbstatus=0) GET VIDEO PRINT (VIDEO) FIND NEXT VIDEO WITHIN User.Videos
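The navigational query on slide 10 walks from an owner record to its members one step at a time. A rough Python analogue, assuming a linked chain of member records (the class names `Record` and `OwnerRecord` and the `connect` method are hypothetical, not CODASYL terms):

```python
class Record:
    """A record with named fields and a link to the next member of its set."""

    def __init__(self, **fields):
        self.fields = fields
        self.next_in_set = None


class OwnerRecord(Record):
    """An owner record holds the head and tail of its member chain."""

    def __init__(self, **fields):
        super().__init__(**fields)
        self.first_member = None
        self.last_member = None

    def connect(self, member):
        # Append at the end of the chain (a simplification of set ordering).
        if self.first_member is None:
            self.first_member = member
        else:
            self.last_member.next_in_set = member
        self.last_member = member


# Keyed lookup standing in for "Location Mode is CALC Using username".
users_by_name = {}
doug = OwnerRecord(username="doug", address="1234 bagby st")
users_by_name["doug"] = doug
for title in ("Top Gun", "Terminator", "The Matrix"):
    doug.connect(Record(name=title))

# FIND Any User, FIND First VIDEO, then loop with FIND NEXT.
user = users_by_name["doug"]
video = user.first_member
titles = []
while video is not None:
    titles.append(video.fields["name"])
    video = video.next_in_set
```

Note that the caller must start at the user and follow the chain in its stored order; there is no way here to ask "who rented Top Gun?" without writing a different traversal.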
  • 11. Summing Up • Built from the bottom up • Makes me think of: User public class User { private Video[] videos; } Video Is this ownership (aggregation)? Or is this just an association with a video owned by another object?
  • 12. Codd’s Criticisms • Application is heavily dependent on storage constraints – Bottom Up • Access Path dependencies (which record do I access first? Users before videos? Who owns what?) • Order Dependencies (set order is defined at index time, iterations occur over that order) • Indexing Dependencies (indexes referenced by name) • Changing these things breaks applications!
  • 13. Codd A tuple is a sufficient abstraction to represent a relation (Doug, 1234 Bagby St, <Top Gun, 3.99, Terminator, 12.99>) We can introduce “Normalization” Users (Doug, 1234 Bagby St) Rented Videos (Doug, Top Gun, 3.99) (Doug, Terminator, 12.99) We can reason about data with mathematical certainty
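The normalization step on slide 13 can be sketched as splitting one nested tuple into two flat relations. This is only an illustration under the assumption that relations are modeled as Python sets of tuples; the function name `normalize` is made up for the example.

```python
# The unnormalized tuple from the slide: the rented videos are nested inside the user.
nested = [
    ("Doug", "1234 Bagby St", [("Top Gun", 3.99), ("Terminator", 12.99)]),
]


def normalize(records):
    """Split nested user records into flat Users and Rented Videos relations."""
    users = set()
    rented_videos = set()
    for username, address, videos in records:
        users.add((username, address))
        for title, price in videos:
            # Each rental becomes its own tuple, keyed by the username.
            rented_videos.add((username, title, price))
    return users, rented_videos


users, rented_videos = normalize(nested)
```

Because each relation is a set of flat tuples, neither one carries an ordering or an ownership chain that an application could come to depend on.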
  • 14. RDBMS Features • Codd defines a set of operations • Most importantly the JOIN – Create any derived relation from a stored relation
  • 15. Checking Codd’s Criticisms • Access Dependencies – all data is normalized into a structure optimal for asking any question • Order Dependencies – relations do not guarantee any order (though the query language can specify a sort) • Indexing Dependencies – We don’t need to refer to the index when querying (it’s just a bonus)
  • 16. Stop thinking about the file Username,Address,Videos Rented … doug,1234 bagby st, ... rick,9212 frontwell ave Movie Name,Price,NumInStock Top Gun,$1.99,5 … Index: ... doug:offset 512 ... rick:offset 9212 ... top gun:offset 15000
  • 17. Start thinking about Normalized Relations! Users (Doug, 1234 Bagby St, <Top Gun, 3.99, Terminator, 12.99>) Rented Videos (Doug, Top Gun) (Doug, Terminator) Videos (Top Gun, 3.99) (Terminator, 12.99)
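To make the three normalized relations on slide 17 concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example; the rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (username TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE videos (name TEXT PRIMARY KEY, price REAL);
    CREATE TABLE rented_videos (
        username TEXT REFERENCES users(username),
        video TEXT REFERENCES videos(name)
    );
    INSERT INTO users VALUES ('Doug', '1234 Bagby St');
    INSERT INTO videos VALUES ('Top Gun', 3.99), ('Terminator', 12.99);
    INSERT INTO rented_videos VALUES ('Doug', 'Top Gun'), ('Doug', 'Terminator');
    """
)

# Start from the user: which videos has Doug rented, and at what price?
dougs_videos = conn.execute(
    """
    SELECT v.name, v.price
    FROM users u
    JOIN rented_videos r ON r.username = u.username
    JOIN videos v ON v.name = r.video
    WHERE u.username = 'Doug'
    ORDER BY v.name
    """
).fetchall()

# Start from the video instead: who rented Top Gun? Same data, no new access path.
top_gun_renters = conn.execute(
    "SELECT username FROM rented_videos WHERE video = 'Top Gun'"
).fetchall()
```

Neither query names an index or a record to visit first, and the only ordering is the one the query asks for, which is the point of the three criticisms checked on slide 15.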
  • 18. Retrospective? • How do NoSQL databases do with these issues? Access Dependencies, Indexing Dependencies, Order Dependencies? – Is it even a fair criticism? – Why is it ok in NoSQL but not in SQL (is it ok?)? – ???
  • 19. Fast Forward to early 2000s • SQL Databases have “won”; Codd’s vision thriving • We can always scale by beefing up our hardware – “Vertical Scalability” • Single system PoV
  • 20. Trouble Ahead “The Free Lunch is Over!” – Herb Sutter The Free Lunch Is Over A Fundamental Turn Toward Concurrency in Software • Per HD size plateauing • Hard Drive throughput plateauing
  • 21. Trouble Ahead • Instead of scaling vertically, we need to find ways to scale horizontally – “Elastic” scalability, add more systems to get more performance – Scaling horizontally (more, less performant servers) rather than vertically • How do we design databases to take advantage of the scale, and grow?
  • 22. The Problem • How do we design databases to take advantage of horizontal scalability? • Are the traditional RDBMS databases up to this task?
  • 23. Enter Brewer’s CAP Theorem • The CAP Theorem, introduced in Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  • 24. CAP Theorem Explained • In the presence of a partition, a system must choose between being consistent or available Consistent - Will not respond to a request until consistency can be guaranteed. Available - Will respond to a request, even if consistency cannot be guaranteed
  • 25. CAP Theorem • In other words, in the case of horizontal scalability, (i.e., potential partitions) what do we do when servers can’t communicate? – Block? (wait till we can confirm consistency) – Respond? (we can figure this all out later)
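The block-or-respond choice on slide 25 can be illustrated with a toy pair of replicas. This is a deliberately simplified sketch: the `Node` class, the `mode` flag, and the `Unavailable` exception are assumptions invented for the example, and real systems detect partitions and reconcile data in far more involved ways.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk inconsistency."""


class Node:
    def __init__(self, mode):
        self.mode = mode            # "CP" blocks under partition, "AP" responds
        self.data = {}
        self.peer = None
        self.partitioned = False    # True when the peer cannot be reached

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot confirm consistency during a partition")
            self.data[key] = value  # AP: accept locally, figure it out later
            return
        self.data[key] = value
        self.peer.data[key] = value  # healthy network: replicate to the peer

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm consistency during a partition")
        return self.data.get(key)


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.partitioned = b.partitioned = True


# AP pair: both sides keep answering, so they can disagree.
a, b = make_pair("AP")
a.write("stock", 5)
partition(a, b)
a.write("stock", 4)
ap_views = (a.read("stock"), b.read("stock"))

# CP pair: the write is refused until the partition heals.
c, d = make_pair("CP")
c.write("stock", 5)
partition(c, d)
try:
    c.write("stock", 4)
    cp_blocked = False
except Unavailable:
    cp_blocked = True
```

In the AP pair the two replicas now report different stock counts, which is the price of staying available; in the CP pair the data stays consistent but the caller is turned away.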
  • 26. CAP Theorem in Human Organizations • You receive an order from a customer over the phone. Do you: – Wait until the boss has signed off and reconciled with the rest of the orders? • Maybe blocking all your colleagues as your boss takes time to respond? – Or do you just respond saying “yes!” knowing maybe this customer is impatient (or maybe maintaining consistent inventory isn’t important)?
  • 27. What does this mean for databases? • CA: SQL, Codasyl, a big file (basically the history of databases to this point) • CP: ??? • AP: ???
  • 28. What does this mean for databases? • CA: SQL, Codasyl, a big file (basically the history of databases to this point) • Partition == Decision: When implementing a partitionable database, choose between consistency and availability • CP: Call the boss before completing the order? • AP: Respond quickly to guarantee the sale?
  • 29. What else does this mean? • Database designers must choose to focus on either consistent applications or available applications • Thus… much of NoSQL is born – Big focus: options for more AP systems • Available and Partition-tolerant • Bottom line: – Choices, choices, choices: what corner of the triangle are you on?
  • 30. What else does this mean? • Many NoSQL databases end up being designed bottom-up for horizontal scalability – Simpler, lower-level APIs (set, get, put) – Hierarchical Schemas – Sometimes distributed based on order?
  • 31. Controversial Question of the Day • Have we come full circle? • Or are we just responding to the technical challenges of the CAP theorem? • Answers? (questions OK too)
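As a rough picture of the "simpler, lower-level API" on slide 30, here is a toy key-value store that spreads keys across shards by hash. It is a sketch under assumptions: `ShardedStore` is a made-up name, the shards are plain dictionaries in one process rather than separate servers, and hashing is only one placement strategy (the slide also mentions distributing by order).

```python
import hashlib


class ShardedStore:
    """A get/put store that routes each key to one of several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate server.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedStore(num_shards=4)
# Hierarchical value: the rented videos are aggregated inside the user record.
store.put("user:doug", {"address": "1234 bagby st", "videos": ["Top Gun", "Terminator"]})
store.put("video:top gun", {"price": 1.99, "in_stock": 5})
doug = store.get("user:doug")
```

Adding servers means adding shards, but the application is back to knowing the key layout and the shape of each stored record, which echoes the access path question raised earlier in the deck.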