SlideShare a Scribd company logo
1 of 3
Thought about it…
                                                      Most of my wish list hasn’t changed
                                                      much
         An outside view…                               Sigmod 97 keynote about
                                                        search
                                                        CIDR 2003 keynote about new areas that
               Prof. Eric A. Brewer                     don’t fit DBMS well
                   UC Berkeley                        So, some review, some new stuff
            Intel Research (until July)




Proposal: Layered Database                           Example: Search Engines
 Pros:                                                No use of database technology
   Enable new database-like things                    Things that would have been helpful:
   Faster innovation for components                     High availability and replication
   Many parallel experiments (like Linux)               Atomic version vectors
   Should be public domain ideally                      Tools for new declarative languages
 Cons:                                                  Join machinery
   Hard to ensure global properties                   Not needed:
      But those that care will get them…
                                                        Complex locks, Query Optimization
 Closest is Berkeley DB (?)                             Transactions, Redo, Undo




Example: Scientific Computing                        Other Misfits
 Uses databases, but not a good fit                   Bioinformatics:
   Data often stored in files                           Wrong operators
   Most operators are outside the DBMS                  Need error propagation
   Database is an expensive replicated file system      Versioning, read mostly
   (in/out but no joins)
                                                      App Servers:
 Things that layered system might provide:
                                                        Session state, session migration
   Multi-version storage system
   New operators
                                                        App server will be a small database
   Tools for new declarative languages                Workflow




                                                                                                 1
So what happened?                                            Directions I’d like to see…
 Accepted: one size does not fit all…                         Integrated notion of statistics
 Couldn’t get much traction on layered                          Store the noise (rather than clean it)
 database                                                       Create cleaner views
 Built our own from scratch                                     Core probabilistic queries
   Stasis, Rusty Sears                                        Move away from update-in-place
   Open source, could be something special                      Many inputs are sacred (e.g. science)
 But big picture largely unchanged                              Transactional versioning
   Too hard to explore the fun spaces                           Provenance & annotation
   But layering DID happen!
      But whole database is now just transactional storage




Directions (2)                                               Many Core
 Better integration into PL                                   Hard to get any performance benefit for
 BASE semantics (not just ACID)                               I/O bound applications
 Repeated automatic extraction                                Main memory DB??
   Web crawlers do this                                         Limited by off-chip bandwidth
   Much of MapReduce workload                                   Need dataflow optimizations on/off chip
   Need to integrate with versioning,
   provenance, statistics
   Import is a continuous process, not an
   event




Backup                                                       1) Layering enables competition
                                                              Examples from OS community:
                                                                X86, SPEC benchmarks, Virtual machines
                                                                SCSI disks, RAID, NAS
                                                                Routers, Firewalls, Proxies
                                                              Some layers commodities (raw disks)
                                                              Some layers innovative (replication)
                                                              Always have unexpected uses




                                                                                                          2
2) Many more experiments                          3) Reduces Time to Market
 Centralized planning tries very few               Lower cost of entry
 things                                            More important:
                                                     Just good enough!
 Layering enables many more bets
                                                     Few global properties in early versions
   Also enables VC funding                              The web, search engines, even e-commerce
     Ex: IP layer, ASICs => networking startups         P2P
   Enables niche markets (lower cost of entry)          WebMethods

     Easier path for XML, bio, spatial, ….           Global properties added over time!

   Most bets fail, but some succeed                Ugly but fast wins the race…




Claims                                            Conclusions
 If you can’t control, then enable                 Can’t control (or predict) the future…
   This is the lesson from OS work for CIDR         better to enable broad innovation
   Unix, TCP enabled the web
     Neither attempted to control usage
                                                   Control
                                                     Make global properties tractable
   HTTP in turn enabled P2P                          But limits innovation
 DB research suffers from “Albatross 9i”
   Artifact hides the enabling technology          A public domain layered database:
   CIDR exists for this reason                       Would enable more innovation
                                                     Allow a broader range of properties




Rate of Innovation
 Claim: layering increases innovation

 1) Enables competition
 2) Many more experiments
 3) Reduces time to market




                                                                                                   3

More Related Content

More from infoblog

More from infoblog (11)

Claremont Report on Database Research: Research Directions (Rakesh Agrawal)
Claremont Report on Database Research: Research Directions (Rakesh Agrawal)Claremont Report on Database Research: Research Directions (Rakesh Agrawal)
Claremont Report on Database Research: Research Directions (Rakesh Agrawal)
 
Claremont Report on Database Research: Research Directions (Gerhard Weikum)
Claremont Report on Database Research: Research Directions (Gerhard Weikum)Claremont Report on Database Research: Research Directions (Gerhard Weikum)
Claremont Report on Database Research: Research Directions (Gerhard Weikum)
 
Claremont Report on Database Research: Research Directions (Beng Chin Ooi)
Claremont Report on Database Research: Research Directions (Beng Chin Ooi)Claremont Report on Database Research: Research Directions (Beng Chin Ooi)
Claremont Report on Database Research: Research Directions (Beng Chin Ooi)
 
Claremont Report on Database Research: Research Directions (Yannis E. Ioannidis)
Claremont Report on Database Research: Research Directions (Yannis E. Ioannidis)Claremont Report on Database Research: Research Directions (Yannis E. Ioannidis)
Claremont Report on Database Research: Research Directions (Yannis E. Ioannidis)
 
Claremont Report on Database Research: Research Directions (Donald Kossmann)
Claremont Report on Database Research: Research Directions (Donald Kossmann)Claremont Report on Database Research: Research Directions (Donald Kossmann)
Claremont Report on Database Research: Research Directions (Donald Kossmann)
 
Claremont Report on Database Research: Research Directions (Johannes Gehrke)
Claremont Report on Database Research: Research Directions (Johannes Gehrke)Claremont Report on Database Research: Research Directions (Johannes Gehrke)
Claremont Report on Database Research: Research Directions (Johannes Gehrke)
 
Claremont Report on Database Research: Research Directions (Alon Y. Halevy)
Claremont Report on Database Research: Research Directions (Alon Y. Halevy)Claremont Report on Database Research: Research Directions (Alon Y. Halevy)
Claremont Report on Database Research: Research Directions (Alon Y. Halevy)
 
Claremont Report on Database Research: Research Directions (Anastasia Ailamaki)
Claremont Report on Database Research: Research Directions (Anastasia Ailamaki)Claremont Report on Database Research: Research Directions (Anastasia Ailamaki)
Claremont Report on Database Research: Research Directions (Anastasia Ailamaki)
 
Spot Sigs
Spot SigsSpot Sigs
Spot Sigs
 
Database Research Principles Revealed (Small Size)
Database Research Principles Revealed (Small Size)Database Research Principles Revealed (Small Size)
Database Research Principles Revealed (Small Size)
 
Database Research Principles Revealed
Database Research Principles RevealedDatabase Research Principles Revealed
Database Research Principles Revealed
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Claremont Report on Database Research: In Depth Talk (Eric A. Brewer)

  • 1. Thought about it… Most of my wish list hasn’t changed much An outside view… Sigmod 97 keynote about search CIDR 2003 keynote about new areas that Prof. Eric A. Brewer don’t fit DBMS well UC Berkeley So, some review, some new stuff Intel Research (until July) Proposal: Layered Database Example: Search Engines Pros: No use of database technology Enable new database-like things Things that would have been helpful: Faster innovation for components High availability and replication Many parallel experiments (like Linux) Atomic version vectors Should be public domain ideally Tools for new declarative languages Cons: Join machinery Hard to ensure global properties Not needed: But those that care will get them… Complex locks, Query Optimization Closest is Berkeley DB (?) Transactions, Redo, Undo Example: Scientific Computing Other Misfits Uses databases, but not a good fit Bioinformatics: Data often stored in files Wrong operators Most operators are outside the DBMS Need error propagation Database is an expensive replicated file system Versioning, read mostly (in/out but no joins) App Servers: Things that layered system might provide: Session state, session migration Multi-version storage system New operators App server will be a small database Tools for new declarative languages Workflow 1
  • 2. So what happened? Directions I’d like to see… Accepted: one size does not fit all… Integrated notion of statistics Couldn’t get much traction on layered Store the noise (rather than clean it) database Create cleaner views Built our own from scratch Core probabilistic queries Stasis, Rusty Sears Move away from update-in-place Open source, could be something special Many inputs are sacred (e.g. science) But big picture largely unchanged Transactional versioning Too hard to explore the fun spaces Provenance & annotation But layering DID happen! But whole database is now just transactional storage Directions (2) Many Core Better integration into PL Hard to get any performance benefit for BASE semantics (not just ACID) I/O bound applications Repeated automatic extraction Main memory DB?? Web crawlers do this Limited by off-chip bandwidth Much of MapReduce workload Need dataflow optimizations on/off chip Need to integrate with versioning, provenance, statistics Import is a continuous process, not an event Backup 1) Layering enables competition Examples from OS community: X86, SPEC benchmarks, Virtual machines SCSI disks, RAID, NAS Routers, Firewalls, Proxies Some layers commodities (raw disks) Some layers innovative (replication) Always have unexpected uses 2
  • 3. 2) Many more experiments 3) Reduces Time to Market Centralized planning tries very few Lower cost of entry things More important: Just good enough! Layering enables many more bets Few global properties in early versions Also enables VC funding The web, search engines, even e-commerce Ex: IP layer, ASICs => networking startups P2P Enables niche markets (lower cost of entry) WebMethods Easier path for XML, bio, spatial, …. Global properties added over time! Most bets fail, but some succeed Ugly but fast wins the race… Claims Conclusions If you can’t control, then enable Can’t control (or predict) the future… This is the lesson from OS work for CIDR better to enable broad innovation Unix, TCP enabled the web Neither attempted to control usage Control Make global properties tractable HTTP in turn enabled P2P But limits innovation DB research suffers from “Albatross 9i” Artifact hides the enabling technology A public domain layered database: CIDR exists for this reason Would enable more innovation Allow a broader range of properties Rate of Innovation Claim: layering increases innovation 1) Enables competition 2) Many more experiments 3) Reduces time to market 3