SlideShare une entreprise Scribd logo
1  sur  15
The Proliferation of Database
 Systems and the Data Silo
           Problem
          @daniel_abadi
      Yale University / Hadapt
          April 26th, 2012
In The Old Days …




      Database
In The Old Days …

Database

                   ETL Tools

Database



                                Data Warehouse
Database    Data Integration
            Tools



 External
  Data                         MDM Tools
                               Data Governance Tools
One Size Does Not Fit All
Transactional Databases
– Single digit millisecond latencies, and high
  throughput
– Store data in rows
– Heavy on flash and main memory
– Indexing is very important
– High availability extremely important
One Size Does Not Fit All
Analytical Databases
– Single digit second latencies (and higher)
– Store data in columns
– Scale out commodity hardware
– Still need magnetic disk
– Indexing less important
– High availability less important
One Size Does Not Fit All
Streaming Databases
– Continuous queries
– Data flows through the system
– Network latencies are paramount
– Drop data to deal with load
Therefore, in my PhD years
          alone …
Aurora and Borealis projects became
Streambase
C-Store project became Vertica
H-Store project became VoltDB
Right Tool for the Job
What We Have Now …

                      Analytical
 Transactional        Datamart         OLAP Database        Hadoop
    DBMS




                     Reporting and           High
 Transactional     Dashboarding Data    Performance
                                                         Streaming DBMS
    DBMS              Warehouse         Column-Store
                                       Analytical DBMS




Web DBMS (like
                     Web Logs              NoSQL            NewSQL
   MySQL)
What We Have Now …

                      Analytical
 Transactional        Datamart         OLAP Database        Hadoop
    DBMS




                     Reporting and           High
 Transactional     Dashboarding Data    Performance
                                                         Streaming DBMS
    DBMS              Warehouse         Column-Store
                                       Analytical DBMS




Web DBMS (like
                     Web Logs              NoSQL            NewSQL
   MySQL)
What We Have Now …

                      Analytical
 Transactional        Datamart         OLAP Database        Hadoop
    DBMS




                     Reporting and           High
 Transactional     Dashboarding Data    Performance
                                                         Streaming DBMS
    DBMS              Warehouse         Column-Store
                                       Analytical DBMS




Web DBMS (like
                     Web Logs              NoSQL            NewSQL
   MySQL)
What This Leads To…
Very little data provenance
Data silos
Non identical data copies
Not even close to a single version of the
truth
A Potential Way Towards a
         Solution



  Data Analysis                            Data
     DBMS                               Streaming
     (Hive,                            (Hstreaming

                   Hadoop
    Hadapt)                              Flume)




                   NoSQL & Simple
                     Xacts & Short
                  Request Processing
                    (HBase, Brisk)
What this has Potential to
           Enable
Fewer data silos
Increased data provenance
Reduced systems management overhead
Better resource utilization and
management
But we still need
Hadoop-based data integration tools
MDM and data governance tools for
Hadoop
Data provenance tracking across Hadoop
projects

Contenu connexe

Tendances

Tendances (20)

Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Comparison - RDBMS vs Hadoop vs Apache
Comparison - RDBMS vs Hadoop vs ApacheComparison - RDBMS vs Hadoop vs Apache
Comparison - RDBMS vs Hadoop vs Apache
 
Jstorm introduction-0.9.6
Jstorm introduction-0.9.6Jstorm introduction-0.9.6
Jstorm introduction-0.9.6
 
Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
HDFS
HDFSHDFS
HDFS
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop Technology
 
Hadoop Presentation
Hadoop PresentationHadoop Presentation
Hadoop Presentation
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 

En vedette

CAP, PACELC, and Determinism
CAP, PACELC, and DeterminismCAP, PACELC, and Determinism
CAP, PACELC, and Determinism
Daniel Abadi
 

En vedette (10)

Large Scale ETL with Hadoop
Large Scale ETL with HadoopLarge Scale ETL with Hadoop
Large Scale ETL with Hadoop
 
Beckman abadi-5min-pres
Beckman abadi-5min-presBeckman abadi-5min-pres
Beckman abadi-5min-pres
 
Invisible loading
Invisible loadingInvisible loading
Invisible loading
 
Daniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 Panel
 
Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
 
Consistency Tradeoffs in Modern Distributed Database System Design
Consistency Tradeoffs in Modern Distributed Database System DesignConsistency Tradeoffs in Modern Distributed Database System Design
Consistency Tradeoffs in Modern Distributed Database System Design
 
VLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-StoresVLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-Stores
 
The Power of Determinism in Database Systems
The Power of Determinism in Database SystemsThe Power of Determinism in Database Systems
The Power of Determinism in Database Systems
 
CAP, PACELC, and Determinism
CAP, PACELC, and DeterminismCAP, PACELC, and Determinism
CAP, PACELC, and Determinism
 
Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?
 

Similaire à Boston Hadoop Meetup, April 26 2012

The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage System
Schubert Zhang
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosql
Khanderao Kand
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 

Similaire à Boston Hadoop Meetup, April 26 2012 (20)

Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
The future of Big Data tooling
The future of Big Data toolingThe future of Big Data tooling
The future of Big Data tooling
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage System
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosql
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
PSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL ServerPSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL Server
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Real World NoSQL (by Chris Yuen)
Real World NoSQL (by Chris Yuen)Real World NoSQL (by Chris Yuen)
Real World NoSQL (by Chris Yuen)
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Boston Hadoop Meetup, April 26 2012

  • 1. The Proliferation of Database Systems and the Data Silo Problem @daniel_abadi Yale University / Hadapt April 26th, 2012
  • 2. In The Old Days … Database
  • 3. In The Old Days … Database ETL Tools Database Data Warehouse Database Data Integration Tools External Data MDM Tools Data Governance Tools
  • 4. One Size Does Not Fit All Transactional Databases – Single digit millisecond latencies, and high throughput – Store data in rows – Heavy on flash and main memory – Indexing is very important – High availability extremely important
  • 5. One Size Does Not Fit All Analytical Databases – Single digit second latencies (and higher) – Store data in columns – Scale out commodity hardware – Still need magnetic disk – Indexing less important – High availability less important
  • 6. One Size Does Not Fit All Streaming Databases – Continuous queries – Data flows through the system – Network latencies are paramount – Drop data to deal with load
  • 7. Therefore, in my PhD years alone … Aurora and Borealis projects became Streambase C-Store project became Vertica H-Store project became VoltDB
  • 8. Right Tool for the Job
  • 9. What We Have Now … Analytical Transactional Datamart OLAP Database Hadoop DBMS Reporting and High Transactional Dashboarding Data Performance Streaming DBMS DBMS Warehouse Column-Store Analytical DBMS Web DBMS (like Web Logs NoSQL NewSQL MySQL)
  • 10. What We Have Now … Analytical Transactional Datamart OLAP Database Hadoop DBMS Reporting and High Transactional Dashboarding Data Performance Streaming DBMS DBMS Warehouse Column-Store Analytical DBMS Web DBMS (like Web Logs NoSQL NewSQL MySQL)
  • 11. What We Have Now … Analytical Transactional Datamart OLAP Database Hadoop DBMS Reporting and High Transactional Dashboarding Data Performance Streaming DBMS DBMS Warehouse Column-Store Analytical DBMS Web DBMS (like Web Logs NoSQL NewSQL MySQL)
  • 12. What This Leads To… Very little data provenance Data silos Non identical data copies Not even close to a single version of the truth
  • 13. A Potential Way Towards a Solution Data Analysis Data DBMS Streaming (Hive, (Hstreaming Hadoop Hadapt) Flume) NoSQL & Simple Xacts & Short Request Processing (HBase, Brisk)
  • 14. What this has Potential to Enable Fewer data silos Increased data provenance Reduced systems management overhead Better resource utilization and management
  • 15. But we still need Hadoop-based data integration tools MDM and data governance tools for Hadoop Data provenance tracking across Hadoop projects