Curt Monash
Our agenda
Why are there specialized analytic DBMS?
Moore’s Law, Kryder’s Law, and a huge exception (03/13/10 DRAFT!!  THIRD TEST!!)
The “1,000,000:1” disk-speed barrier
Software strategies to optimize analytic I/O
Hardware strategies to optimize analytic I/O
Specialty hardware strategies
18 contenders (and there are more)
General areas of feature differentiation
Major analytic DBMS product groupings
Traditional OLTP examples
Analytic optimizations for OLTP DBMS
Drawbacks
Legitimate use scenarios
Row-based MPP examples
Typical design choices in row-based MPP
Tradeoffs among row MPP alternatives
Columnar DBMS examples
Columnar pros and cons
Segmentation – a first cut
Basics of systematic segmentation
Use cases – a first cut
Metrics – a first cut
Basic platform issues
The selection process in a nutshell
Figure out what you’re trying to buy
Use-case checklist – generalities
Use-case checklist – traditional BI
Use-case checklist – data mining
SLA realism
Short list constraints
Filling out the shortlist
A checklist for shortlists
Proof-of-Concept basics
The three big POC challenges
POC tips
Evaluate and decide
Further information: Curt A. Monash, Ph.D., President, Monash Research; Editor, DBMS2. contact @monash.com, http://www.monash.com, http://www.DBMS2.com

How To Buy Data Warehouse


Editor's notes

  1. Slides 3-26 outline what you need to know about the sector to conduct any kind of selection process. They’re not meant to be read separately, but rather just to illustrate the presentation. Slides 27-39 have tips for the process itself. They’re meant as reference take-aways. We’ll discuss them selectively as time permits.
  2. Disk speed dominates everything. The problem is this: disks simply don’t spin very fast. If they did, they’d fly off of the spindle or something. The very first disk drives, introduced in 1956 by IBM, rotated 1,200 times per minute. Today’s top-end drives spin only 15,000 times per minute. That’s a 12.5-fold increase in more than 50 years. Most other metrics of computer performance increase 12.5-fold every 7 years or so. That’s just Moore’s Law. A two-year doubling, which turns out to be more accurate than other statements of the law, works out to an 8-fold increase in 6 years, or roughly an 11-fold increase in 7. There’s just a huge, huge difference.
  3. It’s actually hard to get a single firm number for the difference between disk and RAM access times. Disk access times are well known; they’re advertised a lot, for one thing. RAM access times are harder to pin down. A big part of the problem is that they depend heavily on architecture; not all accesses are alike. There are multiple levels of cache, for example. Another problem is that not all RAM is alike. Anyhow, listed access times tend to be in the 5 to 7.5 nanosecond range, so that’s what I’m going with. One thing we can compute is a very hard lower bound on random disk access times. If an access is random, then on average the disk has to spin halfway around before the data passes under the head. And we know exactly how long that takes: a 15,000 RPM drive needs 4 milliseconds per rotation, so the average is 2 milliseconds. There’s just no way random disk seeks will get any faster than that, except to the extent disk rotation resumes its creeping slow progress. “Tiering” basically means “use of Level 2 (i.e., on-processor) cache”.
  4. I’ve been watching the DBMS industry, especially the relational vendors, work on performance for over 25 years now. And I’m in awe at what they’ve accomplished. It’s some of the finest engineering in the software industry. With OLTP performance largely a solved problem, most of that work for the past decade has been in the area of OLAP. And improving OLAP performance basically means decreasing OLAP I/O. Perhaps the most basic thing they try to do is minimize the amount of data returned. Since the end result is what the end result is, this means optimizing the amount returned at intermediate stages of a query execution process. That’s what cost-based optimizers are all about … Baked into the architecture of disk-centric DBMS is something even more basic; they try to minimize index accesses. Naively, if you’re selecting from 2^30 (i.e., about 1 billion) records, there might be 30 steps as you walk through the binary tree. By dividing indices into large pages, this is reduced, at the cost of a whole lot of sorting within the block at each step. Layered on are ever more special indexing structures. For example, if it seems clear that a certain join will be done frequently, an index can be built that essentially bakes in that join’s results. Of course, this also reduces the amount of data returned in the intermediate step, admittedly at the cost of index size. Anyhow, it’s a very important technique. And that’s not the only kind of precalculation. Preaggregation is at the heart of disk-centric MOLAP architectures. Materialized views bring MOLAP benefits to conventional relational processing. These are all more or less logical techniques, although the optimizer stuff is on the boundary between logical and physical. There also are approaches that are more purely physical. Most basically, much like the index situation, data is returned in pages. It turns out to be cheaper to always be wasteful and send a whole block of sequential data back than it is to send back only what is actually needed. Beyond that, efforts are made to understand what data will be requested together, and cluster it so that sequential reads can take the place of truly random I/O. And that leads to the most powerful solution of all: do everything in RAM!! If you always initialized by reading in the whole database, in principle you’re done with ALL your disk I/O for the day! Oh, there may be reasons to write things, such as the results to queries, but basically you’ve made your disk speed problems totally go away. There’s a price of course, mainly and most obviously in the RAM you need to buy, and probably the CPU driving that RAM. But by investing in one area, you’re making a big related problem go away, if, of course, you can afford all that silicon.
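The index-paging point can be illustrated with a small calculation. This sketch assumes an idealized, perfectly balanced tree; the fan-out of 256 keys per page is an arbitrary illustrative value, not a figure from any particular DBMS.

```python
def tree_levels(num_records, fanout):
    """Smallest number of levels such that fanout ** levels >= num_records."""
    levels = 0
    capacity = 1
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels

records = 2 ** 30  # about 1 billion

# Binary tree: one key per node, so every step may be a separate disk access.
print(tree_levels(records, 2))     # 30

# Paged index: 256 keys per page (hypothetical), so far fewer page reads.
print(tree_levels(records, 256))   # 4

# The trade-off: each page must be searched internally. A binary search
# over 256 sorted keys takes about 8 comparisons, all done in memory.
print(tree_levels(256, 2))         # 8
```

Total comparisons stay about the same (4 pages times 8 comparisons), but the number of disk accesses drops from 30 to 4, which is the quantity that matters when disk speed dominates.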
  5. This is the model for appliances. It’s also the model for software-only configurations that compete with appliances. Think IBM BCUs = Balanced Configuration Units, or various Oracle reference configurations. The pendulum shifts back and forth as to whether there are tight “recommended configurations” for non-appliance offerings. Row-based vendors are generally pickier about their hardware configurations than columnar ones.
  6. Kickfire is the only custom-chip-based vendor of note. Netezza’s FPGAs and PowerPC processors aren’t, technically, custom. But they’re definitely unusual. Oracle and DATAllegro (pre-Microsoft) like Infiniband. Other vendors like 10-gigabit Ethernet. Others just use lots and lots of 1-gigabit switches. Teradata, long proprietary, is now going in a couple of different networking directions.
  7. This slide is included at this point mainly for the golly-gee-whiz factor. 
  8. Columnar isn’t columnar isn’t columnar; each product is different. The same goes for row-based. Still, this categorization is the point from which to start.
  9. Oracle and SQL Server are single products meant to serve both OLTP and analytics. Any of the main versions of DB2 is something like that too. Sybase, however, separated its OLTP and analytic product lines in the mid-1990s.
  10. Even when you can make this stuff work at all, it’s hard. That’s a big reason why “disruptive” new analytic DBMS vendors have sprung up.
  11. The advantage of hash distribution is that if your join happens to involve the hash key, a lot of the work is already done for you. The disadvantage can be a bit of skew. The advantage usually wins out. Almost every vendor (Kognitio is an exception) encourages hash distribution. Oracle Exadata is an exception too, for different reasons.
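A toy sketch of hash distribution follows, showing why a join on the hash key needs no data movement between nodes and how skew can be measured. The table layout, field names, node count, and the use of CRC32 as the hash are all assumptions made for illustration; real MPP products use their own hashing and storage schemes.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

def node_for(key, num_nodes):
    """Map a distribution key to a node with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_field, num_nodes):
    """Place each row on the node chosen by hashing its key field."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes

def local_join(customers_by_node, orders_by_node, num_nodes):
    """Join on customer_id node by node; no row ever crosses nodes."""
    joined = []
    for node in range(num_nodes):
        by_id = {c["customer_id"]: c for c in customers_by_node.get(node, [])}
        for order in orders_by_node.get(node, []):
            customer = by_id.get(order["customer_id"])
            if customer is not None:
                joined.append((customer["name"], order["amount"]))
    return joined

def skew(nodes, num_nodes):
    """Largest node's row count divided by the mean; 1.0 means perfectly even."""
    sizes = [len(nodes.get(n, [])) for n in range(num_nodes)]
    return max(sizes) / (sum(sizes) / num_nodes)

customers = [{"customer_id": i, "name": f"cust{i}"} for i in range(1, 9)]
# Customer 1 places many more orders than anyone else, which causes skew.
orders = [{"customer_id": 1, "amount": 10 * n} for n in range(20)]
orders += [{"customer_id": i, "amount": 5} for i in range(2, 9)]

customers_by_node = distribute(customers, "customer_id", NUM_NODES)
orders_by_node = distribute(orders, "customer_id", NUM_NODES)

joined = local_join(customers_by_node, orders_by_node, NUM_NODES)
print(len(joined) == len(orders))        # True: every order found its customer locally
print(skew(orders_by_node, NUM_NODES))   # greater than 1.0 because of customer 1
```

Because both tables are hashed on the same key, matching rows always land on the same node, so each node can complete its share of the join independently.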
  12. Fixed configurations – including but not limited to appliances – are more important in row-based MPP than in columnar MPP systems. Oracle Exadata, Teradata, and Netezza are the most visible examples, but another one is IBM’s BCUs.
  13. Sybase IQ is the granddaddy, but it’s not MPP. SAND is another old one, but it’s focused more on archiving now. Vertica is a quite successful recent start-up, with >10X the known customers of ParAccel (published or NDA). InfoBright and Kickfire are MySQL storage engines. Kickfire is also an appliance. Exasol is very memory-centric. So is ParAccel’s TPC-H submission. So is SAP BI Accelerator, but unlike the others it’s not really a DBMS. MonetDB is open source.
  14. The big benefit of columnar is at the I/O bottleneck – you don’t have to bring back the whole row. But it also tends to make compression easier. Naïve columnar implementations are terrible at update/load. Any serious commercial product has done engineering work to get around that. For example, Vertica – which is probably the most open about its approach -- pretty much federates queries between disk and what almost amounts to a separate in-memory DBMS.
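The two columnar benefits mentioned here, touching less data and compressing more easily, can be seen in a toy example. This is a sketch under simple assumptions: the table, the column names, and the use of run-length encoding are illustrative only and do not describe any vendor's storage format.

```python
from itertools import groupby

NAMES = ("order_id", "country", "amount")
# Ten rows, already sorted by country so that equal values are adjacent.
rows = [(i, "US" if i < 6 else "DE", 10.0 * i) for i in range(10)]

def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode: collapse each run of equal values to (value, count)."""
    return [(v, len(list(group))) for v, group in groupby(values)]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [v for v, count in runs for _ in range(count)]

columns = to_columns(rows, NAMES)

# SUM(amount): a row store brings back every field of every row,
# while a column store reads only the one column it needs.
values_touched_row_store = len(rows) * len(NAMES)
values_touched_column_store = len(columns["amount"])
print(values_touched_row_store, values_touched_column_store)  # 30 10
print(sum(columns["amount"]))                                 # 450.0

# Values within a column are alike, so they compress well.
print(rle_encode(columns["country"]))  # [('US', 6), ('DE', 4)]
```

The same layout also hints at the update/load problem: inserting one new row means appending to every column list, and possibly re-encoding compressed runs.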
  15. I.e.: OLTP system and data warehouse integrated; separate EDW (Enterprise Data Warehouse); customer-facing data mart that hence requires OLTP-like uptime; 100+ terabytes or so; great speed on terabyte-scale data sets at low per-terabyte TCO (counting user data).
  16. Here starts the how-to.
  17. Databases grow naturally, as more transactions are added over time. Cheaper data warehousing also encourages the retention of more detail, and the addition of new data sources. All three factors boost database size. Users can be either humans or other systems. (Both, in fact, are included in the definition of “user” on the Oracle price list.) Cheap data warehousing also leads to a desire for lower latency, often without clear consideration of the benefits of same.
  18. Nobody ever overestimates their need for storage. But people do sometimes overestimate their need for data immediacy.