SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Automated Schema Design for
NoSQL Databases
Michael J. Mior
Supervised by Kenneth Salem
Schema optimization
For a given workload, construct the optimal
physical schema to answer queries in the
workload for a given storage budget
What is NoSQL?
● Not SQL?
● NoACID?
● Not only SQL
Types of NoSQL Databases
● Document stores
● Key-value stores
● Graph databases
● …
● Wide column stores
Cassandra data model
Cassandra data model
Queries specify
A set of row keys
and either
A list of column keys
or
A prefix of the column keys with a range
Cassandra schema design
Physical schema
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
No single obvious mapping from the conceptual schema
Conceptual schema
Hotel
PointOfInterest
Near
Hotel8
“Motel 6”
Cassandra schema design
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
HotelsWithPOIs
Hotel3
POI17
“Liberty Bell”
…
…
● Multiple choices of logical and
physical schema are possible
for a given conceptual schema
● Different choices are more
suitable for different queries
● Harder to add additional
structures in the future
Cassandra schema design
POIs
POI17
Name
“Liberty Bell”
…
…
Hotels
Hotel3
Name
“Holiday Inn”
…
…
HotelToPOI
Hotel3
POI17
(null)
…
…
POIToHotel
POI17
Hotel3
(null)
…
…
Cassandra schema design
POIToHotel
POI17
Hotel3
“Holiday Inn”
…
…
HotelToPOI
Hotel3
POI17
“Liberty Bell”
…
…
HotelToPOI
HotelID
POIID
(null)
…
…
POIToHotel
POIID
HotelID
(null)
…
…
Challenges
● Storage costs
● View maintenance
● Transactions
● Data distribution
Relational schema design
● Commercial database vendors have existing tools
○ Microsoft AutoAdmin for SQL Server
○ DB2 Design Advisor
○ Oracle SQL Tuning Advisor
● These tools don’t solve all the problems for NoSQL databases
● Usually an existing schema is assumed, and tools suggest secondary
indices and materialized views
Relational schema design
Conceptual schema Logical schema Physical schema
Hotel
CREATE TABLE Hotel(…);
CREATE TABLE POI(…);
CREATE TABLE HotelToPOI(…,
FOREIGN KEY (HotelID)
REFERENCES Hotel(HotelID)
FOREIGN KEY (POOID)
REFERENCES POI(POIID)
);
CREATE MATERIALIZED VIEW
POIsWithHotels AS SELECT
POI.*, Hotel.* FROM
POI, HotelToPOI, Hotel
WHERE HotelToPOI.HotelID
= Hotel.HotelID
AND HotelToPOI.POIID
= POI.POIID;
Mapping from conceptual to logical schema is straightforward
Near
PointOfInterest
Cassandra schema design
Physical schema
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
No single obvious mapping from the conceptual schema
Conceptual schema
Hotel
Near
Hotel8
“Motel 6”
PointOfInterest
Abstract example
Query paths
Get all points of interests near
hotels a guest has stayed at
SELECT Name FROM POI WHERE
POI.HotelToPOI
.Hotel.Room.Reservation
.Guest.GuestID = ?
Guest Reservation
Room
Hotel
Point of
Interest
Index paths
● Queries can “skip” entities
on a path using an index
● Indices to the final entity
are equivalent to
materialized views
● Schema design is now the
selection of these indices
Guest Reservation
Room
Hotel
Point of
Interest
Problem definition
Input:
● ER diagram with entity counts and cardinalities
● Weighted queries over the diagram
● Storage constraint
● Cost model for target DB
Output:
● Suggested index configuration
● Plans for executing each query
Proposed solution
1. Enumerate possible useful indices for all queries
2. Determine the benefit of a given index for each query
3. Select the optimal indices for a given space constraint
Future Work
● Retargeting to other NoSQL DBs
● Automatic implementation of query plans
● Handling workloads which include updates
● Richer query language
● Exploitation of DBMS-specific features
Summary
● Schema design in NoSQL databases
presents unique challenges
● A workload-driven approach is necessary
● Conceptual schemata and queries are viable
abstractions for NoSQL schema design tools
Query language
SELECT [attributes] FROM [entity]
WHERE [path](=|<|<=|>|>=) [value] AND …
ORDER BY [path] LIMIT [count]
Conjunctive queries with simple predicates and ordering
Higher level queries can be composed by the application
Solution
Index
Enumerator
Query Planner
Index Selection
Index benefit
analysis
Query Plans
Selected IndicesWorkload
Input Output
Implementation
● Focus on Cassandra as target system
● Simple cost model for read-only in-memory workloads
● DBMS-independent index enumerator and query planner
● Index selection ILP implemented with GLPK
Evaluation - TBD
● Re-implement existing workloads in the target DBMS (e.
g. RUBiS, RUBBoS)
● Develop new benchmarks
● Comparison with a human DBA
● Random ER diagrams/queries with realistic properties

Contenu connexe

Tendances

Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMongoDB
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshooting
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshootingTarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshooting
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshootingJovan Popovic
 
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB
 
ClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaJack Gao
 
Common MongoDB Use Cases
Common MongoDB Use Cases Common MongoDB Use Cases
Common MongoDB Use Cases MongoDB
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabaseTung Nguyen Thanh
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
How to upgrade like a boss to MySQL 8.0 - PLE19
How to upgrade like a boss to MySQL 8.0 -  PLE19How to upgrade like a boss to MySQL 8.0 -  PLE19
How to upgrade like a boss to MySQL 8.0 - PLE19Alkin Tezuysal
 
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Amazon Web Services
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerWiredTiger
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query OptimizationMongoDB
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsMatthew Kalan
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리Junyi Song
 
Fail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreFail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreAlexey Kovyazin
 

Tendances (20)

Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDB
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
Document Database
Document DatabaseDocument Database
Document Database
 
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshooting
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshootingTarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshooting
Tarabica 2019 (Belgrade, Serbia) - SQL Server performance troubleshooting
 
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
MongoDB Schema Design (Richard Kreuter's Mongo Berlin preso)
 
ClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @Sina
 
Common MongoDB Use Cases
Common MongoDB Use Cases Common MongoDB Use Cases
Common MongoDB Use Cases
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL Database
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
How to upgrade like a boss to MySQL 8.0 - PLE19
How to upgrade like a boss to MySQL 8.0 -  PLE19How to upgrade like a boss to MySQL 8.0 -  PLE19
How to upgrade like a boss to MySQL 8.0 - PLE19
 
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTiger
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query Optimization
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Fail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something moreFail-Safe Cluster for FirebirdSQL and something more
Fail-Safe Cluster for FirebirdSQL and something more
 

En vedette

NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of ViewNoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of ViewAlex Esterkin
 
NoSQL, Growing up at Oracle
NoSQL, Growing up at OracleNoSQL, Growing up at Oracle
NoSQL, Growing up at OracleDATAVERSITY
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015NoSQLmatters
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutesKaren Lopez
 
NoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL ApplicationsNoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL ApplicationsMichael Mior
 
NoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data ModelersNoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data ModelersKaren Lopez
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Sid Anand
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresDATAVERSITY
 
Non-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value StoresNon-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value StoresJoël Perras
 
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)Michael Bleigh
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Alexandre Morgaut
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2Fabio Fumarola
 
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachSlides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachDATAVERSITY
 
Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesArangoDB Database
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
The 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience ManagementThe 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience Managementrivetlogic
 
NoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons LearnedNoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons Learnedrivetlogic
 

En vedette (20)

NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of ViewNoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
 
NoSQL, Growing up at Oracle
NoSQL, Growing up at OracleNoSQL, Growing up at Oracle
NoSQL, Growing up at Oracle
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutes
 
NoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL ApplicationsNoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL Applications
 
NoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data ModelersNoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data Modelers
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 
Non-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value StoresNon-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value Stores
 
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
 
NoSQL meets Microservices
NoSQL meets MicroservicesNoSQL meets Microservices
NoSQL meets Microservices
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical ApproachSlides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach
 
Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databases
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
The 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience ManagementThe 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience Management
 
NoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons LearnedNoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons Learned
 

Similaire à Automated Schema Design for NoSQL Databases

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN AppMongoDB
 
FULL stack -> MEAN stack
FULL stack -> MEAN stackFULL stack -> MEAN stack
FULL stack -> MEAN stackAshok Raj
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresqlZaid Shabbir
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysKoneksys
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernates4al_com
 
Physical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data SystemsPhysical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data SystemsMichael Mior
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelGarindra Prahandono
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaJose Mº Muñoz
 

Similaire à Automated Schema Design for NoSQL Databases (20)

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN App
 
FULL stack -> MEAN stack
FULL stack -> MEAN stackFULL stack -> MEAN stack
FULL stack -> MEAN stack
 
Database.pdf
Database.pdfDatabase.pdf
Database.pdf
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
Developers Guide to Cosmos DB
Developers Guide to Cosmos DBDevelopers Guide to Cosmos DB
Developers Guide to Cosmos DB
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By Koneksys
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
MongoDB.pdf
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
 
Physical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data SystemsPhysical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data Systems
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Presentation
PresentationPresentation
Presentation
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
SQL - RDBMS Concepts
SQL - RDBMS ConceptsSQL - RDBMS Concepts
SQL - RDBMS Concepts
 

Dernier

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Dernier (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Automated Schema Design for NoSQL Databases

  • 1. Automated Schema Design for NoSQL Databases Michael J. Mior Supervised by Kenneth Salem
  • 2. Schema optimization For a given workload, construct the optimal physical schema to answer queries in the workload for a given storage budget
  • 3. What is NoSQL? ● Not SQL? ● NoACID? ● Not only SQL
  • 4. Types of NoSQL Databases ● Document stores ● Key-value stores ● Graph databases ● … ● Wide column stores
  • 6. Cassandra data model Queries specify A set of row keys and either A list of column keys or A prefix of the column keys with a range
  • 7. Cassandra schema design Physical schema POIsWithHotels POI17 Hotel3 “Holiday Inn” … … No single obvious mapping from the conceptual schema Conceptual schema Hotel PointOfInterest Near Hotel8 “Motel 6”
  • 8. Cassandra schema design POIsWithHotels POI17 Hotel3 “Holiday Inn” … … HotelsWithPOIs Hotel3 POI17 “Liberty Bell” … … ● Multiple choices of logical and physical schema are possible for a given conceptual schema ● Different choices are more suitable for different queries ● Harder to add additional structures in the future
  • 9. Cassandra schema design POIs POI17 Name “Liberty Bell” … … Hotels Hotel3 Name “Holiday Inn” … … HotelToPOI Hotel3 POI17 (null) … … POIToHotel POI17 Hotel3 (null) … …
  • 10. Cassandra schema design POIToHotel POI17 Hotel3 “Holiday Inn” … … HotelToPOI Hotel3 POI17 “Liberty Bell” … … HotelToPOI HotelID POIID (null) … … POIToHotel POIID HotelID (null) … …
  • 11. Challenges ● Storage costs ● View maintenance ● Transactions ● Data distribution
  • 12. Relational schema design ● Commercial database vendors have existing tools ○ Microsoft AutoAdmin for SQL Server ○ DB2 Design Advisor ○ Oracle SQL Tuning Advisor ● These tools don’t solve all the problems for NoSQL databases ● Usually an existing schema is assumed, and tools suggest secondary indices and materialized views
  • 13. Relational schema design Conceptual schema Logical schema Physical schema Hotel CREATE TABLE Hotel(…); CREATE TABLE POI(…); CREATE TABLE HotelToPOI(…, FOREIGN KEY (HotelID) REFERENCES Hotel(HotelID) FOREIGN KEY (POOID) REFERENCES POI(POIID) ); CREATE MATERIALIZED VIEW POIsWithHotels AS SELECT POI.*, Hotel.* FROM POI, HotelToPOI, Hotel WHERE HotelToPOI.HotelID = Hotel.HotelID AND HotelToPOI.POIID = POI.POIID; Mapping from conceptual to logical schema is straightforward Near PointOfInterest
  • 14. Cassandra schema design Physical schema POIsWithHotels POI17 Hotel3 “Holiday Inn” … … No single obvious mapping from the conceptual schema Conceptual schema Hotel Near Hotel8 “Motel 6” PointOfInterest
  • 16. Query paths Get all points of interests near hotels a guest has stayed at SELECT Name FROM POI WHERE POI.HotelToPOI .Hotel.Room.Reservation .Guest.GuestID = ? Guest Reservation Room Hotel Point of Interest
  • 17. Index paths ● Queries can “skip” entities on a path using an index ● Indices to the final entity are equivalent to materialized views ● Schema design is now the selection of these indices Guest Reservation Room Hotel Point of Interest
  • 18. Problem definition Input: ● ER diagram with entity counts and cardinalities ● Weighted queries over the diagram ● Storage constraint ● Cost model for target DB Output: ● Suggested index configuration ● Plans for executing each query
  • 19. Proposed solution 1. Enumerate possible useful indices for all queries 2. Determine the benefit of a given index for each query 3. Select the optimal indices for a given space constraint
  • 20. Future Work ● Retargeting to other NoSQL DBs ● Automatic implementation of query plans ● Handling workloads which include updates ● Richer query language ● Exploitation of DBMS-specific features
  • 21. Summary ● Schema design in NoSQL databases presents unique challenges ● A workload-driven approach is necessary ● Conceptual schemata and queries are viable abstractions for NoSQL schema design tools
  • 22. Query language SELECT [attributes] FROM [entity] WHERE [path](=|<|<=|>|>=) [value] AND … ORDER BY [path] LIMIT [count] Conjunctive queries with simple predicates and ordering Higher level queries can be composed by the application
  • 23. Solution Index Enumerator Query Planner Index Selection Index benefit analysis Query Plans Selected IndicesWorkload Input Output
  • 24. Implementation ● Focus on Cassandra as target system ● Simple cost model for read-only in-memory workloads ● DBMS-independent index enumerator and query planner ● Index selection ILP implemented with GLPK
  • 25. Evaluation - TBD ● Re-implement existing workloads in the target DBMS (e. g. RUBiS, RUBBoS) ● Develop new benchmarks ● Comparison with a human DBA ● Random ER diagrams/queries with realistic properties