Beyond relational: «neural» DBMS?


Neural networks and machine learning models show promise as alternatives to traditional index structures in database management systems. A learned index framework can extract weights from a trained TensorFlow model to generate efficient index structures in C++. A recursive model index builds a hierarchy of models, with each model selecting the next based on the key, to more accurately search the "last mile". Hybrid models use a ReLU neural net at the top layer and thousands of simple linear regression models at the bottom to balance performance and accuracy. While results are promising, learned indexes may not always be the best choice.


BEYOND RELATIONAL:
«NEURAL» DBMS?
Roberto Reale
@ Italian Association for Machine Learning
10 Apr 2019

Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Commun. ACM, 13(6), 377-387.
Kraska, T., Beutel, A., Chi, E. H., Dean, J. and Polyzotis, N. (2017). The Case for Learned Index Structures. arXiv preprint arXiv:1712.01208.

RELATIONAL MODEL
Can be expressed in first-order predicate logic
Data is represented as tuples, grouped into relations
Abstraction from physical storage model

INDEX STRUCTURES
Needed for efficient data access
B-Trees, Hash maps, Bloom filters, ...
Need tuning
General-purpose data structures: they do not take advantage of patterns in the data

ENTER MACHINE LEARNING
Replacing core components of a data management system with learned models
Traditional indexes are already models
For efficiency reasons it is common not to index every single key of the sorted records, but only the key of every n-th record
Using other types of models as indexes can provide benefits
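
To make the "every n-th record" idea concrete, here is a minimal Python sketch of a sparse index over a sorted array. The names `build_sparse_index` and `lookup` and the block size `n=16` are illustrative assumptions, not taken from the slides or the paper. The sparse index acts as a coarse model: it predicts a position that is off by at most n - 1, and a short scan covers the rest.

```python
from bisect import bisect_right

def build_sparse_index(sorted_keys, n):
    """Keep only the first key of every block of n records, plus its position."""
    positions = list(range(0, len(sorted_keys), n))
    return [sorted_keys[p] for p in positions], positions

def lookup(sorted_keys, sparse_index, n, key):
    """Return the position of key in sorted_keys, or None if it is absent."""
    index_keys, positions = sparse_index
    # Find the last indexed key that is <= the lookup key.
    slot = bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key is smaller than every stored key
    start = positions[slot]
    # The "prediction" is the block start; the true position lies
    # within the next n records, so a bounded scan is enough.
    for pos in range(start, min(start + n, len(sorted_keys))):
        if sorted_keys[pos] == key:
            return pos
    return None

# Hypothetical data: every third integer below 1000.
keys = list(range(0, 1000, 3))
sparse = build_sparse_index(keys, n=16)
print(lookup(keys, sparse, 16, 300))
```

A larger n shrinks the index but widens the scan, which is the same size-versus-accuracy trade-off that learned models face.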

INDEXES ARE CDF MODELS
An index is a model that takes a key as input and predicts the position of the record
A model that predicts the position of a key inside a sorted array approximates the cumulative distribution function (CDF)
The position estimate is p = F(Key) · N, where N is the total number of keys and F(Key) is the estimated cumulative distribution function of the data: the likelihood of observing a key smaller than or equal to the lookup key
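
The following Python sketch illustrates the idea under simplifying assumptions: a single straight line stands in for the CDF model, and the class name `LearnedIndex` and its methods are hypothetical. The model predicts a position, and the smallest and largest prediction errors seen on the stored keys bound the search window, so a stored key is always found.

```python
from bisect import bisect_left

def fit_linear(keys):
    """Least-squares line mapping each key of a sorted list to its position."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var_x = sum((x - mean_x) ** 2 for x in keys)
    cov_xy = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(keys))
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x

class LearnedIndex:
    """One linear model as a (rough) CDF approximation, plus error bounds."""

    def __init__(self, keys):
        self.keys = keys  # assumed sorted and non-empty
        self.slope, self.intercept = fit_linear(keys)
        errors = [self._predict(k) - i for i, k in enumerate(keys)]
        self.min_err, self.max_err = min(errors), max(errors)

    def _predict(self, key):
        raw = self.slope * key + self.intercept
        return max(0, min(len(self.keys) - 1, round(raw)))

    def lookup(self, key):
        guess = self._predict(key)
        # True position of a stored key lies in [guess - max_err, guess - min_err].
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess - self.min_err + 1)
        pos = bisect_left(self.keys, key, lo, hi)
        found = pos < len(self.keys) and self.keys[pos] == key
        return pos if found else None

# Hypothetical, deliberately non-uniform data: perfect squares.
keys = [i * i for i in range(500)]
index = LearnedIndex(keys)
print(index.lookup(keys[123]), index.max_err - index.min_err)
```

The width of the error window (`max_err - min_err`) is the "last mile" cost: the worse the model fits the distribution of the keys, the more of the array the final search has to cover.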

ISSUES...
Decision trees in general are really good at overfitting the data with a few operations
A single neural net requires significantly more space and CPU time for the “last mile”
B-Trees are extremely cache- and operation-efficient

THE LEARNING INDEX FRAMEWORK (LIF)
Given a trained TensorFlow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++
Designed for small models
No unnecessary overhead

THE RECURSIVE MODEL INDEX
Challenge: accuracy for last-mile search
We build a hierarchy of models
Each model takes the key as input and, based on it, picks another model, until the final stage predicts the position

THE RECURSIVE MODEL INDEX, 2
We iteratively train each stage ℓ with loss L_ℓ
We separate model size and complexity from execution cost
We effectively divide the space into smaller sub-ranges to make it easier to achieve the required “last mile” accuracy
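
A two-stage version of this hierarchy can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: both stages use plain linear regression, and the class name `TwoStageRMI` and the parameter `num_leaves` are assumptions. The root model's predicted position selects a leaf model, each leaf is trained only on the keys routed to it, and each leaf keeps its own error bounds for the final search.

```python
from bisect import bisect_left

def fit_linear(points):
    """Least-squares line through (key, position) pairs: (slope, intercept)."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

class TwoStageRMI:
    def __init__(self, keys, num_leaves=16):
        self.keys = keys  # assumed sorted and non-empty
        self.num_leaves = num_leaves
        # Stage 1: one model trained on all keys.
        self.root = fit_linear([(k, i) for i, k in enumerate(keys)])
        # Route every key to a leaf using the root's prediction.
        buckets = [[] for _ in range(num_leaves)]
        for i, k in enumerate(keys):
            buckets[self._route(k)].append((k, i))
        # Stage 2: one model per leaf, trained only on its own keys.
        self.leaves = []
        for bucket in buckets:
            slope, icpt = fit_linear(bucket)
            errs = [self._clamp(slope * k + icpt) - i for k, i in bucket] or [0]
            self.leaves.append((slope, icpt, min(errs), max(errs)))

    def _clamp(self, raw):
        return max(0, min(len(self.keys) - 1, round(raw)))

    def _route(self, key):
        slope, icpt = self.root
        predicted = slope * key + icpt
        leaf = int(self.num_leaves * predicted / len(self.keys))
        return max(0, min(self.num_leaves - 1, leaf))

    def lookup(self, key):
        slope, icpt, min_err, max_err = self.leaves[self._route(key)]
        guess = self._clamp(slope * key + icpt)
        lo = max(0, guess - max_err)
        hi = min(len(self.keys), guess - min_err + 1)
        pos = bisect_left(self.keys, key, lo, hi)
        found = pos < len(self.keys) and self.keys[pos] == key
        return pos if found else None

# Hypothetical skewed data: perfect cubes.
keys = [i ** 3 for i in range(1000)]
rmi = TwoStageRMI(keys, num_leaves=32)
print(rmi.lookup(keys[500]))
```

Because each leaf only has to fit a small sub-range of the key space, its error window tends to be much narrower than that of a single model over all keys, while a lookup still costs only two model evaluations.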

HYBRID MODELS
Top layer: a small neural net with rectified linear unit (ReLU) activations
At the bottom: thousands of simple, inexpensive linear regression models
Traditional B-Trees at the bottom if the data is particularly hard to learn
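
The fallback rule for the bottom layer can be sketched as below. This is an illustration under stated assumptions: the function names and the `max_allowed_error` threshold are hypothetical, each bucket is assumed to cover a contiguous run of positions, and a sorted list searched with `bisect` stands in for a real B-Tree, which the Python standard library does not provide.

```python
from bisect import bisect_left

def fit_linear(points):
    """Least-squares line through (key, position) pairs: (slope, intercept)."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

def build_leaf(bucket, max_allowed_error):
    """bucket: (key, position) pairs for one leaf, sorted, positions contiguous."""
    slope, icpt = fit_linear(bucket)
    worst = max((abs(round(slope * k + icpt) - i) for k, i in bucket), default=0)
    if worst > max_allowed_error:
        # Data too hard to learn: fall back to a classic search structure.
        # (Sorted list + binary search used here as a B-Tree stand-in.)
        return {"kind": "fallback", "keys": [k for k, _ in bucket],
                "offset": bucket[0][1]}
    return {"kind": "linear", "slope": slope, "intercept": icpt,
            "max_error": worst}

def lookup_leaf(leaf, all_keys, key):
    """Return the position of key in all_keys via this leaf, or None."""
    if leaf["kind"] == "fallback":
        pos = bisect_left(leaf["keys"], key)
        found = pos < len(leaf["keys"]) and leaf["keys"][pos] == key
        return leaf["offset"] + pos if found else None
    guess = round(leaf["slope"] * key + leaf["intercept"])
    lo = max(0, guess - leaf["max_error"])
    hi = max(lo, min(len(all_keys), guess + leaf["max_error"] + 1))
    pos = bisect_left(all_keys, key, lo, hi)
    found = pos < len(all_keys) and all_keys[pos] == key
    return pos if found else None

# Hypothetical data: perfect squares, treated as a single leaf's bucket.
all_keys = [i * i for i in range(100)]
bucket = list(zip(all_keys, range(len(all_keys))))
strict = build_leaf(bucket, max_allowed_error=1)      # line fits too poorly
relaxed = build_leaf(bucket, max_allowed_error=1000)  # line is good enough
print(strict["kind"], relaxed["kind"])
```

With a fallback in place, the worst-case lookup cost of the hybrid index stays close to that of the traditional structure, whatever the data looks like.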

DOES THIS STUFF WORK?
Simple NNs can be trained efficiently using stochastic gradient descent
A closed-form solution exists for linear multivariate models
The results are promising, but “learned indexes” might not be the best choice in every use case
A new way to think about indexing
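
For reference, the closed-form solution mentioned above is the standard least-squares result. The notation is an assumption introduced here, not taken from the slides: X is the matrix of key features (one row per key, with a constant column for the intercept), y is the vector of true positions, and X^T X is assumed invertible. For a single numeric key x with model f(x) = ax + b, it reduces to the second and third lines.

```latex
\hat{w} = \arg\min_{w} \lVert Xw - y \rVert_2^2 = (X^\top X)^{-1} X^\top y

a = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}

b = \bar{y} - a\,\bar{x}
```

No iterative training is needed for such leaf models, which is part of what makes thousands of them inexpensive to build.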

ROBERTO@REALE.ME