SlideShare une entreprise Scribd logo
1  sur  8
Télécharger pour lire hors ligne
Online Index Recommendations for
High-Dimensional Databases Using Query Workloads

(Synopsis)
Abstract:
Usually users are interested in querying data over a relatively
small subset of the entire attribute set at a time. A potential solution is
to use lower dimensional indexes that accurately represent the user
access patterns. If the query pattern change, then the query response
using the physical database design that is developed based on a static
snapshot of the query workload may significantly degrade. To address
these issues, we introduce a parameterizable technique to recommend
indexes based on index types that are frequently used for highdimensional data sets and to dynamically adjust indexes as the
underlying query workload changes. We incorporate a query pattern
change detection mechanism to determine when the access patterns
have changed enough to warrant change in the physical database
design. By adjusting analysis parameters, we trade off analysis speed
against analysis resolution.
2 Introduction:
AN increasing number of database applications such as business data
warehouses and scientific data repositories deal with high-dimensional
data sets. As the number of dimensions/attributes and the overall size
of data sets increase, it becomes essential to efficiently retrieve
specific queried data from the database in order to effectively utilize
the database. Indexing support is needed to effectively prune out
significant portions of the data set that are not relevant for the
queries. Multidimensional indexing, dimensionality reduction, and
Relational Database Management System (RDBMS) index selection
tools all could be applied to the problem. However, for highdimensional data sets, each of these potential solutions has inherent
problems.

To

illustrate

these

problems,

consider

a

uniformly

distributed data set of 1,000,000 data objects with several hundred
attributes. Range queries are consistently executed over five of the
attributes. The query selectivity over each attribute is 0.1, so the
overall query selectivity is 1=105 (that is, the answer set contains
about 10 results). An ideal solution would allow us to read from the
disk only those pages that contain matching answers to the query. We
could build a multidimensional index over the data set so that we can
directly answer any query by only using the index. However, the
performance of multidimensional index structures is subject to
Bellman’s curse of dimensionality and rapidly degrades as the number
of dimensions increases. For the given example, such an index would
perform much worse than a sequential scan. Another possibility would
be to build an index over each single dimension. The effectiveness of
this approach is limited to the amount of search space that can be
pruned by a single dimension (in the example, the search space would
only be pruned to 100,000 objects).
High Dimensional Indexing:
A number of techniques have been introduced to address the
high-dimensional indexing problem such as the X-tree [5] and the GCtree [6]. Although these index structures have been shown to increase
the range of effective dimensionality, they still suffer performance
degradation at higher index dimensionality.

Feature Selection
Feature selection techniques are a subset of dimensionality reduction
targeted at finding a set of untransformed attributes that best
represent the overall data set. These techniques are also focused on
maximizing data energy or classification accuracy rather than query
response. As a result, selected features may have no overlap with
queried attributes.

Index Selection
The index selection problem has been identified as a variation of the
Knapsack Problem, and several papers proposed designs for index
recommendations based on optimization rules. These earlier designs
could not take advantage of modern database systems’ query
optimizer. Currently, almost every commercial RDBMS provides the
users with an index
recommendation tool based on a query workload and uses the query
optimizer to obtain cost estimates. A query workload is a set of SQL
data manipulation statements. The
query workload should be a good representative of the types of
queries that an application supports.
Automatic Index Selection
The ideas of having a database that can tune itself by automatically
creating new indexes as the queries arrive have been proposed. In a
cost model is used
to identify beneficial indexes and decide when to create or drop an
index at runtime. Costa and Lifschitz propose an agent-based database
architecture to deal with an
automatic index creation. Microsoft Research has proposed a physicaldesign alerter to identify when a modification to the physical design
could result in improved performance.
Literature survey:
Index Selection
Index Selection is a method of artificial selection in which several
useful traits are selected simultaneously. First, each trait that is going
to be selected is assigned a weight, the importance of the trait. I.e., if
you were selecting for both height and the darkness of the coat in
dogs, if height was more important to you, one would assign that a
higher weighting. For instance, heights weighting could be ten and
coat darkness' could be two. This weighting value is then multiplied by
the observed value in each individual animal and then the score for
each of the characteristics is summed for each individual. This result is
the index score and can be used to compare the worth of each
organism being selected. Therefore, only those with the highest index
score are selected for breeding via artificial selection.
This method has advantages over other methods of artificial selection,
such as tandem selection, in that you can select for traits
simultaneously rather than sequentially. Thereby, no useful traits are
being excluded from selection at any one time and so none will start to
reverse while you concentrate on improving another property of the
organism. However, its major disadvantage is that the weightings
assigned to each characteristic are inherently quite hard to calculate
precisely and so require some elements of trial and error before they
become optimal to the breeder.
Query Access pattern:
The advantage of using data access objects is the relatively simple and
rigorous separation between two important parts of an application
which can and should know almost nothing of each other, and which
can be expected to evolve frequently and independently. Changing
business logic can rely on the same DAO interface, while changes to
persistence logic do not affect DAO clients as long as the interface
remains correctly implemented.
In the specific context of the Java programming language, Data Access
Objects can be used to insulate an application from the particularly
numerous, complex and varied Java persistence technologies, which
could be JDBC, JDO, EJB CMP, Hibernate, or many others. Using Data
Access Objects means the underlying technology can be upgraded or
swapped without changing other parts of the application.
Existing System:
 Query response does not perform well if query
patterns change.
 Because it uses static query workload.
 Its performance may degrade if the database size
gets increased.
 Tradition feature selection technique may offer less
or no data pruning capability given query attributes.

Proposed System:
 We develop a flexible index selection frame work to
achieve static index selection and dynamic index
selection for high dimensional data.
 A control feedback technique is introduced for
measuring the performance.
 Through this a database could benefit from an index
change.
 The index selection minimizes the cost of the queries
in the work load.
 Online index selection is designed in the motivation if
the query pattern changes over time.
 By monitoring the query workload and detecting
when there is a change on the query pattern, able to
evolve good performance as query patterns evolve
Software requirements:
Hardware:
PROCESSOR

:

PENTIUM IV 2.6 GHz

RAM

:

512 MB DD RAM

MONITOR

:

15” COLOR

HARD DISK

:

20 GB

FLOPPY DRIVE

:

1.44 MB

CDDRIVE

:

LG 52X

Front End

:

J2EE(JSP)

Back End

:

MS SQL 2000

Tools Used

:

JFrameBuilder

Software:

Operating System

:

WindowsXP

Contenu connexe

Tendances

Providing healthcare as-a-service using fuzzy rule-based big data analytics i...
Providing healthcare as-a-service using fuzzy rule-based big data analytics i...Providing healthcare as-a-service using fuzzy rule-based big data analytics i...
Providing healthcare as-a-service using fuzzy rule-based big data analytics i...nexgentechnology
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for properIJDKP
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...nalini manogaran
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET Journal
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...ijsc
 
An exploratory analysis on half hourly electricity load patterns leading to h...
An exploratory analysis on half hourly electricity load patterns leading to h...An exploratory analysis on half hourly electricity load patterns leading to h...
An exploratory analysis on half hourly electricity load patterns leading to h...acijjournal
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceVenkat Projects
 
Introduction to RandomForests 2004
Introduction to RandomForests 2004Introduction to RandomForests 2004
Introduction to RandomForests 2004Salford Systems
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetIRJET Journal
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldDmitry Grapov
 
IRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET Journal
 
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahul
Volume 14  issue 03  march 2014_ijcsms_march14_10_14_rahulVolume 14  issue 03  march 2014_ijcsms_march14_10_14_rahul
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahulDeepak Agarwal
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and ClusteringEng Teong Cheah
 

Tendances (20)

G046024851
G046024851G046024851
G046024851
 
Introductionedited
IntroductioneditedIntroductionedited
Introductionedited
 
Providing healthcare as-a-service using fuzzy rule-based big data analytics i...
Providing healthcare as-a-service using fuzzy rule-based big data analytics i...Providing healthcare as-a-service using fuzzy rule-based big data analytics i...
Providing healthcare as-a-service using fuzzy rule-based big data analytics i...
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for proper
 
Test data generation
Test data generationTest data generation
Test data generation
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...
 
IJET-V2I6P32
IJET-V2I6P32IJET-V2I6P32
IJET-V2I6P32
 
IRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current ApproachesIRJET- A Review of Data Cleaning and its Current Approaches
IRJET- A Review of Data Cleaning and its Current Approaches
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
An exploratory analysis on half hourly electricity load patterns leading to h...
An exploratory analysis on half hourly electricity load patterns leading to h...An exploratory analysis on half hourly electricity load patterns leading to h...
An exploratory analysis on half hourly electricity load patterns leading to h...
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performance
 
Introduction to RandomForests 2004
Introduction to RandomForests 2004Introduction to RandomForests 2004
Introduction to RandomForests 2004
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network Analysis
 
Ijmet 10 02_050
Ijmet 10 02_050Ijmet 10 02_050
Ijmet 10 02_050
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease Dataset
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic Manifold
 
IRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its AnalysisIRJET- Probability based Missing Value Imputation Method and its Analysis
IRJET- Probability based Missing Value Imputation Method and its Analysis
 
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahul
Volume 14  issue 03  march 2014_ijcsms_march14_10_14_rahulVolume 14  issue 03  march 2014_ijcsms_march14_10_14_rahul
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahul
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 

En vedette

Novel secure communication protocol basepaper
Novel secure communication protocol basepaperNovel secure communication protocol basepaper
Novel secure communication protocol basepaperMumbai Academisc
 
Non ieee dot net projects list
Non  ieee dot net projects list Non  ieee dot net projects list
Non ieee dot net projects list Mumbai Academisc
 
Ieee 2014 java projects list
Ieee 2014 java projects list Ieee 2014 java projects list
Ieee 2014 java projects list Mumbai Academisc
 
Modeling and automated containment of worms (synopsis)
Modeling and automated containment of worms (synopsis)Modeling and automated containment of worms (synopsis)
Modeling and automated containment of worms (synopsis)Mumbai Academisc
 
Clustering and sequential pattern mining of online collaborative learning dat...
Clustering and sequential pattern mining of online collaborative learning dat...Clustering and sequential pattern mining of online collaborative learning dat...
Clustering and sequential pattern mining of online collaborative learning dat...Mumbai Academisc
 
Dynamic load balancing in distributed systems in the presence of delays a re...
Dynamic load balancing in distributed systems in the presence of delays  a re...Dynamic load balancing in distributed systems in the presence of delays  a re...
Dynamic load balancing in distributed systems in the presence of delays a re...Mumbai Academisc
 
Ieee 2013 java projects list
Ieee 2013 java projects list Ieee 2013 java projects list
Ieee 2013 java projects list Mumbai Academisc
 

En vedette (8)

Novel secure communication protocol basepaper
Novel secure communication protocol basepaperNovel secure communication protocol basepaper
Novel secure communication protocol basepaper
 
Non ieee dot net projects list
Non  ieee dot net projects list Non  ieee dot net projects list
Non ieee dot net projects list
 
Ieee 2014 java projects list
Ieee 2014 java projects list Ieee 2014 java projects list
Ieee 2014 java projects list
 
Jdbc
JdbcJdbc
Jdbc
 
Modeling and automated containment of worms (synopsis)
Modeling and automated containment of worms (synopsis)Modeling and automated containment of worms (synopsis)
Modeling and automated containment of worms (synopsis)
 
Clustering and sequential pattern mining of online collaborative learning dat...
Clustering and sequential pattern mining of online collaborative learning dat...Clustering and sequential pattern mining of online collaborative learning dat...
Clustering and sequential pattern mining of online collaborative learning dat...
 
Dynamic load balancing in distributed systems in the presence of delays a re...
Dynamic load balancing in distributed systems in the presence of delays  a re...Dynamic load balancing in distributed systems in the presence of delays  a re...
Dynamic load balancing in distributed systems in the presence of delays a re...
 
Ieee 2013 java projects list
Ieee 2013 java projects list Ieee 2013 java projects list
Ieee 2013 java projects list
 

Similaire à Online index recommendations for high dimensional databases using query workloads (synopsis)

2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )SBGC
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Mumbai Academisc
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
Elimination of data redundancy before persisting into dbms using svm classifi...
Elimination of data redundancy before persisting into dbms using svm classifi...Elimination of data redundancy before persisting into dbms using svm classifi...
Elimination of data redundancy before persisting into dbms using svm classifi...nalini manogaran
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objectsSoftroniics india
 
Power Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage SystemPower Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage Systemijcnes
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseAlireza Kamrani
 
User Preferences Based Recommendation System for Services using Mapreduce App...
User Preferences Based Recommendation System for Services using Mapreduce App...User Preferences Based Recommendation System for Services using Mapreduce App...
User Preferences Based Recommendation System for Services using Mapreduce App...IJMTST Journal
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstracttsysglobalsolutions
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345AkhilSinghal21
 
Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Mumbai Academisc
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Context sensitive indexes for performance optimization of sql queries in mult...
Context sensitive indexes for performance optimization of sql queries in mult...Context sensitive indexes for performance optimization of sql queries in mult...
Context sensitive indexes for performance optimization of sql queries in mult...avinash varma sagi
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data miningUjjawal
 
QUERY OPTIMIZATION FOR BIG DATA ANALYTICS
QUERY OPTIMIZATION FOR BIG DATA ANALYTICSQUERY OPTIMIZATION FOR BIG DATA ANALYTICS
QUERY OPTIMIZATION FOR BIG DATA ANALYTICSijcsit
 

Similaire à Online index recommendations for high dimensional databases using query workloads (synopsis) (20)

2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
Elimination of data redundancy before persisting into dbms using svm classifi...
Elimination of data redundancy before persisting into dbms using svm classifi...Elimination of data redundancy before persisting into dbms using svm classifi...
Elimination of data redundancy before persisting into dbms using svm classifi...
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objects
 
Power Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage SystemPower Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage System
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
 
User Preferences Based Recommendation System for Services using Mapreduce App...
User Preferences Based Recommendation System for Services using Mapreduce App...User Preferences Based Recommendation System for Services using Mapreduce App...
User Preferences Based Recommendation System for Services using Mapreduce App...
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Context sensitive indexes for performance optimization of sql queries in mult...
Context sensitive indexes for performance optimization of sql queries in mult...Context sensitive indexes for performance optimization of sql queries in mult...
Context sensitive indexes for performance optimization of sql queries in mult...
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
QUERY OPTIMIZATION FOR BIG DATA ANALYTICS
QUERY OPTIMIZATION FOR BIG DATA ANALYTICSQUERY OPTIMIZATION FOR BIG DATA ANALYTICS
QUERY OPTIMIZATION FOR BIG DATA ANALYTICS
 
Query Optimization for Big Data Analytics
Query Optimization for Big Data AnalyticsQuery Optimization for Big Data Analytics
Query Optimization for Big Data Analytics
 

Plus de Mumbai Academisc

Plus de Mumbai Academisc (20)

Non ieee java projects list
Non  ieee java projects list Non  ieee java projects list
Non ieee java projects list
 
Ieee java projects list
Ieee java projects list Ieee java projects list
Ieee java projects list
 
Ieee 2014 dot net projects list
Ieee 2014 dot net projects list Ieee 2014 dot net projects list
Ieee 2014 dot net projects list
 
Ieee 2013 dot net projects list
Ieee 2013 dot net projects listIeee 2013 dot net projects list
Ieee 2013 dot net projects list
 
Ieee 2012 dot net projects list
Ieee 2012 dot net projects listIeee 2012 dot net projects list
Ieee 2012 dot net projects list
 
Spring ppt
Spring pptSpring ppt
Spring ppt
 
Ejb notes
Ejb notesEjb notes
Ejb notes
 
Java web programming
Java web programmingJava web programming
Java web programming
 
Java programming-examples
Java programming-examplesJava programming-examples
Java programming-examples
 
Hibernate tutorial
Hibernate tutorialHibernate tutorial
Hibernate tutorial
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
 
Web based development
Web based developmentWeb based development
Web based development
 
Java tutorial part 4
Java tutorial part 4Java tutorial part 4
Java tutorial part 4
 
Java tutorial part 3
Java tutorial part 3Java tutorial part 3
Java tutorial part 3
 
Java tutorial part 2
Java tutorial part 2Java tutorial part 2
Java tutorial part 2
 
Engineering
EngineeringEngineering
Engineering
 
Jsp
JspJsp
Jsp
 
Project list
Project listProject list
Project list
 
Personal authentication using 3 d finger geometry (synopsis)
Personal authentication using 3 d finger geometry (synopsis)Personal authentication using 3 d finger geometry (synopsis)
Personal authentication using 3 d finger geometry (synopsis)
 
Performance of a speculative transmission scheme for scheduling latency reduc...
Performance of a speculative transmission scheme for scheduling latency reduc...Performance of a speculative transmission scheme for scheduling latency reduc...
Performance of a speculative transmission scheme for scheduling latency reduc...
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Online index recommendations for high dimensional databases using query workloads (synopsis)

  • 1. Online Index Recommendations for High-Dimensional Databases Using Query Workloads (Synopsis)
  • 2. Abstract: Usually users are interested in querying data over a relatively small subset of the entire attribute set at a time. A potential solution is to use lower dimensional indexes that accurately represent the user access patterns. If the query pattern change, then the query response using the physical database design that is developed based on a static snapshot of the query workload may significantly degrade. To address these issues, we introduce a parameterizable technique to recommend indexes based on index types that are frequently used for highdimensional data sets and to dynamically adjust indexes as the underlying query workload changes. We incorporate a query pattern change detection mechanism to determine when the access patterns have changed enough to warrant change in the physical database design. By adjusting analysis parameters, we trade off analysis speed against analysis resolution.
  • 3. 2 Introduction: AN increasing number of database applications such as business data warehouses and scientific data repositories deal with high-dimensional data sets. As the number of dimensions/attributes and the overall size of data sets increase, it becomes essential to efficiently retrieve specific queried data from the database in order to effectively utilize the database. Indexing support is needed to effectively prune out significant portions of the data set that are not relevant for the queries. Multidimensional indexing, dimensionality reduction, and Relational Database Management System (RDBMS) index selection tools all could be applied to the problem. However, for highdimensional data sets, each of these potential solutions has inherent problems. To illustrate these problems, consider a uniformly distributed data set of 1,000,000 data objects with several hundred attributes. Range queries are consistently executed over five of the attributes. The query selectivity over each attribute is 0.1, so the overall query selectivity is 1=105 (that is, the answer set contains about 10 results). An ideal solution would allow us to read from the disk only those pages that contain matching answers to the query. We could build a multidimensional index over the data set so that we can directly answer any query by only using the index. However, the performance of multidimensional index structures is subject to Bellman’s curse of dimensionality and rapidly degrades as the number of dimensions increases. For the given example, such an index would perform much worse than a sequential scan. Another possibility would be to build an index over each single dimension. The effectiveness of this approach is limited to the amount of search space that can be pruned by a single dimension (in the example, the search space would only be pruned to 100,000 objects).
  • 4. High Dimensional Indexing: A number of techniques have been introduced to address the high-dimensional indexing problem such as the X-tree [5] and the GCtree [6]. Although these index structures have been shown to increase the range of effective dimensionality, they still suffer performance degradation at higher index dimensionality. Feature Selection Feature selection techniques are a subset of dimensionality reduction targeted at finding a set of untransformed attributes that best represent the overall data set. These techniques are also focused on maximizing data energy or classification accuracy rather than query response. As a result, selected features may have no overlap with queried attributes. Index Selection The index selection problem has been identified as a variation of the Knapsack Problem, and several papers proposed designs for index recommendations based on optimization rules. These earlier designs could not take advantage of modern database systems’ query optimizer. Currently, almost every commercial RDBMS provides the users with an index recommendation tool based on a query workload and uses the query optimizer to obtain cost estimates. A query workload is a set of SQL data manipulation statements. The query workload should be a good representative of the types of queries that an application supports.
  • 5. Automatic Index Selection The ideas of having a database that can tune itself by automatically creating new indexes as the queries arrive have been proposed. In a cost model is used to identify beneficial indexes and decide when to create or drop an index at runtime. Costa and Lifschitz propose an agent-based database architecture to deal with an automatic index creation. Microsoft Research has proposed a physicaldesign alerter to identify when a modification to the physical design could result in improved performance. Literature survey: Index Selection Index Selection is a method of artificial selection in which several useful traits are selected simultaneously. First, each trait that is going to be selected is assigned a weight, the importance of the trait. I.e., if you were selecting for both height and the darkness of the coat in dogs, if height was more important to you, one would assign that a higher weighting. For instance, heights weighting could be ten and coat darkness' could be two. This weighting value is then multiplied by the observed value in each individual animal and then the score for each of the characteristics is summed for each individual. This result is the index score and can be used to compare the worth of each organism being selected. Therefore, only those with the highest index score are selected for breeding via artificial selection. This method has advantages over other methods of artificial selection, such as tandem selection, in that you can select for traits simultaneously rather than sequentially. Thereby, no useful traits are
  • 6. being excluded from selection at any one time and so none will start to reverse while you concentrate on improving another property of the organism. However, its major disadvantage is that the weightings assigned to each characteristic are inherently quite hard to calculate precisely and so require some elements of trial and error before they become optimal to the breeder. Query Access pattern: The advantage of using data access objects is the relatively simple and rigorous separation between two important parts of an application which can and should know almost nothing of each other, and which can be expected to evolve frequently and independently. Changing business logic can rely on the same DAO interface, while changes to persistence logic do not affect DAO clients as long as the interface remains correctly implemented. In the specific context of the Java programming language, Data Access Objects can be used to insulate an application from the particularly numerous, complex and varied Java persistence technologies, which could be JDBC, JDO, EJB CMP, Hibernate, or many others. Using Data Access Objects means the underlying technology can be upgraded or swapped without changing other parts of the application.
  • 7. Existing System:  Query response does not perform well if query patterns change.  Because it uses static query workload.  Its performance may degrade if the database size gets increased.  Tradition feature selection technique may offer less or no data pruning capability given query attributes. Proposed System:  We develop a flexible index selection frame work to achieve static index selection and dynamic index selection for high dimensional data.  A control feedback technique is introduced for measuring the performance.  Through this a database could benefit from an index change.  The index selection minimizes the cost of the queries in the work load.  Online index selection is designed in the motivation if the query pattern changes over time.  By monitoring the query workload and detecting when there is a change on the query pattern, able to evolve good performance as query patterns evolve
  • 8. Software requirements: Hardware: PROCESSOR : PENTIUM IV 2.6 GHz RAM : 512 MB DD RAM MONITOR : 15” COLOR HARD DISK : 20 GB FLOPPY DRIVE : 1.44 MB CDDRIVE : LG 52X Front End : J2EE(JSP) Back End : MS SQL 2000 Tools Used : JFrameBuilder Software: Operating System : WindowsXP