For Diploma, BE, ME, M.Tech, BCA, MCA, and PhD project guidance, please visit www.ocularsystems.in or call us on 7385665306.



Adaptive Cluster Distance Bounding for High-Dimensional Indexing
Abstract:

We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. We note that indexing by "vector approximation" (VA-File), which was proposed as a technique to combat the "curse of dimensionality," employs scalar quantization and hence necessarily ignores dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand, exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance bound, based on separating hyperplane boundaries of Voronoi clusters, to complement our cluster-based index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead, and is applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations, by factors reaching 100X and more.




Existing System:


However, existing methods to prune irrelevant clusters are based on bounding
hyperspheres and/or bounding rectangles, whose lack of tightness compromises
their efficiency in exact nearest neighbor search.


Spatial queries, specifically nearest neighbor queries, in high-dimensional spaces have been studied extensively. While several analyses have concluded that nearest neighbor search with the Euclidean distance metric is impractical at high dimensions due to the notorious "curse of dimensionality," others have suggested that this may be overly pessimistic. Specifically, it has been shown that what determines the search performance (at least for R-tree-like structures) is the intrinsic dimensionality of the data set and not the dimensionality of the address space (or the embedding dimensionality).


We extend our distance bounding technique to the Mahalanobis distance metric,
and note large gains over existing indexes.


Proposed System:


We propose a new cluster-adaptive distance bound, based on separating hyperplane boundaries of Voronoi clusters, to complement our cluster-based index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead, and is applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes.




We outline our approach to indexing real high-dimensional data sets. We focus on the clustering paradigm for search and retrieval: the data set is clustered so that clusters can be retrieved in decreasing order of their probability of containing entries relevant to the query.




We note that the Vector Approximation (VA) file technique implicitly assumes independence across dimensions and that each component is uniformly distributed. This is an unrealistic assumption for real data sets, which typically exhibit significant correlations across dimensions and non-uniform distributions. To approach optimality, an indexing technique must take these properties into account. We resort to a Voronoi clustering framework, as it can naturally exploit correlations across dimensions (in fact, such clustering algorithms are the method of choice in the design of vector quantizers). Moreover, we show how our clustering procedure can be combined with any other generic clustering method of choice (such as BIRCH), requiring only one additional scan of the data set, as sketched below. Lastly, we note that the sequential scan is in fact a special case of a clustering-based index, i.e., one with only a single cluster.
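As an illustration of that single extra scan, the following minimal Java sketch assigns every data point to its nearest centroid, turning the centroids produced by any generic clustering method into a Voronoi partition. The class and method names are illustrative assumptions, not code from the paper.

```java
// Minimal sketch: build a Voronoi partition from given centroids in one
// pass over the data set. Names are hypothetical, not from the paper.
public final class VoronoiPartitioner {

    /** Returns, for each point, the index of its nearest centroid (Euclidean). */
    public static int[] assign(double[][] points, double[][] centroids) {
        int[] label = new int[points.length];
        for (int n = 0; n < points.length; n++) {
            double best = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                double d2 = squaredDistance(points[n], centroids[k]);
                if (d2 < best) {
                    best = d2;
                    label[n] = k;
                }
            }
        }
        return label;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```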
Several index structures exist that facilitate search and retrieval of multi-dimensional data. In low-dimensional spaces, recursive partitioning of the space with hyper-rectangles, hyper-spheres, or a combination of hyper-spheres and hyper-rectangles has been found to be effective for nearest neighbor search and retrieval. While the preceding methods specialize to the Euclidean distance (l2 norm), M-trees have been found to be effective for metric spaces with arbitrary distance functions (provided they are metrics).


Such multi-dimensional indexes work well in low-dimensional spaces, where they outperform the sequential scan. But it has been observed that their performance degrades as the number of feature dimensions increases and, beyond a certain dimension threshold, becomes inferior to the sequential scan. In a celebrated result, Weber et al. have shown that whenever the dimensionality is above 10, these methods are outperformed by a simple sequential scan. Such performance degradation is attributed to Bellman's 'curse of dimensionality', which refers to the exponential growth of hyper-volume with the dimensionality of the space.
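To give a rough sense of this growth: partitioning each dimension into just 10 intervals already produces 10^d cells, so at d = 30 there are 10^30 cells and any realistically sized data set leaves almost all of them empty, which is why space-partitioning structures lose their pruning power at high dimensions.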




Module Description:


   1. A New Cluster Distance Bound
   2. Adaptability to Weighted Euclidean or Mahalanobis Distances
   3. An Efficient Search Index
   4. Vector Approximation Files
   5. Approximate Similarity Search


A New Cluster Distance Bound


Crucial to the effectiveness of the clustering-based search strategy is efficient
bounding of query-cluster distances. This is the mechanism that allows the
elimination of irrelevant clusters. Traditionally, this has been performed with
bounding spheres and rectangles. However, hyperspheres and hyperrectangles are
generally not optimal bounding surfaces for clusters in high-dimensional spaces. In fact, this phenomenon is observed in the SR-tree, where the authors used a combination of spheres and rectangles to outperform indexes that use only bounding spheres (like the SS-tree) or only bounding rectangles (like the R∗-tree).


The premise herein is that, at high dimensions, considerable improvement in efficiency can be achieved by relaxing restrictions on the regularity of bounding surfaces (i.e., spheres or rectangles). Specifically, by creating Voronoi clusters with piecewise-linear boundaries, we allow for more general convex polygon structures that are able to efficiently bound the cluster surface. This is possible with the construction of Voronoi clusters under the Euclidean distance measure. By projecting the query onto these hyperplane boundaries and complementing with the cluster-hyperplane distance, we develop an appropriate lower bound on the distance from a query to a cluster.
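The Java sketch below shows one way such a bound could be computed for Euclidean Voronoi clusters: for every bisecting hyperplane H_ij that separates the query from cluster i, the distance from the query to H_ij plus the (precomputed) smallest distance from cluster i to H_ij is a valid lower bound, and the tightest such hyperplane is kept. The class name and the dMin array holding the precomputed cluster-hyperplane distances are assumptions for illustration, not the authors' code.

```java
// Illustrative sketch of the separating-hyperplane lower bound for Euclidean
// Voronoi clusters. dMin[i][j] is assumed to hold the smallest distance from
// any point of cluster i to the bisecting hyperplane H_ij, computed offline.
public final class HyperplaneBound {

    /** Lower bound on the distance from query q to any point of cluster i. */
    public static double lowerBound(double[] q, int i,
                                    double[][] centroids, double[][] dMin) {
        double bound = 0.0;                        // trivial bound if q lies in cell i
        double[] ci = centroids[i];
        for (int j = 0; j < centroids.length; j++) {
            if (j == i) continue;
            double[] cj = centroids[j];
            // Bisecting hyperplane H_ij: (c_j - c_i) . x = (|c_j|^2 - |c_i|^2) / 2
            double dot = 0.0, norm2 = 0.0, offset = 0.0;
            for (int d = 0; d < q.length; d++) {
                double w = cj[d] - ci[d];
                dot    += w * q[d];
                norm2  += w * w;
                offset += (cj[d] * cj[d] - ci[d] * ci[d]) / 2.0;
            }
            double signedDist = (dot - offset) / Math.sqrt(norm2);
            // Only hyperplanes with q on the far side separate q from cluster i;
            // the straight line from q to any cluster point must then cross H_ij,
            // so its length is at least d(q, H_ij) + dMin[i][j].
            if (signedDist > 0.0) {
                bound = Math.max(bound, signedDist + dMin[i][j]);
            }
        }
        return bound;
    }
}
```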




Adaptability to Weighted Euclidean or Mahalanobis Distances


While the Euclidean distance metric is popular within the multimedia indexing community, it is by no means the "correct" distance measure, in that it may be a poor approximation of user-perceived similarities. The Mahalanobis distance measure has more degrees of freedom than the Euclidean distance and, with proper updating (e.g., through relevance feedback), has been found to be a much better estimator of user perceptions. We extend our distance bounding technique to the Mahalanobis distance metric, and note large gains over existing indexes.
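A standard way to carry the Euclidean machinery over to a Mahalanobis metric d_W(x, y) = sqrt((x - y)^T W (x - y)) is to factor the weight matrix as W = L L^T (e.g., by a Cholesky factorization computed offline) and map every vector through L^T; Euclidean distances between the mapped vectors then equal Mahalanobis distances between the originals, so the same hyperplane bounds apply in the transformed space. The sketch below illustrates this idea under that assumption; it is not necessarily the exact construction used in the paper.

```java
// Illustrative sketch: turning a Mahalanobis metric into a Euclidean one via
// a factorization W = L * L^T supplied by the caller. Names are hypothetical.
public final class MahalanobisTransform {

    /** d_W(x, y) = sqrt((x - y)^T W (x - y)). */
    public static double mahalanobis(double[] x, double[] y, double[][] w) {
        int n = x.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum += (x[i] - y[i]) * w[i][j] * (x[j] - y[j]);
            }
        }
        return Math.sqrt(sum);
    }

    /** Maps x to L^T x; Euclidean distances of mapped vectors equal d_W. */
    public static double[] transform(double[] x, double[][] l) {
        int n = x.length;
        double[] out = new double[n];
        for (int j = 0; j < n; j++) {
            double s = 0.0;                 // component j of L^T x = sum_i L[i][j] * x[i]
            for (int i = 0; i < n; i++) {
                s += l[i][j] * x[i];
            }
            out[j] = s;
        }
        return out;
    }
}
```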


An Efficient Search Index


The data set is partitioned into multiple Voronoi clusters, and for any kNN query the clusters are ranked in order of their hyperplane bounds; in this way, irrelevant clusters are filtered out. We note that the sequential scan is a special case of our indexing, where there is only one cluster. An important feature of our search index is that we do not store the hyperplane boundaries (which form the faces of the bounding polygons), but rather generate them dynamically from the cluster centroids. The only storage apart from the centroids is the cluster-hyperplane boundary distances (or the smallest cluster-hyperplane distance). Since our bound is relatively tight, our search algorithm is effective in spatial filtering of irrelevant clusters, resulting in significant performance gains. We expand on the results and techniques initially presented in earlier work, with comparisons against several recently proposed indexing techniques.
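The following Java sketch illustrates this search loop: clusters are visited in increasing order of their lower bounds, and the scan stops as soon as the next bound cannot improve on the current k-th nearest distance. The bound values are assumed to come from something like the hypothetical HyperplaneBound above, and in-memory cluster contents stand in for what would be disk reads; none of the names are from the authors' implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch of cluster-ranked exact kNN search. bounds[i] is a lower
// bound on the query-to-cluster-i distance; clusters.get(i) returns that
// cluster's vectors (a disk read in a real system). Names are hypothetical.
public final class ClusterKnnSearch {

    public static double[] kNearestDistances(double[] q, int k, double[] bounds,
                                             List<double[][]> clusters) {
        // Rank clusters by increasing lower bound.
        Integer[] order = new Integer[bounds.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        java.util.Arrays.sort(order, Comparator.comparingDouble(i -> bounds[i]));

        // Max-heap of the k smallest distances found so far.
        PriorityQueue<Double> best = new PriorityQueue<>(Comparator.reverseOrder());
        for (int idx : order) {
            // Stop: no remaining cluster can contain a closer neighbor.
            if (best.size() == k && bounds[idx] >= best.peek()) break;
            for (double[] x : clusters.get(idx)) {
                double d = euclidean(q, x);
                if (best.size() < k) {
                    best.add(d);
                } else if (d < best.peek()) {
                    best.poll();
                    best.add(d);
                }
            }
        }
        double[] result = new double[best.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = best.poll();
        return result;                              // ascending distances
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```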


Vector Approximation Files


A popular and effective technique to overcome the curse of dimensionality is the vector approximation file (VA-File). The VA-File partitions the space into hyper-rectangular cells to obtain a quantized approximation of the data residing inside the cells. Non-empty cell locations are encoded into bit strings and stored in a separate approximation file on the hard disk. During a nearest neighbor search, the vector approximation file is sequentially scanned, and upper and lower bounds on the distance from the query vector to each cell are estimated. These bounds are used to prune irrelevant cells. The final set of candidate vectors is then read from the hard disk and the exact nearest neighbors are determined. At this point, we note that the terminology "vector approximation" is somewhat confusing, since what is actually performed is scalar quantization, where each component of the feature vector is separately and uniformly quantized (in contradistinction to vector quantization in the signal compression literature).
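The sketch below illustrates the scalar quantization just described: each dimension is cut into 2^b uniform intervals between that dimension's minimum and maximum, a vector's approximation is the tuple of its interval indices, and a per-dimension gap to the query accumulates into a lower bound on the query-to-cell distance (the upper bound is analogous). The uniform partition and all names are assumptions for illustration, not the original VA-File code.

```java
// Illustrative sketch of VA-File style scalar quantization with 'bits' bits
// per dimension, and the resulting lower bound on query-to-cell distance.
public final class VaFileSketch {

    /** Interval index of value v in a dimension spanning [min, max]. */
    public static int cellIndex(double v, double min, double max, int bits) {
        int levels = 1 << bits;
        int idx = (int) ((v - min) / (max - min) * levels);
        return Math.min(Math.max(idx, 0), levels - 1);
    }

    /** Lower bound on the Euclidean distance from query q to the cell 'code'. */
    public static double lowerBound(double[] q, int[] code,
                                    double[] min, double[] max, int bits) {
        int levels = 1 << bits;
        double sum = 0.0;
        for (int d = 0; d < q.length; d++) {
            double width = (max[d] - min[d]) / levels;
            double lo = min[d] + code[d] * width;   // cell edges in dimension d
            double hi = lo + width;
            double gap = 0.0;                        // zero if q[d] falls inside the slab
            if (q[d] < lo) {
                gap = lo - q[d];
            } else if (q[d] > hi) {
                gap = q[d] - hi;
            }
            sum += gap * gap;
        }
        return Math.sqrt(sum);
    }
}
```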


The VA-File was followed by several more recent techniques to overcome the curse of dimensionality. In the VA+-File, the data set is rotated into a set of uncorrelated dimensions, with more approximation bits provided for dimensions of higher variance, and the approximation cells are adaptively spaced according to the data distribution. Methods such as LDR and the recently proposed non-linear approximations aim to outperform the sequential scan by a combination of clustering and dimensionality reduction. There also exist a few hybrid methods, such as the A-Tree and the IQ-Tree, which combine VA-style approximations within a tree-based index.
Approximate Similarity Search





Lastly, it has been argued that the feature vectors and distance functions are often only approximations of user perception of similarity. Hence, even the results of an exact similarity search are inevitably perceptually approximate, with additional rounds of query refinement necessary. Conversely, by performing an approximate search, for a small penalty in accuracy, considerable savings in query processing time would be possible. Examples of such search strategies are MMDR, probabilistic searches, and locality-sensitive hashing; the reader is directed to the literature for a more detailed survey of approximate similarity search. The limits of approximate indexing, i.e., the optimal trade-offs between search quality and search time, have also been studied within an information-theoretic framework.

System Architecture:




Hardware System Requirement


     Processor      -   Pentium III
     Speed          -   1.1 GHz
     RAM            -   256 MB (min)
     Hard Disk      -   20 GB
     Floppy Drive   -   1.44 MB
     Keyboard       -   Standard Windows keyboard
     Mouse          -   Two- or three-button mouse
     Monitor        -   SVGA




S/W System Requirement



     Operating System       :   Windows 95/98/2000/NT 4.0
     Application Server     :   Tomcat 6.0
     Front End              :   HTML, Java
     Scripts                :   JavaScript
     Server-side Scripts    :   Java Server Pages (JSP)
     Database               :   MySQL
     Database Connectivity  :   JDBC




 Ocular Systems, Shop No:1, Swagat Corner Building, Near Narayani Dham Temple, Katraj, Pune-46

                                  E-Mail: info@ocularsystems.in
