SlideShare une entreprise Scribd logo
1  sur  57
Multidimensional Indexes

        Chapter 5
Outline
•   Applications
•   Hash-like structures
•   Tree-like structures
•   Bit-map indexes
Search Key
• Search key is (F1, F2, ..Fk)
  – Separated by special markers.
  – Example
     • If F1=abcd and F2=123, the search key is
       “abcd#123”
Applications: GIS
• Geographic Information Systems
   – Objects are in two dimensional space
   – The objects may be points or shapes
   – Objects may be houses, roads, bridges, pipelines and many
     other physical objects.
• Query types
   – Partial match queries
      • Specify the values for one or more dimensions and look for all
        points matching those values in those dimensions.
   – Range queries
      • Set of shapes within the range
   – Nearest neighbor queries
      • Closest point to a given point
   – Where-am-I queries
      • When you click a mouse, the system determines which of the
        displayed elements you were clicking.
Data Cubes
• Data exists in high dimensional space
• A chain store may record each sale made, including
   –   The day and time
   –   The store in which the sale was made
   –   The item purchased
   –   The color of the item
   –   The size of the item
• Attributes are seen as dimensions multidimensional
  space, data cube.
• Typical query
   – Give a class of pink shirts for each store and each month of
     1998.
Multidimensional queries in SQL
• Represent points as a relation Points(x,y)
• Query
  – Find the nearest point to (10, 20)
  – Compute the distance between (10,20) to
    every other point.
Rectangles
• Rectangles(id, x11, y11, xur, yur)
• If the query is
  – Find the rectangles enclosing the point
    (10,20)
• SELECT id
  FROM Rectangles
  WHERE x11 <= 10.0 AND y11 <= 20.0
  AND xur >=10.0 AND yur >=20.0;
Data Cube
• Fact table
  – Sales(day,store, item. color, size)
• Query
  – Summarize the sale of pink shirts by day and store
• SELECT day, store, COUNT(*) AS totalSales
  FROM Sales
  WHERE item=‘shirt’ AND color=‘pink’
  GROUP BY day, store;
Executing range queries in
        Conventional Indexes
• Motivation: Find records where
               DEPT = “Toy” AND SAL > 50k
• Strategy 1: Use “Toy” index and check salary
• Strategy II: Use “Toy” index and SAL index and
  intersect.
   – Complexity
      • Toy index: number of disk accesses = number of disk blocks
      • SAL index: number of disk accesses= number of records
• So conventional indexes of little help.
Executing nearest-neighbor queries using
              conventional indexes
• There might be no point in the selected
  range
  – Repeat the search by increasing the range
• The closest point within the range might
  not be the closest point overall.
  – There is one point closer from outside range.
Other Limitations of Conventional
                Indexes
• We can only keep the file sorted on only
  attribute.
• If the query is on multiple dimensions
  – We will end-up having one disk access for
    each record
• It becomes too expensive
Overview of Multidimensional Index
            Structures
• Hash-table-like approaches
• Tree-like approaches
• In both cases, we give-up some properties of
  single dimensional indexes
• Hash
  – We have to access multiple buckets
• Tree
  – Tree may not be balanced
  – There may not exist a correspondence between tree
    nodes and disk blocks
  – Information in the disk block is much smaller.
Hash like structures: GRID Files
• Each dimension, grid lines partition the
  space into stripes,
  – Points that fall on a grid line will be
    considered to belong to the stripe for which
    that grid line is lower boundary
  – The number of grid lines in each dimension
    may vary.
  – Space between grid lines in the same
    dimension may also vary
Grid Index
                           Key 2
                  X1 X2      ……            Xn
         V1
         V2
Key 1

             Vn

                    To records with key1=V3, key2=X2


                                                 14
CLAIM

• Can quickly find records with
  – key 1 = Vi ∧ Key 2 = Xj
  – key 1 = Vi
  – key 2 = Xj




                                  15
CLAIM

• Can quickly find records with
  – key 1 = Vi ∧ Key 2 = Xj
  – key 1 = Vi
  – key 2 = Xj

• And also ranges….
  – E.g., key 1 ≥ Vi ∧ key 2 < Xj


                                    16
But there is a catch with Grid Indexes!

  • How is Grid Index stored on disk?

           V1        V2        V3
Like
Array...
           X1
           X2
           X3
           X4

                   X1
                   X2
                   X3
                   X4

                            X1
                            X2
                            X3
                            X4
                                            17
But there is a catch with Grid Indexes!

  • How is Grid Index stored on disk?

           V1        V2        V3
Like
Array...
           X1
           X2
           X3
           X4

                   X1
                   X2
                   X3
                   X4

                            X1
                            X2
                            X3
                            X4
Problem:
• Need regularity so we can compute
     position of Vi,Xj entry
                                            18
Solution: Use Indirection

      X1 X2 X3              Buckets
                       --
 V1                    --
                       --
 V2                    --
                       --
                       --
 V3                             *Grid only
                       --
 V4                    --
                       --       contains
                                pointers to
                                buckets
             --
             --   --
                  --
 Buckets     --   --


                                              19
With indirection:

• Grid can be regular without wasting space
• We do have price of indirection




                                         20
Can also index grid on value ranges

Salary                    Grid

     0-20K         1
    20K-50K        2
      50K-         3
               8




                        1    2       3
Linear Scale           Toy Sales Personnel


                                         21
Grid files

 Good for multiple-key search
+
 Space, management overhead
 -   (nothing is free)
  Need partitioning ranges that evenly
 -  split keys




                                         22
Partitioned Hash Functions
Partitioned hash function

Idea:
         010110 1110010

Key1                         Key2
            h1     h2




                                    24
EX:
h1(toy)     =0        000
h1(sales)   =1        001
h1(art)     =1        010
  .                   011
  .
h2(10k)     =01       100
h2(20k)     =11       101
h2(30k)     =01       110
h2(40k)     =00       111
  .
  .

            Fred,toy,10k,Joe,sales,10k
 Insert     Sally,art,30k
                                             25
EX:
h1(toy)     =0        000
h1(sales)   =1        001       Fred
h1(art)     =1        010
  .                   011
  .
h2(10k)     =01       100
h2(20k)     =11       101    JoeSally
h2(30k)     =01       110
h2(40k)     =00       111
  .
  .

            Fred,toy,10k,Joe,sales,10k
 Insert     Sally,art,30k
                                             26
h1(toy)   =0          000      Fred
h1(sales) =1          001    JoeJan
h1(art)   =1          010      Mary
  .                   011
  .
h2(10k)   =01         100      Sally
h2(20k)   =11         101
h2(30k)   =01         110 TomBill
h2(40k)   =00         111      Andy
  .
  .
• Find Emp. with Dept. = Sales ∧ Sal=40k

                                           27
h1(toy)   =0          000      Fred
h1(sales) =1          001    JoeJan
h1(art)   =1          010      Mary
  .                   011
  .
h2(10k)   =01         100      Sally
h2(20k)   =11         101
h2(30k)   =01         110 TomBill
h2(40k)   =00         111      Andy
  .
  .
• Find Emp. with Dept. = Sales ∧ Sal=40k

                                           28
h1(toy)   =0          000     Fred
h1(sales) =1          001   JoeJan
h1(art)   =1          010     Mary
  .                   011
  .
h2(10k)   =01         100     Sally
h2(20k)   =11         101
h2(30k)   =01         110   TomBill
h2(40k)   =00         111     Andy
  .
  .
• Find Emp. with Sal=30k

                                          29
h1(toy)   =0          000     Fred
h1(sales) =1          001   JoeJan
h1(art)   =1          010     Mary
  .                   011
  .
h2(10k)   =01         100     Sally
h2(20k)   =11         101
h2(30k)   =01         110   TomBill
h2(40k)   =00         111     Andy
  .
  .
• Find Emp. with Sal=30k            look here

                                            30
h1(toy)   =0          000      Fred
h1(sales) =1          001    JoeJan
h1(art)   =1          010      Mary
  .                   011
  .
h2(10k)   =01         100      Sally
h2(20k)   =11         101
h2(30k)   =01         110 TomBill
h2(40k)   =00         111      Andy
  .
  .
• Find Emp. with Dept. = Sales

                                          31
h1(toy)   =0          000      Fred
h1(sales) =1          001    JoeJan
h1(art)   =1          010      Mary
  .                   011
  .
h2(10k)   =01         100      Sally
h2(20k)   =11         101
h2(30k)   =01         110 TomBill
h2(40k)   =00         111      Andy
  .
  .
• Find Emp. with Dept. = Sales    look here


                                              32
Tree-like Structures
•   Multiple-key indexes
•   Kd-trees
•   Quad trees
•   R-trees
Multiple-key indexes
• Several attributes representing
  dimensions of data points
• Multiple key index is an index of indexes in
  which the nodes at each level are indexes
  for one attribute.
Strategy:

• Multiple Key Index
One idea:
                       I2


               I1      I3




                            35
Example
            10k
            15k    Example
  Art       17k    Record
 Sales      21k
  Toy

Dept        12k      Name=Joe
Index       15k      DEPT=Sales
            15k      SAL=15k
            19k
          Salary
          Index

                              36
For which queries is this index good?


  Find RECs Dept = “Sales”   SAL=20k
  Find RECs Dept = “Sales”   SAL  20k
  Find RECs Dept = “Sales”
  Find RECs SAL = 20k




                                        37
KD-trees
• K dimensional tree is generalizing binary
  search tree into multi-dimensional data.
• A KD tree is a binary tree in which interior
  nodes have an associated attribute “a”
  and a value “v” that splits data into two
  parts.
• The attributes at different levels of a tree
  are different, and levels rotating among
  the attributes of all dimensions.
Interesting application:


 • Geographic Data
     y
                           DATA:
                           X1,Y1, Attributes
                 x         X2,Y2, Attributes


                             ...
                                            39
Queries:

• What city is at Xi,Yi?
• What is within 5 miles from Xi,Yi?
• Which is closest point to Xi,Yi?




                                         40
Example           i
                                      a
                          e   d
              h
                                  b
                  n       f
          l           o           c
          j               g
          k       m




                                      41
Example                     i
                                                  a
                                     e   d
                        h
                                              b
          10   20           n        f
                    l           o             c
                    j                g
                    k       m

                                10       20




                                                  42
Example                                                        a
                            40           i        e   d
                                     h
                            30                             b
          10      20                     n        f
                            20   l           o             c
                            10   j                g
    25    15 35        20                m
                                 k

                                             10       20




                                                               43
Example                                                                  a
                                      40           i        e   d
                                               h
                                      30                             b
                    10      20                     n        f
                                      20   l           o             c
                                      10   j                g
          25        15 35        20                m
                                           k

                                                       10       20
 5


     15        15



                                                                         44
Example                                                                         a
                                            40            i        e   d
                                                      h
                                            30                              b
                         10      20                       n        f
                                            20    l           o             c
                                            10    j                g
          25             15 35         20                 m
                                                  k

                                                              10       20
 5        h i        g    f      d e    c   a b


     15         15


j k       l      m        n o
                                                                                45
Example                                                                         a
                                             40           i        e   d
                                                      h
                                             30                             b
                         10      20                       n        f
                                             20   l           o             c
                                             10    j               g
          25             15 35         20                 m
                                                   k

                                                              10       20
 5        h i        g    f      d e    c   a b


     15         15                          • Search points near f
                                            • Search points near b
j k       l      m        n o
                                                                                46
Queries

•   Find points with Yi  20
•   Find points with Xi  5
•   Find points “close” to i = 12,38
•   Find points “close” to b = 7,24




                                         47
Quad trees
• Divides multidimensional space into
  quadrants and divides the quadrants
  same way if they have too many points.
• If the number of points in a square
  – fits in a block, it is a leaf node
  – no longer fits in a block, it becomes an interior
    node, four quadrants are its children.
Quad tree pictures
R-tree
• Captures the spirit of B-tree for multidimensional
  data.
• Represents a collection of regions by grouping
  them into a hierarchy of larger regions.
• Data is divided into regions.
• Interior node is corresponds to interior region
  – region can be of any shape
     • Rectangular is popular
  – Children corresponds to sub-regions.
R-tree
Bitmap indexes
• Assume that records have permanent numbers
• A bit-map index is a collection of bit vectors of length n,
  one for each value may appear in the field F.
• The vector for value v has 1 in position i if the i’th record
  has v in field F, and it has 0 there if not.
• Example for F and G fields:
   – (30, foo), (30,bar), (40, baz), (50, foo), (40, bar), (30, baz)
   – Bit index for F, each of 6 bits. For 30, it is 110001, for 40, it is
     001010, and for 50, it is 000100.
   – Bit index for G also have three vectors. For foo it is, 100100, for
     bar it is 010010, and for baz it is 001001.
Bit map indexes: Partial match
• Bit maps allow answering of partial match
  queries quickly and efficiently.
• Example:
   –   Movie(title, year, length, studioName)
   –   SELECT title
   –   FROM Movie
   –   WHERE studioName= ‘Disney’ and Year=1965;
• If we have bitmap for studioName and year, then
  intersection or AND operation will give the result.
• Bit vectors do not occupy much space.
Bitmap indexes: range queries
•   Example: consider the gold jewelry data of twelve points
    – 1 (25,60), 2(45,60), 3(50,75), 4(50,100), 5(50,120), 6(70, 110),
      7(85,140), 8(30,260), 9(25,400), 10(45,350), 11(50,275),
      12(60,260)
• Age has seven different values
    – 25(100000001000), 30(000000010000), 45(010000000100),
      50(001110000010), 60(000000000001), 70(000001000000)
    – 85(000000100000)
• Salary has 10 different values
    –   60(000000000000), 75(001000000000), 100(000100000000)
    –   110(000001000000), 120(000010000000), 140(000000100000)
    –   260(000000010001), 275(000000000010), 350(000000000100)
    –   400(000000001000),
Example continued
• Find the jewelry buyers with an age range 45-55 and
  salary in the range 100-200
• Find the bit vectors of for the age values in the range
  and take OR
   – 010000000100 (for 45) and 001110000010 (for 50)
   – Result: 011110000110
• Find the bit vectors of salaries between 100 and 200
  thousand.
   – There are four: 100,110,120, and 140, their bitwaise OR is
     000111100000
• Take AND of both bit vectors
   – 000110000000
   – Find two records (50,100) and (50,120) are in the range.
Compressed bitmaps
• If number of different values is large, then
  number of 1 is rare.
• Run-length coding is used
   – Sequence of 0’s followed by 1.
   – Example: 000101 is two runs, 3 and 1. the binary
     representation is 11 and 1. So it is decoded as 111.
• To save space, the bitmap indexes tend to
  consist of vectors with very few 1’s are
  compressed using run-length coding.
Finding bit vectors
• Use any index technique to find the
  values.
• From the values to bit vectors.
• B-tree is a good choice.

Contenu connexe

Tendances

Web mining slides
Web mining slidesWeb mining slides
Web mining slidesmahavir_a
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search EngineNIKHIL NAIR
 
Information Retrieval Techniques of Google
Information Retrieval Techniques of Google Information Retrieval Techniques of Google
Information Retrieval Techniques of Google Cyr Ish
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
History of Data Science
History of Data ScienceHistory of Data Science
History of Data ScienceDaniel Caesar
 
Data Science With Python
Data Science With PythonData Science With Python
Data Science With PythonMosky Liu
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsShubhangi Tandon
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & StorageIlayaraja P
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsRohithND
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Poster - Recommender system
Poster - Recommender systemPoster - Recommender system
Poster - Recommender systemDivya Jain
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
Natural language processing
Natural language processingNatural language processing
Natural language processingprashantdahake
 

Tendances (20)

Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search Engine
 
Hashing PPT
Hashing PPTHashing PPT
Hashing PPT
 
Information Retrieval Techniques of Google
Information Retrieval Techniques of Google Information Retrieval Techniques of Google
Information Retrieval Techniques of Google
 
Bit manipulation
Bit manipulationBit manipulation
Bit manipulation
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
History of Data Science
History of Data ScienceHistory of Data Science
History of Data Science
 
Data Science With Python
Data Science With PythonData Science With Python
Data Science With Python
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
Data mining
Data miningData mining
Data mining
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Poster - Recommender system
Poster - Recommender systemPoster - Recommender system
Poster - Recommender system
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 

En vedette

Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...unyil96
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classificationJamshed Khan
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VOJose Enrique Ruiz
 
Agile Testing - LAST Conference 2015
Agile Testing - LAST Conference 2015Agile Testing - LAST Conference 2015
Agile Testing - LAST Conference 2015Theresa Neate
 
A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...Tejovat Technologies Pvt.Ltd.,Wakad
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingGabriele Angeletti
 
Multidimensional Analysis of Complex Networks
Multidimensional Analysis of Complex NetworksMultidimensional Analysis of Complex Networks
Multidimensional Analysis of Complex NetworksLino Possamai
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)Pratik Meshram
 
Visualising Multi Dimensional Data
Visualising Multi Dimensional DataVisualising Multi Dimensional Data
Visualising Multi Dimensional DataAmit Kapoor
 
K-means Clustering with Scikit-Learn
K-means Clustering with Scikit-LearnK-means Clustering with Scikit-Learn
K-means Clustering with Scikit-LearnSarah Guido
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 
Data compression for Large Multidimensional Data Warehouses
Data compression for Large Multidimensional Data WarehousesData compression for Large Multidimensional Data Warehouses
Data compression for Large Multidimensional Data WarehousesMahmud M
 

En vedette (13)

Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classification
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VO
 
Agile Testing - LAST Conference 2015
Agile Testing - LAST Conference 2015Agile Testing - LAST Conference 2015
Agile Testing - LAST Conference 2015
 
A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive Hashing
 
Multidimensional Analysis of Complex Networks
Multidimensional Analysis of Complex NetworksMultidimensional Analysis of Complex Networks
Multidimensional Analysis of Complex Networks
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)
 
Visualising Multi Dimensional Data
Visualising Multi Dimensional DataVisualising Multi Dimensional Data
Visualising Multi Dimensional Data
 
K-means Clustering with Scikit-Learn
K-means Clustering with Scikit-LearnK-means Clustering with Scikit-Learn
K-means Clustering with Scikit-Learn
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
Data compression for Large Multidimensional Data Warehouses
Data compression for Large Multidimensional Data WarehousesData compression for Large Multidimensional Data Warehouses
Data compression for Large Multidimensional Data Warehouses
 

Similaire à Multidimensional Indexing

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & howlucenerevolution
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesJonathan Katz
 
Graphing Exponentials
Graphing ExponentialsGraphing Exponentials
Graphing Exponentialsteachingfools
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Leonid Zhukov
 
Geometric transformation cg
Geometric transformation cgGeometric transformation cg
Geometric transformation cgharinipriya1994
 
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...Mozaic Works
 
There are two types of ciphers - Block and Stream. Block is used to .docx
There are two types of ciphers - Block and Stream. Block is used to .docxThere are two types of ciphers - Block and Stream. Block is used to .docx
There are two types of ciphers - Block and Stream. Block is used to .docxrelaine1
 
04-logic-gates (1).ppt
04-logic-gates (1).ppt04-logic-gates (1).ppt
04-logic-gates (1).pptBikashPaul14
 
Jacob's and Vlad's D.E.V. Project - 2012
Jacob's and Vlad's D.E.V. Project - 2012Jacob's and Vlad's D.E.V. Project - 2012
Jacob's and Vlad's D.E.V. Project - 2012Jacob_Evenson
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Ted Dunning
 

Similaire à Multidimensional Indexing (17)

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & how
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
Graphing Exponentials
Graphing ExponentialsGraphing Exponentials
Graphing Exponentials
 
Class10
Class10Class10
Class10
 
Binary codes
Binary codesBinary codes
Binary codes
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
 
Geometric transformation cg
Geometric transformation cgGeometric transformation cg
Geometric transformation cg
 
basic_gates.ppt
basic_gates.pptbasic_gates.ppt
basic_gates.ppt
 
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...
Stefan Kanev: Clojure, ClojureScript and Why They're Awesome at I T.A.K.E. Un...
 
Class10
Class10Class10
Class10
 
There are two types of ciphers - Block and Stream. Block is used to .docx
There are two types of ciphers - Block and Stream. Block is used to .docxThere are two types of ciphers - Block and Stream. Block is used to .docx
There are two types of ciphers - Block and Stream. Block is used to .docx
 
Booth Multiplier
Booth MultiplierBooth Multiplier
Booth Multiplier
 
04-logic-gates.ppt
04-logic-gates.ppt04-logic-gates.ppt
04-logic-gates.ppt
 
04-logic-gates (1).ppt
04-logic-gates (1).ppt04-logic-gates (1).ppt
04-logic-gates (1).ppt
 
Lecture.1
Lecture.1Lecture.1
Lecture.1
 
Jacob's and Vlad's D.E.V. Project - 2012
Jacob's and Vlad's D.E.V. Project - 2012Jacob's and Vlad's D.E.V. Project - 2012
Jacob's and Vlad's D.E.V. Project - 2012
 
Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28Paris data-geeks-2013-03-28
Paris data-geeks-2013-03-28
 

Plus de Digvijay Singh (14)

Week3 applications
Week3 applicationsWeek3 applications
Week3 applications
 
Week3.1
Week3.1Week3.1
Week3.1
 
Week2.1
Week2.1Week2.1
Week2.1
 
Week1.2 intro
Week1.2 introWeek1.2 intro
Week1.2 intro
 
Networks
NetworksNetworks
Networks
 
Uncertainty
UncertaintyUncertainty
Uncertainty
 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
 
Ngrams smoothing
Ngrams smoothingNgrams smoothing
Ngrams smoothing
 
Query execution
Query executionQuery execution
Query execution
 
Query compiler
Query compilerQuery compiler
Query compiler
 
Machine learning
Machine learningMachine learning
Machine learning
 
Hmm viterbi
Hmm viterbiHmm viterbi
Hmm viterbi
 
3 fol examples v2
3 fol examples v23 fol examples v2
3 fol examples v2
 
Bayesnetwork
BayesnetworkBayesnetwork
Bayesnetwork
 

Dernier

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Dernier (20)

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Multidimensional Indexing

  • 2. Outline • Applications • Hash-like structures • Tree-like structures • Bit-map indexes
  • 3. Search Key • Search key is (F1, F2, ..Fk) – Separated by special markers. – Example • If F1=abcd and F2=123, the search key is “abcd#123”
  • 4. Applications: GIS • Geographic Information Systems – Objects are in two dimensional space – The objects may be points or shapes – Objects may be houses, roads, bridges, pipelines and many other physical objects. • Query types – Partial match queries • Specify the values for one or more dimensions and look for all points matching those values in those dimensions. – Range queries • Set of shapes within the range – Nearest neighbor queries • Closest point to a given point – Where-am-I queries • When you click a mouse, the system determines which of the displayed elements you were clicking.
  • 5. Data Cubes • Data exists in high dimensional space • A chain store may record each sale made, including – The day and time – The store in which the sale was made – The item purchased – The color of the item – The size of the item • Attributes are seen as dimensions multidimensional space, data cube. • Typical query – Give a class of pink shirts for each store and each month of 1998.
  • 6. Multidimensional queries in SQL • Represent points as a relation Points(x,y) • Query – Find the nearest point to (10, 20) – Compute the distance between (10,20) to every other point.
  • 7. Rectangles • Rectangles(id, x11, y11, xur, yur) • If the query is – Find the rectangles enclosing the point (10,20) • SELECT id FROM Rectangles WHERE x11 <= 10.0 AND y11 <= 20.0 AND xur >=10.0 AND yur >=20.0;
  • 8. Data Cube • Fact table – Sales(day,store, item. color, size) • Query – Summarize the sale of pink shirts by day and store • SELECT day, store, COUNT(*) AS totalSales FROM Sales WHERE item=‘shirt’ AND color=‘pink’ GROUP BY day, store;
  • 9. Executing range queries in Conventional Indexes • Motivation: Find records where DEPT = “Toy” AND SAL > 50k • Strategy 1: Use “Toy” index and check salary • Strategy II: Use “Toy” index and SAL index and intersect. – Complexity • Toy index: number of disk accesses = number of disk blocks • SAL index: number of disk accesses= number of records • So conventional indexes of little help.
  • 10. Executing nearest-neighbor queries using conventional indexes • There might be no point in the selected range – Repeat the search by increasing the range • The closest point within the range might not be the closest point overall. – There is one point closer from outside range.
  • 11. Other Limitations of Conventional Indexes • We can only keep the file sorted on only attribute. • If the query is on multiple dimensions – We will end-up having one disk access for each record • It becomes too expensive
  • 12. Overview of Multidimensional Index Structures • Hash-table-like approaches • Tree-like approaches • In both cases, we give-up some properties of single dimensional indexes • Hash – We have to access multiple buckets • Tree – Tree may not be balanced – There may not exist a correspondence between tree nodes and disk blocks – Information in the disk block is much smaller.
  • 13. Hash like structures: GRID Files • Each dimension, grid lines partition the space into stripes, – Points that fall on a grid line will be considered to belong to the stripe for which that grid line is lower boundary – The number of grid lines in each dimension may vary. – Space between grid lines in the same dimension may also vary
  • 14. Grid Index Key 2 X1 X2 …… Xn V1 V2 Key 1 Vn To records with key1=V3, key2=X2 14
  • 15. CLAIM • Can quickly find records with – key 1 = Vi ∧ Key 2 = Xj – key 1 = Vi – key 2 = Xj 15
  • 16. CLAIM • Can quickly find records with – key 1 = Vi ∧ Key 2 = Xj – key 1 = Vi – key 2 = Xj • And also ranges…. – E.g., key 1 ≥ Vi ∧ key 2 < Xj 16
  • 17. But there is a catch with Grid Indexes! • How is Grid Index stored on disk? V1 V2 V3 Like Array... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4 17
  • 18. But there is a catch with Grid Indexes! • How is Grid Index stored on disk? V1 V2 V3 Like Array... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4 Problem: • Need regularity so we can compute position of Vi,Xj entry 18
  • 19. Solution: Use Indirection X1 X2 X3 Buckets -- V1 -- -- V2 -- -- -- V3 *Grid only -- V4 -- -- contains pointers to buckets -- -- -- -- Buckets -- -- 19
  • 20. With indirection: • Grid can be regular without wasting space • We do have price of indirection 20
  • 21. Can also index grid on value ranges Salary Grid 0-20K 1 20K-50K 2 50K- 3 8 1 2 3 Linear Scale Toy Sales Personnel 21
  • 22. Grid files Good for multiple-key search + Space, management overhead - (nothing is free) Need partitioning ranges that evenly - split keys 22
  • 24. Partitioned hash function Idea: 010110 1110010 Key1 Key2 h1 h2 24
  • 25. EX: h1(toy) =0 000 h1(sales) =1 001 h1(art) =1 010 . 011 . h2(10k) =01 100 h2(20k) =11 101 h2(30k) =01 110 h2(40k) =00 111 . . Fred,toy,10k,Joe,sales,10k Insert Sally,art,30k 25
  • 26. EX: h1(toy) =0 000 h1(sales) =1 001 Fred h1(art) =1 010 . 011 . h2(10k) =01 100 h2(20k) =11 101 JoeSally h2(30k) =01 110 h2(40k) =00 111 . . Fred,toy,10k,Joe,sales,10k Insert Sally,art,30k 26
  • 27. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales ∧ Sal=40k 27
  • 28. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales ∧ Sal=40k 28
  • 29. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Sal=30k 29
  • 30. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Sal=30k look here 30
  • 31. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales 31
  • 32. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales look here 32
  • 33. Tree-like Structures • Multiple-key indexes • Kd-trees • Quad trees • R-trees
  • 34. Multiple-key indexes • Several attributes representing dimensions of data points • Multiple key index is an index of indexes in which the nodes at each level are indexes for one attribute.
  • 35. Strategy: • Multiple Key Index One idea: I2 I1 I3 35
  • 36. Example 10k 15k Example Art 17k Record Sales 21k Toy Dept 12k Name=Joe Index 15k DEPT=Sales 15k SAL=15k 19k Salary Index 36
  • 37. For which queries is this index good? Find RECs Dept = “Sales” SAL=20k Find RECs Dept = “Sales” SAL 20k Find RECs Dept = “Sales” Find RECs SAL = 20k 37
  • 38. KD-trees • K dimensional tree is generalizing binary search tree into multi-dimensional data. • A KD tree is a binary tree in which interior nodes have an associated attribute “a” and a value “v” that splits data into two parts. • The attributes at different levels of a tree are different, and levels rotating among the attributes of all dimensions.
  • 39. Interesting application: • Geographic Data y DATA: X1,Y1, Attributes x X2,Y2, Attributes ... 39
  • 40. Queries: • What city is at Xi,Yi? • What is within 5 miles from Xi,Yi? • Which is closest point to Xi,Yi? 40
  • 41. Example i a e d h b n f l o c j g k m 41
  • 42. Example i a e d h b 10 20 n f l o c j g k m 10 20 42
  • 43. Example a 40 i e d h 30 b 10 20 n f 20 l o c 10 j g 25 15 35 20 m k 10 20 43
  • 44. Example a 40 i e d h 30 b 10 20 n f 20 l o c 10 j g 25 15 35 20 m k 10 20 5 15 15 44
  • 45. Example a 40 i e d h 30 b 10 20 n f 20 l o c 10 j g 25 15 35 20 m k 10 20 5 h i g f d e c a b 15 15 j k l m n o 45
  • 46. Example a 40 i e d h 30 b 10 20 n f 20 l o c 10 j g 25 15 35 20 m k 10 20 5 h i g f d e c a b 15 15 • Search points near f • Search points near b j k l m n o 46
  • 47. Queries • Find points with Yi 20 • Find points with Xi 5 • Find points “close” to i = 12,38 • Find points “close” to b = 7,24 47
  • 48. Quad trees • Divides multidimensional space into quadrants and divides the quadrants same way if they have too many points. • If the number of points in a square – fits in a block, it is a leaf node – no longer fits in a block, it becomes an interior node, four quadrants are its children.
  • 50. R-tree • Captures the spirit of B-tree for multidimensional data. • Represents a collection of regions by grouping them into a hierarchy of larger regions. • Data is divided into regions. • Interior node is corresponds to interior region – region can be of any shape • Rectangular is popular – Children corresponds to sub-regions.
  • 52. Bitmap indexes • Assume that records have permanent numbers • A bit-map index is a collection of bit vectors of length n, one for each value may appear in the field F. • The vector for value v has 1 in position i if the i’th record has v in field F, and it has 0 there if not. • Example for F and G fields: – (30, foo), (30,bar), (40, baz), (50, foo), (40, bar), (30, baz) – Bit index for F, each of 6 bits. For 30, it is 110001, for 40, it is 001010, and for 50, it is 000100. – Bit index for G also have three vectors. For foo it is, 100100, for bar it is 010010, and for baz it is 001001.
  • 53. Bit map indexes: Partial match • Bit maps allow answering of partial match queries quickly and efficiently. • Example: – Movie(title, year, length, studioName) – SELECT title – FROM Movie – WHERE studioName= ‘Disney’ and Year=1965; • If we have bitmap for studioName and year, then intersection or AND operation will give the result. • Bit vectors do not occupy much space.
  • 54. Bitmap indexes: range queries • Example: consider the gold jewelry data of twelve points – 1 (25,60), 2(45,60), 3(50,75), 4(50,100), 5(50,120), 6(70, 110), 7(85,140), 8(30,260), 9(25,400), 10(45,350), 11(50,275), 12(60,260) • Age has seven different values – 25(100000001000), 30(000000010000), 45(010000000100), 50(001110000010), 60(000000000001), 70(000001000000) – 85(000000100000) • Salary has 10 different values – 60(000000000000), 75(001000000000), 100(000100000000) – 110(000001000000), 120(000010000000), 140(000000100000) – 260(000000010001), 275(000000000010), 350(000000000100) – 400(000000001000),
  • 55. Example continued • Find the jewelry buyers with an age range 45-55 and salary in the range 100-200 • Find the bit vectors of for the age values in the range and take OR – 010000000100 (for 45) and 001110000010 (for 50) – Result: 011110000110 • Find the bit vectors of salaries between 100 and 200 thousand. – There are four: 100,110,120, and 140, their bitwaise OR is 000111100000 • Take AND of both bit vectors – 000110000000 – Find two records (50,100) and (50,120) are in the range.
  • 56. Compressed bitmaps • If number of different values is large, then number of 1 is rare. • Run-length coding is used – Sequence of 0’s followed by 1. – Example: 000101 is two runs, 3 and 1. the binary representation is 11 and 1. So it is decoded as 111. • To save space, the bitmap indexes tend to consist of vectors with very few 1’s are compressed using run-length coding.
  • 57. Finding bit vectors • Use any index technique to find the values. • From the values to bit vectors. • B-tree is a good choice.