SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
1
Database Tuning, Spring 2008
Lecture 3:
Hash indexing,
index selection
Rasmus Pagh
2
Database Tuning, Spring 2008
Exercise left from last week
• B+-trees
– Questions a), b), and c).
3
Database Tuning, Spring 2008
Today’s lecture
• Morning session: Hashing
– Static hashing, hash functions
– Extendible hashing
– Linear hashing
– Newer techniques:
Buffering, two-choice hashing
• Afternoon session: Index selection
– Factors relevant for choice of indexes
– Rules of thumb; examples and counterexamples
– Exercises
4
Database Tuning, Spring 2008
What data in index?
• At least three possibilities:
1) Record of key (only primary index)
2) Key and pointer to record of key.
3) Key and list of pointers to the records
containing the key (for non-unique keys).
• For simplicity, we consider the case
where there is the same number of
keys (B) in every disk block.
– Case 1 with fixed length records.
– Case 2 with fixed length keys.
5
Database Tuning, Spring 2008
Static external hashing
• Hash table:
– Array of R disk blocks (≠notation in RG.)
– Can access block i in 1 I/O, for any i.
• Hash function h:
– Maps keys to {0,...,R-1}.
– Should be efficient to evaluate (0 I/Os).
– Idea: x is stored in block h(x).
• Problem:
– Dealing with overflows.
– Standard solution: Overflow chains.
6
Database Tuning, Spring 2008
Problem session
• Consider the following claim from RG:
• Donald Ummy uses this hash function
in an application, and finds out that it
performs terribly, no matter how the
constants a and b are chosen.
• What might have gone wrong?
7
Database Tuning, Spring 2008
Randomized hash functions
Another approach (not mentioned in RG):
• Choose h at random from some set of
functions.
• This can make the hashing scheme
behave well regardless of the key set.
• E.g., "universal hashing" makes
chained hashing perform well (in theory
and practice).
• Details out of scope for this course...
8
Database Tuning, Spring 2008
Analysis, static hashing
• Notation:
– R blocks in hash table
– Each block in the hash table can hold B keys.
• Suppose that we insert N=αR keys in the
hash table (”fraction α full”, “load factor α”).
• Assume h is truly random.
• Expected number of overflow blocks:
(1-α)-2 ⋅ 2-Ω(B) R (proof omitted!)
• Good to have many keys in each bucket (an
advantage of secondary indexes that store
only pointers to records).
• Should keep α away from 1. (How?)
9
Database Tuning, Spring 2008
Sometimes, life is easy
• If B is sufficiently large compared to N,
all overflow blocks can be kept in
internal memory.
• Lookup in 1 I/O.
• Update in 2 I/Os.
10
Database Tuning, Spring 2008
Too many overflow chains?
Can have too many overflow chains if:
• The hash function does not distribute
the set of keys well (”skew”).
– Solution 1: Choose a new hash function.
– Solution 2?: Overflow in main memory.
• The number of keys in the dictionary
exceeds the capacity of the hash table.
– Solution: Rehash to a larger hash table.
– Better solution: ?
• There are many duplicate values.
– No fix needed.
11
Database Tuning, Spring 2008
Doubling the hash table
• For simplicity, assume R is a power of
2. Suppose h is a hash function that
has values of ”many” (e.g. 64) bits.
• We can map a key x to {0,...,R-1} by
taking the log R least significant bits of
h(x).
• Suppose that the hash table has
become too small:
– Want to double the size of the hash table.
– Just consider one more bit of h(x).
12
Database Tuning, Spring 2008
Doubling the hash table, cont.
• Suppose h(x)=0111001 (in binary)
and the hash table has size 16.
• Then x is stored in block number 1001
(binary).
• After doubling to size 32, x should be
stored in block 11001.
• Generally, all keys in block 1001 should
be moved to block 01001 or 11001.
• Conclusion: Can rehash by scanning
the table and split each block into two
blocks.
13
Database Tuning, Spring 2008
Doubling, example
10111
00110
11110
00101
10100
00100
01011
11000
01011
10111
00110
11110
00101
10100
11000
New key:
00100
For simplicity we assume:
• No overflow chains
• h(x)=x
14
Database Tuning, Spring 2008
Problem session
• Find some possible disadvantages of
the doubling strategy. Consider:
– Space usage vs overflows
– System response time
• Next: Alternatives that address some
of the disadvantages of doubling.
15
Database Tuning, Spring 2008
Linear hashing
10111
00110
11110
00101
10100
00100
01011
11000
01011
10111
00110
11110
00101
10100
11000
10111
00110
11110
00101
10100
00100
01011
10111
00110
11110
11000
”Virtual” blocks
• Merged with previous blocks
by considering one bit less
• Turned into physical blocks
as the hash table grows
16
Database Tuning, Spring 2008
Linear hashing - performance
The good:
• Resizes hash table one block at a time:
Split a block or merge two blocks.
• Cost of resize: 3 I/Os. Cheap!
The bad:
• Increasing size of hash table may not
eliminate any overflow chain.
• Uneven distribution of hash values; works
best for relatively low load factors, 50-80%.
(But variants of linear hashing improve this.)
• No worst-case guarantee on query time.
17
Database Tuning, Spring 2008
Extendible hashing
10111
00110
11110
00101
10100
00100
01011
11000
”Virtual”
hash table
- no
overflows
01011
10111
00110
11110
00101
10100
00100
11000
physical
hash table
”Directory”
- mapping virtual
to physical
18
Database Tuning, Spring 2008
Extendible hashing invariants
• Virtual hash table has no overflows - may
need to increase in size.
• Physical hash table has no overflows.
• Virtual hash table is as small as possible -
may need to shrink.
• ”Compression”: For any bit string s, if we
consider the virtual hash table blocks whose
index ends with s then either:
– These blocks contain more than B keys, or
– The corresponding entries in the directory all point
to the same block. (In other words, these blocks are
merged.)
19
Database Tuning, Spring 2008
Extendible hashing performance
• At most 2 I/Os for every lookup.
• Only 1 I/O if directory fits in internal
memory.
• Space utilization in physical hash table
is 69% (expected).
• Size of directory is roughly
(expected) - this is much smaller than
the hash table if B is moderately large.
• Solution with better space usage (if
sufficient internal memory):
B-tree with only leaves on disk.
20
Database Tuning, Spring 2008
Buffering
• Same trick as in buffered B-trees:
Don’t do updates right away, but put
them in a buffer.
1000
0101
1100
0110
1110
1111
0100
overflow
block
0111
1010
buffer
• Advantage: Several keys moved to the
overflow block at once.
• Disadvantage: Buffer takes space.
• Details in [JensenPagh07].
21
Database Tuning, Spring 2008
Two-choice hashing
• Idea:
– Use two hash functions, h1 and h2.
– x is stored in either block h1(x) or h2(x),
use two I/Os for lookup.
– When inserting x, choose the least loaded
block among h1(x)and h2(x).
• Can be shown that overflow
probabilities are much smaller than
with one function, especially when B is
small.
• If two disks are available, the 2 I/Os
can be done in parallel.
22
Database Tuning, Spring 2008
Today’s lecture, part 2
• Index selection
– Factors relevant for choice of indexes
– Rules of thumb; examples and counterexamples
• Exercises
23
Database Tuning, Spring 2008
Workload
• The workload (mix of operations to be
carried out by the DBMS) has a large
influence on what indexes should be
created in a database.
• Other factors are:
– the data in relations, and
– the query plans produced by the DBMS.
24
Database Tuning, Spring 2008
Rules of thumb
• Rules of thumb can be used to guide
thinking, and as a checklist.
• Are often valid in most cases, but there
are always important exceptions.
• Quote from SB:
• You don’t yet have the entire picture
(query optimization, concurrency), but
we can start reasoning about rules
anyway.
25
Database Tuning, Spring 2008
Rule of thumb 1:
Index the most selective attribute
• Argument: Using an index on a
selective attribute will help reducing
the amount of data to consider.
• Example:
SELECT count(*) FROM R
WHERE a>’UXS’ AND b BETWEEN 100 AND 200
• Counterexamples:
– Full table scan may be faster than an index
– It may not be possible/best to apply an
index.
26
Database Tuning, Spring 2008
Rule of thumb 2:
Cluster the most important index of a relation
• Argument:
– Range and multipoint queries are faster.
– Usually sparse, uses less space.
• Counterexamples:
– May be slower on queries ”covered” by a
dense index. (More on this later.)
– If there are many updates, the cost of
maintaining the clustering may be high.
– Clustering does not help for point queries.
– Can cluster according to several attributes
by duplicating the relation!
27
Database Tuning, Spring 2008
Rule of thumb 3:
Prefer a hash index over a B-tree if point
queries are more important than range queries
• Argument:
– Hash index uses fewer I/Os per operation
than a B-tree.
– Joins, especially, can create many point
queries.
• Counterexamples:
– If a real-time guarantee is needed, hashing
can be a bad choice.
– Might be best to have both a B-tree and a
hash index.
28
Database Tuning, Spring 2008
Aside: Hashing and range queries
RG page 371:
• But: they can be used to answer range
queries in O(1+Z/B) I/Os, where Z is the
number of results. (Alstrup, Brodal, Rauhe, 2001;
Mortensen, Pagh, Patrascu 2005)
• Theoretical result on external memory
(why?) - and out of scope for DBT.
29
Database Tuning, Spring 2008
Problem session
• Setting:
– we have 220 tuples in a primary index
– tuples take the space of 4 keys,
– the space for a pointer is small compared
to the space of a key
– internal memory has space for M=216 keys.
• Consider the search time of B-trees and
extendible hashing two cases:
– Case A: B=4 (i.e., 4 tuples/block).
– Case B: B=26.
30
Database Tuning, Spring 2008
Rule of thumb 4:
Balance the increased cost of updating with
the decreased cost of searching
• Argument: The savings provided by an
index should be bigger than the cost.
• Counterexample:
– If updates come when the system has
excess capacity, we might be willing to
work harder to have indexes at the peaks.
• If buffered B-trees are used, the cost
per update of maintaining an index
may be rather low. Especially if small
degree trees are used.
31
Database Tuning, Spring 2008
Rule of thumb 5:
A non-clustering index helps when the
number of rows to retrieve is smaller than the
number of blocks in the relation.
• Argument:In this case it surely reduces
I/O cost.
• Counterexample:
– Even for a non-clustered index, the rows to
retrieve can sometimes be found in a small
fraction of the blocks (e.g. salary, clustered
on date of employment).
32
Database Tuning, Spring 2008
Rule of thumb 6:
Avoid indexing of small tables.
• Argument: Small tables can be kept in
internal memory, or read entirely in 1
or 2 I/Os.
• Counterexample:
– If the index is in main memory, it might
still give a speedup.
33
Database Tuning, Spring 2008
Rule of thumb 7:
A covering index for a query will speed it up
• Argument: The index will contain less
data than the base table, allowing a
faster scan of all data needed.
• Counterexamples:
– If the table is vertically partitioned, a
similar speedup can be achieved.
– A vertically partitioned relation may have
several indexes that can be used to answer
the query (e.g. an index to select and an
index to join).
34
Database Tuning, Spring 2008
Conclusion
• Indexing is a complicated business!
• Understanding the various index types
and their performance characteristics,
as well as the characteristics of the
database at hand and its workload
allows informed indexing decisions.
• Rules of thumb can be used to guide
thinking.
• More complications to come!
35
Database Tuning, Spring 2008
Tip: Clustered indexing in Oracle
• To cluster according to a non-unique
attribute A, declare a composite
primary key (A,P), where P is a unique
key.
36
Database Tuning, Spring 2008
Tip: Hash indexing in Oracle
37
Database Tuning, Spring 2008
Exercises
Hand-outs:
• Choosing an index.
– Questions a), b), and c).
• Representation of relations
– Question d
(on handout from last week).

Contenu connexe

Similaire à 02-hashing.pdf

Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 
Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Andrew Clegg
 
Advanced data structures vol. 1
Advanced data structures   vol. 1Advanced data structures   vol. 1
Advanced data structures vol. 1Christalin Nelson
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedTony Rogerson
 
Advance computer architecture
Advance computer architectureAdvance computer architecture
Advance computer architecturesuma1991
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithmsSandeep Joshi
 
Tuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseTuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseAnil Gupta
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Yandex
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovNikolay Samokhvalov
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Cloudera, Inc.
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAtner Yegorov
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisHiye Biniam
 

Similaire à 02-hashing.pdf (20)

Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 
Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)Approximate methods for scalable data mining (long version)
Approximate methods for scalable data mining (long version)
 
Advanced data structures vol. 1
Advanced data structures   vol. 1Advanced data structures   vol. 1
Advanced data structures vol. 1
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - Advanced
 
Advance computer architecture
Advance computer architectureAdvance computer architecture
Advance computer architecture
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithms
 
Tuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseTuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBase
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
 

Dernier

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Dernier (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

02-hashing.pdf

  • 1. 1 Database Tuning, Spring 2008 Lecture 3: Hash indexing, index selection Rasmus Pagh
  • 2. 2 Database Tuning, Spring 2008 Exercise left from last week • B+-trees – Questions a), b), and c).
  • 3. 3 Database Tuning, Spring 2008 Today’s lecture • Morning session: Hashing – Static hashing, hash functions – Extendible hashing – Linear hashing – Newer techniques: Buffering, two-choice hashing • Afternoon session: Index selection – Factors relevant for choice of indexes – Rules of thumb; examples and counterexamples – Exercises
  • 4. 4 Database Tuning, Spring 2008 What data in index? • At least three possibilities: 1) Record of key (only primary index) 2) Key and pointer to record of key. 3) Key and list of pointers to the records containing the key (for non-unique keys). • For simplicity, we consider the case where there is the same number of keys (B) in every disk block. – Case 1 with fixed length records. – Case 2 with fixed length keys.
  • 5. 5 Database Tuning, Spring 2008 Static external hashing • Hash table: – Array of R disk blocks (≠notation in RG.) – Can access block i in 1 I/O, for any i. • Hash function h: – Maps keys to {0,...,R-1}. – Should be efficient to evaluate (0 I/Os). – Idea: x is stored in block h(x). • Problem: – Dealing with overflows. – Standard solution: Overflow chains.
  • 6. 6 Database Tuning, Spring 2008 Problem session • Consider the following claim from RG: • Donald Ummy uses this hash function in an application, and finds out that it performs terribly, no matter how the constants a and b are chosen. • What might have gone wrong?
  • 7. 7 Database Tuning, Spring 2008 Randomized hash functions Another approach (not mentioned in RG): • Choose h at random from some set of functions. • This can make the hashing scheme behave well regardless of the key set. • E.g., "universal hashing" makes chained hashing perform well (in theory and practice). • Details out of scope for this course...
  • 8. 8 Database Tuning, Spring 2008 Analysis, static hashing • Notation: – R blocks in hash table – Each block in the hash table can hold B keys. • Suppose that we insert N=αR keys in the hash table (”fraction α full”, “load factor α”). • Assume h is truly random. • Expected number of overflow blocks: (1-α)-2 ⋅ 2-Ω(B) R (proof omitted!) • Good to have many keys in each bucket (an advantage of secondary indexes that store only pointers to records). • Should keep α away from 1. (How?)
  • 9. 9 Database Tuning, Spring 2008 Sometimes, life is easy • If B is sufficiently large compared to N, all overflow blocks can be kept in internal memory. • Lookup in 1 I/O. • Update in 2 I/Os.
  • 10. 10 Database Tuning, Spring 2008 Too many overflow chains? Can have too many overflow chains if: • The hash function does not distribute the set of keys well (”skew”). – Solution 1: Choose a new hash function. – Solution 2?: Overflow in main memory. • The number of keys in the dictionary exceeds the capacity of the hash table. – Solution: Rehash to a larger hash table. – Better solution: ? • There are many duplicate values. – No fix needed.
  • 11. 11 Database Tuning, Spring 2008 Doubling the hash table • For simplicity, assume R is a power of 2. Suppose h is a hash function that has values of ”many” (e.g. 64) bits. • We can map a key x to {0,...,R-1} by taking the log R least significant bits of h(x). • Suppose that the hash table has become too small: – Want to double the size of the hash table. – Just consider one more bit of h(x).
  • 12. 12 Database Tuning, Spring 2008 Doubling the hash table, cont. • Suppose h(x)=0111001 (in binary) and the hash table has size 16. • Then x is stored in block number 1001 (binary). • After doubling to size 32, x should be stored in block 11001. • Generally, all keys in block 1001 should be moved to block 01001 or 11001. • Conclusion: Can rehash by scanning the table and split each block into two blocks.
  • 13. 13 Database Tuning, Spring 2008 Doubling, example 10111 00110 11110 00101 10100 00100 01011 11000 01011 10111 00110 11110 00101 10100 11000 New key: 00100 For simplicity we assume: • No overflow chains • h(x)=x
  • 14. 14 Database Tuning, Spring 2008 Problem session • Find some possible disadvantages of the doubling strategy. Consider: – Space usage vs overflows – System response time • Next: Alternatives that address some of the disadvantages of doubling.
  • 15. 15 Database Tuning, Spring 2008 Linear hashing 10111 00110 11110 00101 10100 00100 01011 11000 01011 10111 00110 11110 00101 10100 11000 10111 00110 11110 00101 10100 00100 01011 10111 00110 11110 11000 ”Virtual” blocks • Merged with previous blocks by considering one bit less • Turned into physical blocks as the hash table grows
  • 16. 16 Database Tuning, Spring 2008 Linear hashing - performance The good: • Resizes hash table one block at a time: Split a block or merge two blocks. • Cost of resize: 3 I/Os. Cheap! The bad: • Increasing size of hash table may not eliminate any overflow chain. • Uneven distribution of hash values; works best for relatively low load factors, 50-80%. (But variants of linear hashing improve this.) • No worst-case guarantee on query time.
  • 17. 17 Database Tuning, Spring 2008 Extendible hashing 10111 00110 11110 00101 10100 00100 01011 11000 ”Virtual” hash table - no overflows 01011 10111 00110 11110 00101 10100 00100 11000 physical hash table ”Directory” - mapping virtual to physical
  • 18. 18 Database Tuning, Spring 2008 Extendible hashing invariants • Virtual hash table has no overflows - may need to increase in size. • Physical hash table has no overflows. • Virtual hash table is as small as possible - may need to shrink. • ”Compression”: For any bit string s, if we consider the virtual hash table blocks whose index ends with s then either: – These blocks contain more than B keys, or – The corresponding entries in the directory all point to the same block. (In other words, these blocks are merged.)
  • 19. 19 Database Tuning, Spring 2008 Extendible hashing performance • At most 2 I/Os for every lookup. • Only 1 I/O if directory fits in internal memory. • Space utilization in physical hash table is 69% (expected). • Size of directory is roughly (expected) - this is much smaller than the hash table if B is moderately large. • Solution with better space usage (if sufficient internal memory): B-tree with only leaves on disk.
  • 20. 20 Database Tuning, Spring 2008 Buffering • Same trick as in buffered B-trees: Don’t do updates right away, but put them in a buffer. 1000 0101 1100 0110 1110 1111 0100 overflow block 0111 1010 buffer • Advantage: Several keys moved to the overflow block at once. • Disadvantage: Buffer takes space. • Details in [JensenPagh07].
  • 21. 21 Database Tuning, Spring 2008 Two-choice hashing • Idea: – Use two hash functions, h1 and h2. – x is stored in either block h1(x) or h2(x), use two I/Os for lookup. – When inserting x, choose the least loaded block among h1(x)and h2(x). • Can be shown that overflow probabilities are much smaller than with one function, especially when B is small. • If two disks are available, the 2 I/Os can be done in parallel.
  • 22. 22 Database Tuning, Spring 2008 Today’s lecture, part 2 • Index selection – Factors relevant for choice of indexes – Rules of thumb; examples and counterexamples • Exercises
  • 23. 23 Database Tuning, Spring 2008 Workload • The workload (mix of operations to be carried out by the DBMS) has a large influence on what indexes should be created in a database. • Other factors are: – the data in relations, and – the query plans produced by the DBMS.
  • 24. 24 Database Tuning, Spring 2008 Rules of thumb • Rules of thumb can be used to guide thinking, and as a checklist. • Are often valid in most cases, but there are always important exceptions. • Quote from SB: • You don’t yet have the entire picture (query optimization, concurrency), but we can start reasoning about rules anyway.
  • 25. 25 Database Tuning, Spring 2008 Rule of thumb 1: Index the most selective attribute • Argument: Using an index on a selective attribute will help reducing the amount of data to consider. • Example: SELECT count(*) FROM R WHERE a>’UXS’ AND b BETWEEN 100 AND 200 • Counterexamples: – Full table scan may be faster than an index – It may not be possible/best to apply an index.
  • 26. 26 Database Tuning, Spring 2008 Rule of thumb 2: Cluster the most important index of a relation • Argument: – Range and multipoint queries are faster. – Usually sparse, uses less space. • Counterexamples: – May be slower on queries ”covered” by a dense index. (More on this later.) – If there are many updates, the cost of maintaining the clustering may be high. – Clustering does not help for point queries. – Can cluster according to several attributes by duplicating the relation!
  • 27. 27 Database Tuning, Spring 2008 Rule of thumb 3: Prefer a hash index over a B-tree if point queries are more important than range queries • Argument: – Hash index uses fewer I/Os per operation than a B-tree. – Joins, especially, can create many point queries. • Counterexamples: – If a real-time guarantee is needed, hashing can be a bad choice. – Might be best to have both a B-tree and a hash index.
  • 28. 28 Database Tuning, Spring 2008 Aside: Hashing and range queries RG page 371: • But: they can be used to answer range queries in O(1+Z/B) I/Os, where Z is the number of results. (Alstrup, Brodal, Rauhe, 2001; Mortensen, Pagh, Patrascu 2005) • Theoretical result on external memory (why?) - and out of scope for DBT.
  • 29. 29 Database Tuning, Spring 2008 Problem session • Setting: – we have 220 tuples in a primary index – tuples take the space of 4 keys, – the space for a pointer is small compared to the space of a key – internal memory has space for M=216 keys. • Consider the search time of B-trees and extendible hashing two cases: – Case A: B=4 (i.e., 4 tuples/block). – Case B: B=26.
  • 30. 30 Database Tuning, Spring 2008 Rule of thumb 4: Balance the increased cost of updating with the decreased cost of searching • Argument: The savings provided by an index should be bigger than the cost. • Counterexample: – If updates come when the system has excess capacity, we might be willing to work harder to have indexes at the peaks. • If buffered B-trees are used, the cost per update of maintaining an index may be rather low. Especially if small degree trees are used.
  • 31. 31 Database Tuning, Spring 2008 Rule of thumb 5: A non-clustering index helps when the number of rows to retrieve is smaller than the number of blocks in the relation. • Argument:In this case it surely reduces I/O cost. • Counterexample: – Even for a non-clustered index, the rows to retrieve can sometimes be found in a small fraction of the blocks (e.g. salary, clustered on date of employment).
  • 32. 32 Database Tuning, Spring 2008 Rule of thumb 6: Avoid indexing of small tables. • Argument: Small tables can be kept in internal memory, or read entirely in 1 or 2 I/Os. • Counterexample: – If the index is in main memory, it might still give a speedup.
  • 33. 33 Database Tuning, Spring 2008 Rule of thumb 7: A covering index for a query will speed it up • Argument: The index will contain less data than the base table, allowing a faster scan of all data needed. • Counterexamples: – If the table is vertically partitioned, a similar speedup can be achieved. – A vertically partitioned relation may have several indexes that can be used to answer the query (e.g. an index to select and an index to join).
  • 34. 34 Database Tuning, Spring 2008 Conclusion • Indexing is a complicated business! • Understanding the various index types and their performance characteristics, as well as the characteristics of the database at hand and its workload allows informed indexing decisions. • Rules of thumb can be used to guide thinking. • More complications to come!
  • 35. 35 Database Tuning, Spring 2008 Tip: Clustered indexing in Oracle • To cluster according to a non-unique attribute A, declare a composite primary key (A,P), where P is a unique key.
  • 36. 36 Database Tuning, Spring 2008 Tip: Hash indexing in Oracle
  • 37. 37 Database Tuning, Spring 2008 Exercises Hand-outs: • Choosing an index. – Questions a), b), and c). • Representation of relations – Question d (on handout from last week).