Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
Prochain SlideShare
Chargement dans…5
×

# Dependable Cardinality Forecast for XQuery

766 vues

Publié le

Publié dans : Formation, Technologie, Business
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Identifiez-vous pour voir les commentaires

• Soyez le premier à aimer ceci

### Dependable Cardinality Forecast for XQuery

1. 1. Dependable Cardinality Forecasts for XQuery Jens Teubner, ETH (formerly IBM Research) Torsten Grust, U T¨bingen (formerly TUM) u Sebastian Maneth, UNSW and NICTA Sherif Sakr, UNSW and NICTA c Systems Group — Department of Computer Science — ETH Z¨rich u August 26, 2008
2. 2. Cardinality Estimation for XQuery The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard. for \$d in doc ("forecast.xml")/descendant::day let \$day := \$d/@t let \$ppcp := data (\$d/descendant::ppcp) return if (\$ppcp > 50) then ("rain likely on", \$day, "chance of precipitation:", \$ppcp) else ("no rain on", \$day) for iteration sequence construction conditionals (if-then-else) ... existential quantification (XPath is not a focus of this work.) August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 2
3. 3. Cardinality Estimation for XQuery The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard. for \$d in doc ("forecast.xml")/descendant::day card1 = ? let \$day := \$d/@t let \$ppcp := data (\$d/descendant::ppcp) return if (\$ppcp > 50) card2 = ? then ("rain likely on", \$day, card3 = ? "chance of precipitation:", \$ppcp) else ("no rain on", \$day) card4 = ? for iteration sequence construction conditionals (if-then-else) ... existential quantification (XPath is not a focus of this work.) → Goal: Compute subexpression-level cardinalities cardi . August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 2
4. 4. Idea: Perform cardinality estimation on relational plan equivalents for XQuery. relational cardinality estimation (System R) + existing work on XPath estimation + histograms for value predicates = cardinality estimation for XQuery Build on Pathfinder’s XQuery-to-relational algebra compiler.1 → tuple count ≡ XQuery item count Cardinality information for each subexpression. 1 http://www.pathfinder-xquery.org/ August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 3
5. 5. πiter:outer,pos:pos1,item pos1: ord,pos outer Example (Plan details not of interest today.) 2 iter=inner · ∪ 4 1 3 πiter,pos:pos1,item for \$d in doc ("forecast.xml")/descendant::day πiter,pos:pos1,item pos1: ord,pos iter let \$day := \$d/@t pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, let \$ppcp := data (\$d/descendant::ppcp) · ∪ item:"rain..." · ∪ return 2 πiter,pos:1,ord:1, πiter,pos,item,ord:2 ∪· if (\$ppcp > 50) 3 item:"no rain..." πiter,pos:1,ord:3, item:"chance..." then ("rain likely on", \$day, πiter,pos,item,ord:2 πiter,pos,item,ord:4 "chance of precipitation:", \$ppcp) iter1=iter iter1=iter iter=iter1 else ("no rain on", \$day) 4 δ πiter σres res:(item,item1) Pathfinder compiles XQuery iter1=iter πiter1:iter, πiter1:iter,pos,item with arbitrary nesting. πiter1:iter, pos1:1,item1:50 item pos,item pos: item iter Maintain correspondence to πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) original query if back-end is πiter:inner,item πinner,outer:iter, ord:pos not relational. inner: iter,pos 1 pos: item iter Derive estimates based on an item:descendant::day(item) inference rule set. docitem πiter,item:"forecast.xml" iter 1
6. 6. Relational XQuery Cardinality Estimation Apply System R-style estimation to relational XQuery plans, e.g., Disjoint union: Cartesian product: · |q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 | Equi-join:    |q1 | · |q2 | if there are indexes on  max {|a| , |b| } both join columns,     idx idx  |q1 q2 | = |q1 | · |q2 | if there is only an index a=b |a|idx on column a,        |q1 | · |q2 | · 1/10 otherwise  |c|idx : Number of unique values in index on column c. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 5
7. 7. Relational XQuery Cardinality Estimation Apply System R-style estimation to relational XQuery plans, e.g., Disjoint union: Cartesian product: · |q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 | Equi-join:  |q1 | · |q2 | if there are indexes on    max {|a| , |b| }     idx idx both join columns, ?  |q1 q2 | = |q1 | · |q2 | if there is only an index a=b       |a|idx on column a, ?  |q1 | · |q2 | · 1/10 otherwise  Our joins typically operate over computed relations. |c|idx : Number of unique values in index on column c. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 5
8. 8. Abstract Domain Identiﬁers A simple form of data flow analysis provides the information needed. Introduce abstract domain identifiers α, β, . . . as placeholders for the active runtime domain for each column c. (Read c α as “column c contains values from domain α.”) Estimate the size α of each domain α, e.g.,2 dom a: b1 ,...,bn (q) ⊇ dom (q) ∪ aα ∧ α =! |q| . Identify inclusion relationships α β between domains, e.g., aα ∈ dom (q) ∧ aβ ∈ dom (σ··· (q)) ⇒ β α . 2 Operator a: b1 ,...,bn introduces a new key column (holding row numbers). August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 6
9. 9. Abstract Domain Identiﬁers Use abstract domain information for cardinality estimation. E.g., “foreign key” join: aα ∈ dom (q1 ) bβ ∈ dom (q2 ) α β |q1 | · |q2 | |q1 q2 | = a=b β Domain inclusion guarantees that each tuple in q1 finds (at least one) join partner in q2 . Other examples: |q1 q2 | = |q1 | − |q2 | if q2 is a subset of q1 . |q1 q2 | = 0 if q1 is a subset of q2 . |q1 q2 | = |q1 | if q1 and q2 are disjoint. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 7
10. 10. Interfacing with XPath—Projection Paths Track XPath navigation by means of projection paths3 a a b a b c c:child::*(b) 1 γa γb 1 γa b c d 2 γb 1 γa γc 1 γa γd 3 γd e 3 γd γe q1 q2 b⇒p ∈ path (q1 ) ⇒ c⇒p/child::* ∈ path (q2 ) Step operator makes XPath navigation explicit in relational plans (compiles to join on SQL back-ends). 3 A. Marian and J. Sim´ on. Projecting XML Documents. VLDB 2003. e August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 8
11. 11. Interfacing with XPath—Cardinality Inference bα bα a a b a b c c:child::*(b) 1 γa γb 1 γa b c d 2 γb 1 γa γc α α 1 γa γd 3 γd e 3 γd γe q1 q2 Cardinality: fn:count (p/child::*) |q2 | = |q1 | · fn:count (p) = |q1 | · Prchild::* (p) fanout (here: 4/3) Domain Sizes: fn:count (p [ child::* ]) α = α · fn:count (p) = α · Pr[child::*] (p) selectivity (here: 2/3) Any XPath estimator that provides Prp2 (p1 ) and Pr[p2 ] (p1 ) will do. Our prototype uses a simple Data Guide-based implementation. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 9
12. 12. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter δ item:descendant::ppcp(item) πiter σres πiter:inner,item res:(item,item1) iter1=iter πiter1:iter, πiter1:iter,pos,item pos,item πiter1:iter, pos1:1,item1:50 item pos: item iter πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos inner: iter,pos 1 pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
13. 13. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item pos: item iter πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos inner: iter,pos 1 pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
14. 14. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter iterα Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item iter α Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item Domain iter πiter pos: item inclusion: pos: item iter iterα ∈ dom item:attribute::t(item) ... (q) ; α α item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos Domain size (uses step selectivity): inner: iter,pos α = α 1· Pr[ax::nt] (· · · ) pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
15. 15. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ π iter1α iter1=iter “Foreign key” iter,pos:1,ord:1, ∪· join: item:"rain..." ∪· iterα q1 q2 rain..." α item:"no = πiter,pos,item,ord:2 πiter,pos:1,ord:1, |q1 |·|q2 | (since α ∪· α) πiter1:iter, πiter,pos:1,ord:3, iter1=iter item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter iterα Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item iter α Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item Domain iter πiter pos: item inclusion: pos: item iter iterα ∈ dom item:attribute::t(item) ... (q) ; α α item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos Domain size (uses step selectivity): inner: iter,pos α = α 1· Pr[ax::nt] (· · · ) pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
16. 16. πiter:outer,pos:pos1,item card2 : 4104 pos1: ord,pos outer Back to our Example Plan 2 iter=inner card3 : 3540 · ∪ 4 card4 : 564 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ π iter1α iter1=iter “Foreign key” iter,pos:1,ord:1, ∪· join: item:"rain..." ∪· iterα q1 q2 rain..." α item:"no = πiter,pos,item,ord:2 πiter,pos:1,ord:1, |q1 |·|q2 | (since α ∪· α) πiter1:iter, πiter,pos:1,ord:3, iter1=iter item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter iterα Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item iter α Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item Domain iter πiter pos: item inclusion: pos: item iter iterα ∈ dom item:attribute::t(item) ... (q) ; α α item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, card1 : 990 ord:pos Domain size (uses step selectivity): inner: iter,pos α = α 1· Pr[ax::nt] (· · · ) pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
17. 17. Forecasting New Zealand’s Weather The obtained cardinalities can be mapped back to predict item counts for corresponding XQuery expressions:4 for \$d in doc ("forecast.xml")/descendant::day card1 = 990/990 let \$day := \$d/@t let \$ppcp := data (\$d/descendant::ppcp) return if (\$ppcp > 50) card2 = 4104/3402 then ("rain likely on", \$day, card3 = 3540/2370 "chance of precipitation:", \$ppcp) else ("no rain on", \$day) card4 = 564/1032 Value statistics based on histograms ( paper) Inaccuracy is mainly due to correlations in the data. Rain in the morning likely means rain in the afternoon, too. 4 estimated/observed, based on data taken for New Zealand two weeks ago. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 11
18. 18. More Realistic Queries: W3C XQuery Use Cases 0.01 0.1 1 10 Prototype implementation XMP Q4 based on Pathfinder XMP Q5 XMP Q6 For each subexpression: XMP Q8 XMP Q9 estimated cardinality XMP Q10 XMP Q11 observed cardinality SGML Q3 SGML Q4 Diameter indicates data SGML Q8a point “stacking” SGML Q8b R Q2 Plan root: filled circle R Q3 R Q10 R Q11 Applicable to “real” queries R Q13 R Q14 Recovery from intermediate R Q15 mis-estimations R Q17 e.g., existential semantics 0.01 0.1 1 10 estimated cardinality/observed cardinality August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 12
19. 19. Wrap-Up Cardinality estimation framework for XQuery Subexpression-level estimates for arbitrary XQuery expressions Based on Pathfinder’s XQuery-to-relational algebra compiler relational cardinality estimation (System R) + existing work on XPath estimation + histograms for value predicates = cardinality estimation for XQuery High-quality estimates for realistic XQuery workloads Robust with respect to intermediate errors Pluggable and extensible e.g., XPath estimation subsystem, positional predicates August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 13