
Efficient Query Processing Infrastructures

SIGIR 2018 tutorial


  1. 1. Efficient Query Processing Infrastructures Craig Macdonald 1, Nicola Tonellotto 2 1 School of Computer Science University of Glasgow 2 Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” National Research Council of Italy
  2. 2. The scale of the web search challenge
  3. 3. How many documents? In how long? • Reports suggest that Google considers a total of 30 trillion pages in the indexes of its search engine – And it identified relevant results from these 30 trillion in 0.63 seconds – Clearly this is a big data problem! • To answer a user's query, a search engine doesn’t read through all of those pages: the index data structures help it to efficiently find pages that effectively match the query and will help the user – Effective: users want relevant search results – Efficient: users aren't prepared to wait a long time for search results • Today, we provide an insight into search engine architecture and the modern retrieval techniques for attaining efficient yet effective search
  4. 4. Search as a Distributed Problem • To achieve efficiency at Big Data scale, search engines use many servers: a broker (with a query scheduler and results merging) routes queries to N shards, each replicated across M query servers running a retrieval strategy • N & M can be very big: – Microsoft's Bing search engine has "hundreds of thousands of query servers"
  5. 5. Image Source: http://wallpapercave.com/w/NzPVzSl 5
  6. 6. Image Source: http://www.coloandcloud.com/editorial/quincy-wa-big-data-centers-leverage-abundant-inexpensive-renewable-energy/ QUINCY WA, USA
  7. 7. Image Source: http://www.coloandcloud.com/editorial/quincy-wa-big-data-centers-leverage-abundant-inexpensive- renewable-energy/
  8. 8. Data Centres are energy hungry… Data centres use 3 per cent of the global electricity supply and account for about 2 per cent of total greenhouse gas emissions. This is expected to treble in the next decade, putting an enormous strain on energy supplies and dealing a hefty blow to efforts to contain global warming. Image Source: http://www.mwhglobal.com/project/wanapum-dam-spillway-project/ Source: The Independent
  9. 9. How to keep pace? Search engines are growing: more users bring more ad revenue (income), while more queries received and more documents to index drive up expenditure 9
  10. 10. [Diagram: two options for keeping pace, adding more query servers vs. more efficient querying, and their impact on expenditure, income and effectiveness]
  11. 11. Green search engines are important Keep your search engine efficient yet effective Efficient: more throughput, fewer servers, less expenditure Effective: more users, more income 11
  12. 12. So where is our focus?
  13. 13. Inter Data Center So where is our focus?
  14. 14. Intra Data Center So where is our focus?
  15. 15. Intra Server So where is our focus?
  16. 16. Query Plans: Executing a Query • Different formulations are more effective & efficient for different queries: – Homepage: match on URL/Title/Anchor text – Informational: match on Body, more query expansion • Query type classification, query performance predictors and query efficiency predictors all have a role in deciding the most appropriate plan for a query • Example plans: halloween costumes → (halloween ∊ U|B|T) ⋀ ({costume, costumes} ∊ A|U|B|T); facebook login → (facebook ∊ U|B|T) 16 Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al., Proceedings of SIGIR 2018.
  17. 17. • Typically, in web-scale search, the ranking process can be conceptually seen as a series of cascades [1] – Rank some documents – Pass top-ranked onto next cascade for refined re-ranking 1,000,000,000s of documents 1000s of documents Cascades 17 20 docs Boolean: Do query terms occur? Simple Ranking: Identify a set most likely to contain relevant documents Re-Ranking: Try really hard to get the top of the ranking correct, using many signals (features) LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/OR Q Qplan
  18. 18. 1,000,000,000s of documents 1000s of documents Roadmap 18 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  19. 19. Aims of this Tutorial • ILO 1. learn about the fundamentals of query processing in information retrieval in terms of the data structures and algorithms involved (e.g. Document-at-a-Time/Term-at-a-time/Score-at-a- time); • ILO 2. learn about the state-of-the-art of dynamic pruning (query evaluation) techniques (e.g., TAAT optimizations, impacts, MaxScore, WAND, blockmax indexes, top-hitlists), illustrated through extensive use of pseudo-code and examples. • ILO 3. understand how such query processing fits into a multi-tier search architecture, involving learning-to-rank and distributed query processing. • ILO 4. understand about safeness in dynamic pruning techniques, and about the nature of the tradeoffs between efficiency and effectiveness in query processing. • ILO 5. learn about query efficiency prediction and its applications to efficient yet effective web search and to operational costs reduction, including a description of Bing’s use of QEP. 19SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  20. 20. Outline • 1330-1340 Introduction (Craig) (10 mins) – Components, multi-stage architectures • 1340-1410 Part 1: Data structures, compression & skipping (Craig) (30 mins) • 1410-1500 Part 2: Query Evaluation (50 mins) – TAAT/DAAT (Craig) – dynamic pruning, impacts, SAAT (Nicola) • 1500-1530 Coffee Break (30 mins) • 1530-1625 Part 3: Second Stage Query Processing, aka LTR – Learning to rank (Craig) (40 min) – Quickscorer (Nicola) (15 mins) • 1625-1650 Part 4: Query Efficiency Prediction (25 minutes) • 1650-1700 Conclusions (5-10 minutes) SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 20
  21. 21. PART 1: INDEX LAYOUT & COMPRESSION 21SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  22. 22. The Format of an Index • An index normally contains three sub-structures • Lexicon: records the list of all unique terms and their statistics (term, df, cf, pointer p into the inverted index) • Document Index: records the list of all documents and their statistics (docid, length) • Inverted Index: records the mapping between terms and documents; each entry (posting) is an (id, tf) pair representing a document • Could also contain other occurrence information: e.g. term positions, fields (title, URL) • Could also contain other document information: e.g. PageRank [Indexing pipeline: Document → Tokenizer → Term Pipeline → Index] 22
  23. 23. Search Efficiency • It is important to make retrieval as fast as possible • Research indicates that even slightly slower retrieval (0.2s-0.4s) can lead to a dramatic drop in the perceived quality of the results [1] • So what is the most costly part of a (classical) search system? • Scoring each document for the user query 23 [1] Teevan et al. Slow Search: Information Retrieval without Time Constraints. HCIR’13 [Retrieval pipeline: Query → Tokenizer → Term Pipeline → Retrieval Model over the Index → Re-Ranking → Top Results]
  24. 24. Why is Document Scoring Expensive? • The largest reason for the expense of document scoring is that there are lots of documents: – A Web search index can contain billions of documents • Google currently indexes trillions of pages [1] • More specifically, the cost of a search is dependent on: – Query length (the number of search terms) – Posting list length for each query term • i.e. The number of documents containing each term 24 term1 df cf p term2 df cf p id tf id tf id tf id tf id tf [1] http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/
  25. 25. INDEX COMPRESSION 25
  26. 26. Index Compression • The compression of the inverted index posting lists is essential for efficient scoring [1] – Motivation: it physically takes time to read the term posting lists, particularly if they are stored on a (slow) hard disk – Using lossless compressed layouts for the term posting lists can save space (on disk or in memory) and reduce the amount of time spent doing IO – But decompression can also be expensive, so efficient decompression is key! • Example: the posting list for term2 is (21,2) (25,2) (26,5); at 1 integer = 32 bits = 4 bytes, these 6 integers total 24 bytes. Do we need 32 bits? 26 [1] Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann 1999.
  27. 27. Delta Gaps • One component of the information stored in a posting list is the docids… – …in ascending order! • We can make smaller numbers by taking the differences (delta gaps): term2 (21,2) (25,2) (26,5) becomes (21,2) (4,2) (1,5) • So each docid in the posting lists could be represented using fewer bits – How to represent these numbers? – 32 bits has a range -2147483648 .. 2147483647 – Using a fixed number of bits is wasteful 27
  28. 28. Elias Unary & Gamma Encoding • Unary: – Use as many 0s as the input value x, followed by a 1 – Eg: 5 is 000001 • Gamma: – Let N=⌊log2 x⌋ be the highest power of 2 that x contains – Write N out in unary representation, followed by the remainder (x – 2^N) in binary using N bits – Eg: 5 is represented as 00101 • Let’s represent the docid gaps as gamma, and the tfs as unary, for term2 (21,2) (4,2) (1,5): – docid gaps: 000010101 (9 bits), 00100 (5 bits), 1 (1 bit) – tfs: 001 (3 bits), 001 (3 bits), 000001 (6 bits) – Total: 27 bits, i.e. under 4 bytes, down from 24 bytes! 28
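To make the bit-level encoding above concrete, here is a minimal C++ sketch (not from the tutorial) that encodes the delta-gapped posting list with the slide's unary and Elias gamma codes; for readability the bits are appended to a string of '0'/'1' characters rather than packed into machine words.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Unary code of x, as defined on the slide: x zeros followed by a 1.
static void unary(uint32_t x, std::string& out) {
    out.append(x, '0');
    out.push_back('1');
}

// Elias gamma code of x > 0: unary(N) with N = floor(log2 x),
// followed by the remainder (x - 2^N) written in N binary bits.
static void gamma_code(uint32_t x, std::string& out) {
    uint32_t n = 0;
    while ((x >> (n + 1)) != 0) n++;            // n = floor(log2 x)
    unary(n, out);
    for (int b = static_cast<int>(n) - 1; b >= 0; --b)
        out.push_back(((x >> b) & 1) ? '1' : '0');
}

int main() {
    // Posting list (docid, tf): (21,2) (25,2) (26,5); delta-gapped: (21,2) (4,2) (1,5)
    std::vector<std::pair<uint32_t, uint32_t>> gaps = {{21, 2}, {4, 2}, {1, 5}};
    std::string bits;
    for (auto [gap, tf] : gaps) {
        gamma_code(gap, bits);   // docid gap in Elias gamma
        unary(tf, bits);         // term frequency in unary
    }
    std::cout << bits << " (" << bits.size() << " bits)\n";   // 27 bits
}
```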
  29. 29. Other Compression Schemes • Elias Gamma & Elias Unary are moderately expensive to decode: lots of bit twiddling! – Other schemes are byte- or word-aligned, e.g. • Variable byte [1] • Simple family [2] • Documents are often clustered in the inverted index (e.g. by URL ordering) – Compression can be more effective in blocks of numbers – List-adaptive techniques work on blocks of numbers • Frame of reference (FOR) [3] • Patched frame of reference (PFOR) [4] 29 [1] H.E. Williams and J. Zobel. Compressing Integers for Fast File Access. Comput. J. 1999 [2] V. Anh & A. Moffat. Inverted Index Compression using Word-aligned Binary Codes. INRT. 2005 [3] J. Goldstein et al. Compressing relations and indexes. ICDE 1998. [4] M. Zukowski et al. Super-scalar RAM-CPU cache compression. ICDE 2006.
  30. 30. Frame of Reference Idea: pick the minimum m and the maximum M values of the block of numbers that you are compressing. • Then, any value x can be represented using b bits, where b = ⌈log2(M-m+1)⌉. Example: To compress numbers in range {2000,...,2063} – ⌈log2(64)⌉ = 6 – So we store the base value 2000 once, and then use 6 bits per value 30
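A small C++ sketch of the Frame of Reference idea described above: the block minimum is stored once and each value becomes a b-bit offset. Actual bit-packing is omitted for clarity, and the `ForBlock` type and function names are illustrative, not from any particular library.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// FOR sketch: base value stored once, then b bits per offset,
// with b = ceil(log2(M - m + 1)).
struct ForBlock {
    uint32_t base;                  // block minimum m
    uint32_t bits;                  // b bits needed per packed value
    std::vector<uint32_t> offsets;  // v - m (left unpacked here for clarity)
};

ForBlock for_encode(const std::vector<uint32_t>& values) {
    uint32_t m = values.front(), M = values.front();
    for (uint32_t v : values) { m = std::min(m, v); M = std::max(M, v); }
    uint32_t range = M - m + 1;
    uint32_t b = 0;
    while ((1u << b) < range) b++;        // b = ceil(log2(range)), small ranges assumed
    ForBlock blk{m, b, {}};
    for (uint32_t v : values) blk.offsets.push_back(v - m);
    return blk;
}

int main() {
    std::vector<uint32_t> docids = {2000, 2004, 2025, 2063};
    ForBlock blk = for_encode(docids);
    std::cout << "base=" << blk.base << " bits/value=" << blk.bits << "\n";  // base=2000 bits/value=6
}
```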
  31. 31. 31 Patched frame of reference (PFOR) ● FOR is sensitive to outliers – Eg: to compress {2000,...,9000,...,2064} we now need 13 bits per value ● Idea: pick b bits for compression such that it is “good” for most of the values, and treat the others as exceptions (32 bits each) Zukowski et al, ICDE’06
  32. 32. 32 PFOR flavors ● NewPFD – compresses exceptions using a Simple family codec ● OptPFD – tries to optimize b to have the best compression ratio and decompression speed ● FastPFOR – subdivides exceptions by # used bits, then compresses these sets using a FOR codec NewPFD and OptPFD in Yan et al., WWW'09 FastPFOR in Lemire, Boytsov; Software: Practice and experience (2013)
  33. 33. Compression: Some numbers [1] • ClueWeb09 corpus – 50 million Web documents – 12,725,738,385 postings => 94.8GB inverted file uncompressed – NO retrieval numbers: WAY TOO SLOW! – Terrier’s standard Elias Gamma/Unary compression = 15GB • Compression is essential for an efficient IR system – List adaptive compression: slightly larger indices, markedly faster
                         docids                 tfs
                         Time (s)    Size       Time (s)    Size
         Gamma/Unary     1.55        -          1.55        -
         Variable Byte   +0.6%       +5%        +9%         +18.4%
         Simple16        -7.1%       -0.2%      -2.6%       +0.7%
         FOR             -9.7%       +1.3%      -3.2%       +4.1%
         PForDelta       -7.7%       +1.2%      -1.3%       +3.3%
      (time and size changes relative to the Gamma/Unary baseline, when recompressing the docids or the tfs)
      [1] M Catena, C Macdonald, and I Ounis. On Inverted Index Compression for Search Engine Efficiency. ECIR 2014.
  34. 34. Most Recently… Elias Fano • A data structure from the ’70s (mostly in the succinct data structures niche) for encoding of monotonically increasing sequences of integers • Recently successfully applied to inverted indexes [Vigna, WSDM13] – Used by Facebook Graph Search! • Originally distribution-independent, but recently adapted to take into account document similarities, i.e., common terms: partitioned Elias-Fano [Ottaviano & Venturini, SIGIR14] • Very fast implementations exist – Skipping is included… 34SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  35. 35. Posting List Iterators • It is often convenient to see a posting list as an iterator over its postings. • APIs: – p.docid() returns the docid of the current posting – p.score() returns the score of the current posting – p.next() moves the iterator sequentially to the next posting – p.next(d) advances the iterator forward to the next posting with a document identifier greater than or equal to d. => skipping 35SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
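A minimal C++ sketch of such an iterator over an in-memory, uncompressed posting list; the class name and the `END` sentinel are illustrative. Real implementations decode compressed blocks lazily and implement next(d) with skip pointers, as shown on the next slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class PostingIterator {
public:
    struct Posting { uint32_t docid; float score; };
    static constexpr uint32_t END = UINT32_MAX;   // sentinel once the list is exhausted

    explicit PostingIterator(std::vector<Posting> postings)
        : postings_(std::move(postings)), pos_(0) {}

    uint32_t docid() const { return pos_ < postings_.size() ? postings_[pos_].docid : END; }
    float score()   const { return postings_[pos_].score; }
    void next()           { ++pos_; }                      // move to the next posting
    // next(d): advance to the first posting with docid >= d. Linear here;
    // with skip pointers this becomes a sequence of jumps.
    void next(uint32_t d) {
        while (pos_ < postings_.size() && postings_[pos_].docid < d) ++pos_;
    }

private:
    std::vector<Posting> postings_;
    std::size_t pos_;
};
```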
  36. 36. Skipping (I) d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 To move the cursor to a specific docid, we need to read and decompress all postings in the middle current posting next posting
  37. 37. Skipping (II) We append an additional structure (skip pointers) to the start of the posting list, such that the largest docid in any block of compressed postings is known. Then, to move the cursor from the current posting to a specific docid, we only need to read and decompress the skips in the middle and a few postings.
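A sketch of how next(d) (often called nextGEq) can exploit the skip structure just described, assuming one skip entry per block of postings; block decompression is omitted and the type names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One skip entry per compressed block: the largest docid in the block and
// where the block starts, so whole blocks can be bypassed without decompression.
struct SkipEntry { uint32_t max_docid; std::size_t block_start; };

struct SkippablePostingList {
    std::vector<uint32_t> docids;      // shown decompressed for clarity
    std::vector<SkipEntry> skips;      // e.g. one entry per 128 postings
    std::size_t pos = 0;

    uint32_t docid() const { return pos < docids.size() ? docids[pos] : UINT32_MAX; }

    void next_geq(uint32_t d) {
        // 1. use the skips to find the first block whose max docid >= d
        for (const SkipEntry& s : skips) {
            if (s.max_docid >= d) {
                if (s.block_start > pos) pos = s.block_start;
                break;
            }
        }
        // 2. scan (and, in practice, decompress) only inside that block
        while (pos < docids.size() && docids[pos] < d) ++pos;
    }
};
```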
  38. 38. Essential Bibliography • Matteo Catena, Craig Macdonald and Iadh Ounis. On Inverted Index Compression for Search Engine Efficiency. In Proceedings of ECIR 2014. • Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing Documents and Images. 1st edn. (1994) • Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory 21(2) (1975) 2014. • Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Software: Practice and Experience (2013) • Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical Report 2010-12-16, DERI (2010) • Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal 42 (1999) • Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast query evaluation. In: Proc. SIGIR ’02. (2002) • Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1) (2005) • Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc. ICDE ’98. (1998) • Alistair Moffat and Justin Zobel. 1996. Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst. 14, 4 (October 1996), 349-379. • Sebastiano Vigna. 2013. Quasi-succinct indices. In Proceedings of the sixth ACM international conference on Web search and data mining (WSDM '13) • Giuseppe Ottaviano, and Rossano Venturini. "Partitioned elias-fano indexes." Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 2014. 38SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  39. 39. PART 2: QUERY EVALUATION 39
  40. 40. 1,000,000,000s of documents 1000s of documents Roadmap 40 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  41. 41. Query Evaluation • Conjunctive (AND) vs. Disjunctive (OR) • Boolean retrieval – Suitable for small/medium collections – Returns all documents matching the query • Ranked retrieval – Suitable for Web-scale collections – Requires a similarity function between queries and documents – Returns only the top k document SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 41
  42. 42. Query Evaluation • Normal strategies make a pass on the postings lists for each query term – This can be done Term-at-a-Time (TAAT) – one query term at a time – Or Document-at-a-time (DAAT) – all query terms in parallel • We will explain these, before showing how we can improve them 42
  43. 43. Term-at-a-Time (TAAT) 43 [Example: term1 has posting (21,2); term2 has postings (21,2) (25,2); processing one term at a time accumulates doc 21 → score 4 and doc 25 → score 2 into the (rank, docid, score) results] Advantages: • Simple Disadvantages: • Requires lots of memory to contain partial scores for all documents • Difficult to do Boolean or phrase queries, as we don’t have a document’s postings for all query terms at once
  44. 44. Document-at-a-Time (DAAT) 44 [Example: term1 has posting (21,2); term2 has postings (21,2) (25,2); processing all lists in parallel scores doc 21 → 4 and doc 25 → 2] Advantages: • Reduced memory compared to TAAT (and hence faster) • Supports Boolean query operators, phrases, etc. Disadvantages: • Slightly more complex to implement Most commercial search engines are reported to use DAAT
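The following self-contained C++ sketch shows exhaustive DAAT disjunctive (OR) retrieval with a size-k min-heap, i.e. the plain, unoptimised strategy described above; per-posting scores are assumed precomputed for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <queue>
#include <vector>

struct Posting { uint32_t docid; float score; };
using PostingList = std::vector<Posting>;
struct Result {
    float score; uint32_t docid;
    bool operator>(const Result& o) const { return score > o.score; }
};

std::vector<Result> daat_or(const std::vector<PostingList>& lists, std::size_t k) {
    std::vector<std::size_t> cur(lists.size(), 0);    // one cursor per posting list
    std::priority_queue<Result, std::vector<Result>, std::greater<Result>> heap;  // current top k
    while (true) {
        uint32_t d = UINT32_MAX;                       // smallest current docid across lists
        for (std::size_t i = 0; i < lists.size(); ++i)
            if (cur[i] < lists[i].size()) d = std::min(d, lists[i][cur[i]].docid);
        if (d == UINT32_MAX) break;                    // all lists exhausted
        float score = 0.0f;
        for (std::size_t i = 0; i < lists.size(); ++i) // score d, advance matching cursors
            if (cur[i] < lists[i].size() && lists[i][cur[i]].docid == d) {
                score += lists[i][cur[i]].score;
                ++cur[i];
            }
        heap.push({score, d});
        if (heap.size() > k) heap.pop();               // keep only the best k
    }
    std::vector<Result> top;
    while (!heap.empty()) { top.push_back(heap.top()); heap.pop(); }
    std::reverse(top.begin(), top.end());              // highest score first
    return top;
}

int main() {
    std::vector<PostingList> lists = {{{21, 2.0f}, {25, 2.0f}}, {{21, 2.0f}, {26, 5.0f}}};
    for (const Result& r : daat_or(lists, 2))
        std::cout << "doc " << r.docid << " score " << r.score << "\n";
}
```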
  45. 45. TAAT vs DAAT • TAAT and DAAT have been the cornerstones of query evaluation in IR systems since the 1970s. • The plain implementations of these two strategies are seldom used anymore, since many optimizations have been proposed over the years • Several known systems in production today, from large scale search engines such as Google and Yahoo!, to open source text indexing packages such as Lucene and Lemur, use some optimized variation of these strategies 45 SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  46. 46. TAAT vs DAAT • [Turtle and Flood,1995] were the first to argue that DAAT could beat TAAT in practical environments. – For large corpora, DAAT implementations require a smaller run-time memory footprint 46 • [Fontoura et al., 2011] report experiments on small (200k docs) and large indexes (3M docs), with short (4.26 terms) and long (57.76 terms) queries (times are in microseconds): SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Source: [Fontoura et al, 2011]
  47. 47. Bibliography 47 [Heaps, 1978] Information Retrieval: Computational and Theoretical Aspects. Academic Press, USA, 1978. [Buckley and Lewit, 1985] Optimization of inverted vector searches. In Proc. SIGIR, pages 97–110, 1985. ACM. [Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995. [Moffat and Zobel, 1996] Self-indexing inverted files for fast text retrieval. ACM TOIS, 14(4):349–379, 1996. [Kaszkiel and Zobel, 1998] Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proc. SIGIR,, pages 343– 344, 1998. ACM. [Kaszkiel et al., 1999] Efficient passage ranking for document databases. ACM TOIS, 17(4):406–439, 1999. [Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM. [Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [Culpepper and Moffat, 2010] Efficient set intersection for inverted indexing. ACM TOIS, 29(1):1:1– 1:25, 2010. [Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213–1224, 2011. SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  48. 48. FASTER QUERY EVALUATION Dynamic Pruning 48SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  49. 49. Query Processing • Breakdown • Pre-process the query (e.g., tokenisation, stemming) • Lookup the statistics for each term in the lexicon • Process the postings for each query term, computing scores for documents to identify the final retrieved set • Output the retrieved set with metadata (e.g., URLs) • What takes time? • Number of query terms – Longer queries have more terms with posting lists to process • Length of posting lists – More postings take longer to process • Goal: avoid (unnecessary) scoring of postings 49SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  50. 50. Dynamic Pruning during Query Evaluation • Dynamic pruning strategies aim to make scoring faster by only scoring a subset of the documents – The core assumption of these approaches is that the user is only interested in the top K results, say K=20 – During query scoring, it is possible to determine if a document cannot make the top K ranked results – Hence, the scoring of such documents can be terminated early, or skipped entirely, without damaging retrieval effectiveness to rank K • We call this “safe-to-rank K” • Dynamic pruning is based upon – Early termination – Comparing upper bounds on retrieval scores with thresholds 50
  51. 51. Dynamic Pruning Techniques • MaxScore [1] – Early termination: does not compute scores for documents that won’t be retrieved by comparing upper bounds with a score threshold • WAND [2] – Approximate evaluation: does not consider documents with approximate scores (sum of upper bounds) lower than threshold – Therefore, it focuses on the combinations of terms needed (wAND) • BlockMaxWand – SOTA variant of WAND that benefits from the block layout of posting lists 51 [1] H Turtle & J Flood. Query Evaluation : Strategies and Optimisations. IPM: 31(6). 1995. [2] A Broder et al. Efficient Query Evaluation using a Two-Level Retrieval Process. CIKM 2003.
  52. 52. Early Termination A document evaluation is early terminated if all or some of its postings, relative to the terms of the query, are not fetched from the inverted index or not scored by the ranking function. 52SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  53. 53. Term and document upper bounds • Similarity function between queries and documents: s(q,d) = Σ_{t ∈ q} s_t(q,d) • For each term t in the vocabulary, we can compute a term upper bound (also known as max score) σt(q) such that, for all documents d in the posting list of term t: s_t(q,d) ≤ σt(q) • For a query q and a document d, we can compute a document upper bound σd(q) by summing up the query term upper bounds and/or actual scores: σd(q) = Σ_{t already scored} s_t(q,d) + Σ_{t not yet scored} σt(q) ≥ s(q,d) 53SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  54. 54. Heap and Threshold • During query processing, the top k full or partial scores computed so far, together with the corresponding docids, are organized in a priority queue, or min-heap • The smallest value of these (partial) scores is called the threshold θ • If there are not at least k scores, the threshold value is assumed to be 0 54SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  55. 55. Dynamic Pruning Condition • Property I: The threshold value is non-decreasing • Property II: A document with a score smaller than the threshold will never be in the final top K documents • Pruning Condition: for a query q and a document d, if the document upper bound σd(q), computed by using partial scores, if any, and term upper bounds, is less than or equal to the current threshold θ, the document processing can be early terminated. 55SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
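In code, the pruning condition amounts to a single comparison; this small sketch (illustrative names, not from any particular system) assumes the upper bounds of the terms not yet scored for document d are known:

```cpp
#include <vector>

// DAAT pruning check: while scoring document d, compare the partial score plus
// the upper bounds of the still-unscored query terms against the heap threshold
// theta; if the sum cannot exceed theta, the scoring of d can stop early.
bool can_enter_top_k(float partial_score,
                     const std::vector<float>& remaining_term_upper_bounds,
                     float theta) {
    float document_upper_bound = partial_score;
    for (float sigma_t : remaining_term_upper_bounds)
        document_upper_bound += sigma_t;
    return document_upper_bound > theta;   // false => early-terminate document d
}
```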
  56. 56. Example of early termination 56SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA [Worked figure: a document upper bound is refined as partial scores replace term upper bounds, e.g. 11.1 – 2.1 + 0.5]
  57. 57. Effectiveness Guarantees • safe/unoptimized: all documents, not just the top k, appear in the same order and with the same score as they would appear in the ranking produced by an unoptimized strategy. • safe up to k/rank safe: the top k documents produced are ranked correctly, but the document scores are not guaranteed to coincide with the scores produced by an unoptimized strategy. • unordered safe up to k/set safe: the documents coincide with the top k documents computed by a full strategy, but their ranking can be different. • approximate/unsafe: no provable guarantees on the correctness of any portion of the ranking produced by these optimizations can be given 57SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA [Turtle and Flood, 1995] [Strohman, 2007] (ordered from most to least effective)
  58. 58. TAAT Optimizations • All TAAT dynamic pruning optimizations split the query processing into two distinct phases. • OR mode – the normal TAAT algorithm is executed – New accumulators are created and updated, until a certain pruning condition is met • AND mode – no new accumulators are created, – a different algorithm is executed, on the remaining terms and/or on the accumulators created during the first phase. 58SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  59. 59. AND Mode • Quit. – The processing of postings completely stops at the end of the first phase. – No new accumulators are created and no postings from the remaining terms’ posting lists are processed. • Continue. – The creation of new accumulators stops at the end of the first phase. – The remaining terms’ posting lists will be processed, but just to update the scores of the already existing accumulators. • Decrease. – The processing of postings in the second phase proceeds as in Continue. – The number of accumulators is decreased as the remaining terms are processed. – Those terms will have small score contributions, with few chances to alter the current top k document ranking. – More importantly, the memory occupancy could be reduced as soon as we realize that an existing accumulator can be dropped since it will never enter the final top k documents, with a corresponding benefit on response times. 59SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA [Moffat and Zobel, 1996] [Anh and Moffat, 1998]
  60. 60. Summary 60SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  61. 61. DAAT Optimizations • We will cover three families of DAAT Optimizations 1. Top Docs: – Adds record of "top" documents to the start of each posting list 2. Max Scores: – Utilises thresholds and upper bounds – Can skip over unscored postings – E.g. MaxScore, WAND 3. Block Max: – Leverages the block layout of the posting lists – Can skip entire blocks of postings – E.g. BlockMaxWand 61SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  62. 62. Top Docs Optimizations • [Brown, 1995] proposed to focus processing on documents with high score contributions. • For any given term t in the vocabulary, a top candidates list (also known as a champions list) is stored separately as a new posting list. • Then, the normal DAAT processing for any query q is applied to the top candidate posting lists only, substantially reducing the number of postings scored, but with potential negative impacts on the effectiveness, since this optimization is unsafe. 62SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  63. 63. Max Scores Optimizations • MaxScore strategy [Turtle and Flood, 1995] – Early termination: does not compute scores for documents that won’t be returned – By comparing upper bounds with threshold – Based on essential and non-essential posting lists – Suitable for TAAT as well • WAND strategy [Broder et al., 2003] – Approximate evaluation: does not consider documents with approximate scores (sum of upper bounds) lower than threshold • Both use docid-sorted posting lists • Both exploit skipping 63SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  64. 64. MaxScore Example (I) 64SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA No document can be returned as a top k result if it appears in the non-essential lists only, i.e., 1.3 + 3.4 < 6.0 (the lists are sorted in increasing order of term upper bound)
  65. 65. MaxScore Example (II) 65SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Essential lists are processed in DAAT, and their score contributions computed
  66. 66. MaxScore Example (III) 66SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Process the non-essential lists by skipping to the candidate docid; the partial score is updated. As soon as a partial score implies early termination, move to the next document.
  67. 67. MaxScore Example (IV) 67SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
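A sketch of the list partitioning at the heart of MaxScore, matching the example above (with θ = 6.0 and the two smallest sorted upper bounds 1.3 and 3.4, the first two lists are non-essential). Names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// With posting lists sorted by increasing term upper bound, the "non-essential"
// lists are the longest prefix whose upper bounds sum to at most the threshold
// theta: a document appearing only in those lists can never enter the top k, so
// DAAT traversal is driven by the remaining "essential" lists only.
// Returns the index of the first essential list.
std::size_t first_essential_list(const std::vector<float>& sorted_term_upper_bounds,
                                 float theta) {
    float acc = 0.0f;
    std::size_t i = 0;
    for (; i < sorted_term_upper_bounds.size(); ++i) {
        if (acc + sorted_term_upper_bounds[i] > theta) break;   // this list is essential
        acc += sorted_term_upper_bounds[i];
    }
    return i;   // lists [0, i) are non-essential, lists [i, n) are essential
}
```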
  68. 68. WAND WAND uses approximate evaluation: it does not consider documents with approximate scores (sum of upper bounds) lower than the threshold Weak AND or Weighted AND operator: WAND(X1, w1, ..., Xn, wn, θ) is true if and only if Σi xi·wi ≥ θ, where • Xi is a boolean variable, • wi is a weight, • xi is 0 or 1 depending on Xi • θ is a threshold 69SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  69. 69. WAND Details • Given a query q = {t1,...,tn} and a document d, • Xi is true if and only if the term ti appears in document d • We take the term upper bound σi(d) as weight wi • The threshold θ is the smallest score of the top k documents scored so far during query processing 70SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA WAND(X1, σ1(d), ..., Xn, σn(d), θ) is true iff Σ_{i : ti ∈ d} σi(d) ≥ θ, i.e., d is fully evaluated only if the sum of the upper bounds of its matching terms can exceed the threshold
  70. 70. WAND Example (I) 71SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Posting lists ALWAYS sorted by current docid Pivot docid is the first docid with a non-zero chance to end up in the top k results
  71. 71. WAND Example (II) 72SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Move previous posting lists up to the pivot (or right after) using p.next(d) operator
  72. 72. WAND Example (III) 73SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Keep the posting lists sorted by increasing docid! 2.1 + 1.3 + 4.0 > 6.0 then full evaluation
  73. 73. WAND Example (IV) 74SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Advance the iterators by one and keep lists sorted
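Putting the four WAND examples together, the pivot-selection step can be sketched as follows (illustrative types; the caller keeps the cursors sorted by current docid and then advances the preceding lists with next(pivot docid), as in Example II):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// WAND pivot selection: with lists sorted by their current docid, the pivot is
// the first list at which the running sum of term upper bounds exceeds theta.
// No docid smaller than the pivot's current docid can make the top k, so the
// preceding lists can be advanced directly to the pivot docid.
struct TermCursor { uint32_t current_docid; float term_upper_bound; };

// Returns the index of the pivot list, or -1 if no document can beat theta.
int find_pivot(const std::vector<TermCursor>& sorted_by_docid, float theta) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < sorted_by_docid.size(); ++i) {
        acc += sorted_by_docid[i].term_upper_bound;
        if (acc > theta) return static_cast<int>(i);
    }
    return -1;   // even the sum of all upper bounds cannot exceed theta: stop
}
```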
  74. 74. Performance 76SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Latency results (in ms) for DAAT, MaxScore and WAND (with speedups), for K = 30. Adapted from [Fontoura et al., 2011]. Latency results (in ms) for DAAT, MaxScore and WAND (with speedups) for different query lengths, average query times on ClueWeb09, for K = 10. Adapted from [Mallia et al., 2017].
  75. 75. Unsafe WAND 77SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA • Reducing K makes (BlockMax)WAND and MaxScore faster – But you get a shorter, but safe, ranking • The dynamic pruning techniques can be also configured to be unsafe – This is achieved by multiplying the threshold Θ by a constant factor F – Increased speed, but some documents will not be retrieved
  76. 76. Global vs Local Term Upper Bounds • A (global) term upper bound is computed over the scores of all documents in a posting list • Each posting list is sequentially divided into blocks, where each block contains a given number of consecutive postings, e.g., 128 postings per block. • For each block, a block (local) term upper bound is computed, storing the maximum contribution of the postings in the block. 78SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
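A sketch of how these block (local) upper bounds could be computed at indexing time, assuming the per-posting score contributions of the term are available; names and the default block size are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each fixed-size block of postings (e.g. 128), record the largest score
// contribution inside that block; BlockMaxWAND compares these local bounds
// against the threshold to skip whole blocks.
std::vector<float> block_upper_bounds(const std::vector<float>& posting_scores,
                                      std::size_t block_size = 128) {
    std::vector<float> block_max;
    for (std::size_t start = 0; start < posting_scores.size(); start += block_size) {
        std::size_t end = std::min(start + block_size, posting_scores.size());
        block_max.push_back(*std::max_element(posting_scores.begin() + start,
                                              posting_scores.begin() + end));
    }
    return block_max;
}
```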
  77. 77. Block Max Indexes 79SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA BlockMaxWAND requires further information in the skiplists – i.e. Block Max Indices lexicon
  78. 78. Simplified Block Max WAND pseudocode 80SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
        while true
            pivot ← find pivot as in WAND (all lists), or break
            perform a shallow move          // advance block-list iterators to the pivot
            compute the block upper bound
            if the pivot would enter the heap (based on the block upper bound)
                compute the score and update the heap
                perform a deep move         // advance posting-list iterators to the next docid
                reorder the lists by docid
            else
                perform a shallow move      // advance block-list iterators to the next block(s)
                perform a deep move         // advance posting-list iterators to the next block(s)
                reorder the lists by docid
        return top K results in the heap
  79. 79. Performance 81SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Latency results (in ms) for DAAT, WAND and BMW (64 postings blocks) (with speedups) for different query lengths, average query times (Avg, in ms) on Gov2, for K = 10.
  80. 80. Variable Block Max WAND SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Block size matters: • too large → inaccurate estimation • too small → average skip is small Are fixed-size blocks the right choice?
  81. 81. ALTERNATIVE INDEX LAYOUTS 89SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  82. 82. Frequency-sorted Indexes • An alternative way to arrange the posting lists is to sort them such that the highest scoring documents appear early. • If an algorithm could find them at the beginning of posting lists, dynamic pruning conditions can be enforced very early. • [Wong and Lee, 1993] proposed to sort posting lists in decreasing order of term-document frequency value. 90SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  83. 83. Impact-sorted Indexes • [Anh et al., 2001] introduced the definition of impact of term t in document d for the quantity wt(d)/Wd (or document impact) and impact of term t in query q for wt(q) (or query impact). • They proposed to facilitate effective query pruning by sorting the posting lists in decreasing order of (document) impact, i.e., to process queries on an impact-sorted index 91SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  84. 84. Quantized Impacts • Transform impact values from real numbers in [L,U] to b-bit integers, i.e. 2^b buckets. • LeftGeom Quantization • Uniform Quantization • Reverse Mapping 92SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
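As an illustration only, here is a C++ sketch of one common uniform quantization scheme and a midpoint reverse mapping; the exact LeftGeom and uniform formulas used by [Anh et al., 2001] differ in detail and are not reproduced here.

```cpp
#include <cstdint>
#include <iostream>

// Map a real impact x in [L, U] into one of 2^b integer buckets (uniform width),
// and map a bucket back to a representative score at query time.
uint32_t quantize_uniform(double x, double L, double U, int b) {
    const uint32_t buckets = 1u << b;
    uint32_t q = static_cast<uint32_t>((x - L) / (U - L) * buckets);
    return q >= buckets ? buckets - 1 : q;        // clamp x == U into the last bucket
}

double dequantize_uniform(uint32_t q, double L, double U, int b) {
    const uint32_t buckets = 1u << b;
    return L + (q + 0.5) * (U - L) / buckets;     // bucket midpoint as reverse mapping
}

int main() {
    uint32_t q = quantize_uniform(7.3, 0.0, 20.0, 5);   // 5-bit quantization
    std::cout << q << " -> " << dequantize_uniform(q, 0.0, 20.0, 5) << "\n";
}
```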
  85. 85. Quantization Bits • [Anh et al., 2001] show that 5 bits are reasonable to encode the impacts with minimal effectiveness losses for both quantization schemes – with 10 bits, no effectiveness losses are reported – This value should be tuned when using different collections and/or different similarity functions. • Crane et al. [2013] confirmed that 5 to 8 bits are enough for small-medium document collections – for larger collections, from 8 up to 25 bits are necessary, depending on the effectiveness measure. 93SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  86. 86. Compressing impacts • [Trotman, 2014b] investigated the performance of integer compression algorithms for frequency-sorted and impact-sorted indexes. • The new SIMD compressor (QMX) is more time and space efficient than the existing SIMD codecs, as further examined by [Trotman and Lin, 2016]. • [Lin and Trotman, 2017] found that the best performance in query processing with SAAT is indeed obtained when no compression is used, even if the advantage w.r.t. QMX is small (∼5%). • Uncompressed impact-sorted indexes can be up to two times larger than their QMX-compressed versions. 94SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  87. 87. Impact-based Query Processing 95SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  88. 88. Score-at-a-time 96SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  89. 89. Dynamic Pruning for SAAT • The impact blocks are initially processed in OR-mode, then AND-mode, and finally in REFINE-mode. • After the OR-mode, a fidelity control knob controls the percentage of remaining postings to process thereafter (AND & REFINE). – 30% of the remaining postings results in good retrieval performance. • Experiments by [Anh & Moffat, 2006] showed that this optimization is able to reduce the memory footprint by 98% w.r.t. SAAT, with a speedup of 1.75×. 97SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  90. 90. Anytime Ranking • [Lin and Trotman, 2015] proposed a linear regression model to translate a deadline on the query processing time into the number of postings to process. • [Mackenzie et al., 2017] proposed to stop after processing a given percentage of the total postings in the query terms' posting lists, on a per-query basis. • According to their experiments, the fixed threshold may result in reduced effectiveness, as the number of query terms increases, but conversely it gives a very strict control over the tail latency of the queries. 98SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  91. 91. Essential Bibliography 99 [Wong and Lee, 1993]. Implementations of partial document ranking using inverted files. IPM, 29 (5):647–669, 1993. [Persin, 1994]. Document filtering for fast ranking. In Proc. SIGIR, 1994. ACM. [Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995. [Anh et al., 2001] Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35–42, 2001. ACM. [Anh and Moffat, 2002]. Impact transformation: effective and efficient web retrieval. In Proc. SIGIR, pages 3–10, 2002. ACM. [Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM. [Anh and Moffat, 2006] Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372–379, 2006. ACM. [Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [Strohman, 2007] Efficient Processing of Complex Features for Information Retrieval. PhD Thesis, 2007. [Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213– 1224, 2011. [Crane et al., 2013] Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221–1224,2013. ACM. [Lin and Trotman, 2015] Anytime ranking for impact-ordered indexes. In Proc. ICTIR, pages 301–304, 2015. ACM. [Trotman and Lin, 2016] In vacuo and in situ evaluation of simd codecs. In Proc. ADCS, pages 1–8, 2016. ACM. [Mackenzie et al., 2017] Early termination heuristics for score-at-a-time index traversal. In Proc, ADCS, pages 8:1– 8:8, 2017. ACM. [Lin and Trotman, 2017] The role of index compression in score-at-a-time query evaluation. Information Retrieval Journal, pages 1–22, 2017. [Mallia et al., 2017] Faster blockmax wand with variable-sized blocks. In Proc. SIGIR, pages 625–634, 2017. ACM. SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  92. 92. PART 3: LEARNING TO RANK 100SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  93. 93. 1,000,000,000s of documents 1000s of documents Roadmap 101 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  94. 94. Motivations for Learning • How to choose term weighting models? • Different term weighting models have different assumptions about how relevant documents should be retrieved • Also: • Field-based models: term occurrences in different fields matter differently • Proximity-models: close co-occurrences matter more • (Deep learned matching models) • Priors: documents with particular lengths or URL/inlink values matter more • Query Features: Long queries, difficult queries, query type How to combine all these easily and appropriately?
  95. 95. Learning to Rank • Application of tailored machine learning techniques to automatically (select and) weight retrieval features • Based on training data with relevance assessments • Learning to rank has been popularised by commercial search engines (e.g. Bing, Baidu, Yandex) • They require large training datasets, possibly instantiated from click-through data • Click-through data has facilitated the deployment of learning approaches T.-Y. Liu. (2009). Learning to rank for information retrieval. Foundation and Trends in Information Retrieval, 3(3), 225–331.
  96. 96. Types of Features • Typically, commercial search engines use hundreds of features for ranking documents, usually categorised as follows:
        Name                         Varies with query?   Varies with document?   Examples
        Query Dependent Features     ✔                    ✔                       Weighting models, e.g. BM25, PL2; proximity models, e.g. Markov Random Fields; field-based weighting models, e.g. PL2F; deep learned matched representations
        Query Independent Features   ✗                    ✔                       PageRank, number of inlinks, spamminess
        Query Features               ✔                    ✗                       Query length, presence of entities
  97. 97. Learning to Rank 1. Sample Identification • Apply BM25 or similar (e.g. DFR PL2) to rank documents with respect to the query • Hope that the sample contains enough relevant documents 2. Compute more features • Query Dependent • Query Independent • Query Features 3A. Learn ranking model • Based on training data 3B. Apply learned model − Re-rank sample documents
  98. 98. Schematically… [Diagram: the sample ranking, generated using the same previous two steps plus relevance assessments, is fed to a learning to rank technique] N. Tonellotto, C. Macdonald, I. Ounis. (2013). Efficient and effective retrieval using selective pruning. WSDM'13.
  99. 99. Query-Dependent Feature Extraction • We might typically deploy a number of query dependent features – Such as additional weighting models, e.g. fields/proximity, that are calculated based on information in the inverted index • We want the result set passed to the final cascade to have all query dependent features computed – Once first cascade retrieval has ended, it’s too late to compute features without re-traversing the inverted index postings lists – It would take too long to compute all features for all scored documents • Solutions: – 1. cache the "fat" postings for documents that might make the sample (Postings contain frequencies, positions, fields, etc.) – 2. access the document vectors after retrieval 107
  100. 100. d30 4.3 <p1>,<p2> d19 3.7 <p1> d42 3.3 <p1,p2> Caching the Fat • Solution 1: cache the "fat" postings for documents that might make the sample Q Initial Sample Retrieval d30 4.3 d19 3.7 d42 3.3 Calculate Features; Apply LTR d19 1.0 d42 0.9 d30 0.5 Sample ResultSet “Fat” Final ResultSet Inv C Macdonald et al. (2012). About Learning Models with Multiple Query Dependent Features. TOIS 31(3). 108 (With positions)  Lookup cost is almost free – we had to access the inverted index anyway ✗ We only have access to postings for the original query terms
  101. 101. Accessing Document Vectors • Solution 2: access direct/forward index document vectors Q Initial Sample Retrieval d30 4.3 d19 3.7 d42 3.3 Calculate Features; Apply LTR d19 1.0 d42 0.9 d30 0.5 Sample ResultSet Final ResultSet Inv Asadi & Lin. Document vector representations for feature extraction in multi-stage document ranking. Inf. Retr. 16(6): 747-768 (2013) 109 (Need not have positions)  We can calculate features for query terms not in the original query ✗ Additional lookup for K documents ✗ Direct Index postings contain information for all terms in a document d30 4.3 <p1>,<p2> d19 3.7 <p1> d42 3.3 <p1,p2> Direct
  102. 102. Learning to Rank Process [Diagram: a document sample with relevance labels (✓/✗) and feature values forms the training examples, e.g. features f1–f4 with d1: 0.2 0.1 0.5 0.8, d2: 0.5 0.7 0.0 0.2, d3: 0.3 0.2 0.5 0.0; a Learner produces a Learned Model, which is then used to re-rank the sample for an unseen query]
  103. 103. The Importance of Sample Size • What sample size is necessary for effective learning? • Small samples: faster learning, faster retrieval • Large samples: higher recall • Findings (NDCG@20 for different sample sizes): small sample sizes degrade performance; performance increases as the sample size rises, with no benefit for sample sizes larger than 1000; pairwise techniques struggle with large samples C. Macdonald, R. Santos and I. Ounis. (2012). The Whens and Hows of Learning to Rank. IR Journal. DOI: 10.1007/s10791-012-9209-9
  104. 104. TYPES OF MODELS FOR LTR
  105. 105. Types of Learned Models (1) Linear Model • Linear Models (the most intuitive to comprehend) – Many learning to rank techniques generate a linear combination of feature values: score(d,Q) = Σf wf · valuef(d) – Linear Models make some assumptions: • Feature Usage: They assume that the same features are needed by all queries • Model Form: The model is only a linear combination of feature values. - Contrast this with genetic algorithms, which can learn functional combinations of features, by randomly introducing operators (e.g. try divide feature a by feature b), but are impractical to learn • It is difficult to find wf values that maximise the performance of an IR evaluation metric, as such metrics are non-smooth and non-differentiable – Typically, techniques such as simulated annealing or stochastic gradient descent are used to empirically obtain wf values
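The linear model itself is a one-line computation; a minimal sketch of the formula above:

```cpp
#include <cstddef>
#include <vector>

// score(d, Q) = sum over features f of w_f * value_f(d)
float linear_score(const std::vector<float>& weights,
                   const std::vector<float>& feature_values) {
    float score = 0.0f;
    for (std::size_t f = 0; f < weights.size(); ++f)
        score += weights[f] * feature_values[f];
    return score;
}
```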
  106. 106. Types of Learned Models (2) Regression Trees – A regression tree is a series of decisions, leading to a partial score output – The outcome of the learner is a “forest” of many such trees, used to calculate the final score of a document for a query: for each document in the sample, the final score is the sum (Σ) of the partial scores of all trees – Their ability to customise branches makes them more effective than linear models – Regression trees are pointwise in nature, but several major search engines have created adapted regression tree techniques that are listwise – E.g. Microsoft’s LambdaMART is at the heart of the Bing search engine (hundreds/thousands of trees!) S. Tyree, K. Weinberger, K. Agrawal, J. Paykin. (2011). Parallel Boosted Regression Trees for Web Search Ranking. WWW’11.
  107. 107. Learning to Rank Best Practices (Query Features) • Tree-based learning to rank techniques have an inherent advantage • They can activate different parts of their learned model depending on feature values • We can add “query features”, which allow different sub-trees for queries with different characteristics – E.g. long query, apply proximity
        Class of Query Features                       NDCG@20
        Baseline (47 document features), LambdaMART   0.2832
        Query Performance Predictors                  0.3033*
        Query Topic Classification (entities)         0.3109*
        Query Concept Identification (concepts)       0.3049*
        Query Log Mining (query frequency)            0.3085*
     C. Macdonald, R. Santos and I. Ounis. (2012). On the Usefulness of Query Features for Learning to Rank. CIKM’12.
  108. 108. Learning to Rank Best Practices • About the sample – it must be: – Small enough that calculating features doesn’t take too long – Small enough that BMW/WAND/MaxScore are fast! – Large enough to have sufficient recall of relevant documents • Previous slide says 1000 documents needed for TREC Web corpora – Simple early cascade: Only time for a single weighting model to create the sample • E.g. PL2 < BM25 with no features; but when features are added, no real difference • Multiple weighting models as features: • Using more weighting models does improve the learned model • Field Models as Features • Adding field-based models as features always improves effectiveness • In general, Query Independent features (e.g. PageRank, Inlinks, URL length) improve effectiveness C. Macdonald, R. Santos, I. Ounis, B. He. About Learning Models with Multiple Query Dependent Features. TOIS 31(3), 2013. C Macdonald & I Ounis (2012). The Whens & Hows of Learning to Rank. INRT. 16(5), 2012.
  109. 109. EFFICIENTLY APPLYING TREE MODELS
  110. 110. Additive Ensembles of Regression Trees • Tree forests: Gradient-Boosted Regression Trees, LambdaMART, Random Forest, etc. – ensemble of weak learners, each contributing a partial score – at scoring time, all trees can be processed independently 126SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA T1 (q, d) s1 T2 (q, d) s2 Tn (q, d) sn Assuming: • 3,000 trees • an average tree depth of 10 nodes • 1,000 documents scored per query • a cluster with 1,000 search shards Costs • 3,000 x 10 x 1,000 = 30 M tests per query and per shard • Approx. 30 G tests for the entire search cluster
  111. 111. 127 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 Query-Document pair features F1 F2 F3 F4 F5 F6 F7 F8 0.4 -1.4 1.5 3.2 2.0 0.5 -3.1 7.1 50.1:F4 10.1:F1 -3.0:F3 -1.0:F3 3.0:F8 0.1:F6 0.2:F2 2.0 s(d) += 2.0 exit leaf true false - number of trees = 1,000 – 20,000 - number of leaves = 4 – 64 - number of docs = 3,000 – 10,000 - number of features = 100 –1,000 Processing a query-document pair d[F4] ≤ 50.1 SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 127
  112. 128 Struct+ • Each tree node is represented by a C object containing: • the feature id, • the associated threshold • the left and right pointers • Memory is allocated for all the nodes at once • The nodes are laid out using a breadth-first traversal of the tree
        Node* getLeaf(Node* np, float* fv) {
            if (!np->left && !np->right) {
                return np;                        // reached a leaf
            }
            if (fv[np->fid] <= np->threshold) {
                return getLeaf(np->left, fv);     // test true: descend left
            } else {
                return getLeaf(np->right, fv);    // test false: descend right
            }
        }
     Tree traversal: - No guarantees about references locality - Low cache hit ratio - High branch misprediction rate - Processor pipeline flushing SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 128
  113. 129 CodeGen • Statically-generated if-then-else blocks
        if (x[4] <= 50.1) {
            // recurses on the left subtree
            ...
        } else {
            // recurses on the right subtree
            if (x[3] <= -3.0) {
                result = 0.4;
            } else {
                result = -1.4;
            }
        }
     Tree traversal: - Static generation - Recompilation if model changes - Compiler "black magic" optimizations - Expected to be fast - Compact code, expected to fit in I-cache - High cache hit ratio - No data dependencies, a lot of control dependencies - High branch misprediction rate SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 129
  114. 130 VPred • Array representation of nodes, similar to the Struct+ baseline • All trees in a single array, need additional data structures to locate a tree in the array, and to know its height
        double depth4(float* x, Node* nodes) {
            int nodeId = 0;
            // four statically unrolled steps, one per tree level: the comparison
            // result (0 or 1) selects the left or right child without branching
            nodeId = nodes[nodeId].children[ x[nodes[nodeId].fid] > nodes[nodeId].theta ];
            nodeId = nodes[nodeId].children[ x[nodes[nodeId].fid] > nodes[nodeId].theta ];
            nodeId = nodes[nodeId].children[ x[nodes[nodeId].fid] > nodes[nodeId].theta ];
            nodeId = nodes[nodeId].children[ x[nodes[nodeId].fid] > nodes[nodeId].theta ];
            return scores[nodeId];
        }
     Tree traversal (theory vs practice): - Static unrolling of traversal functions - High compilation times - Compiler "black magic" optimizations - Expected to be fast - In-code vectorization - Processing 16 documents at a time - Requires tuning on dataset - No control dependencies, a lot of data dependencies - Low branch misprediction rate SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 130
  115. 131 Vectorization of Trees [Figure: the test nodes of all trees T1(q,d)→s1, T2(q,d)→s2, ..., Tn(q,d)→sn are regrouped by feature: for each feature (f0, f1, ...), the thresholds of all test nodes involving that feature are stored contiguously in a thresholds array, with an offsets array of size |F| locating each feature's slice] SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 131
  116. 132 Processing Trees in Vectors while !(x[f0] <= thresholds[i]) do // bitmask operations (over the offsets and thresholds arrays) • Linear access to data • Highly predictable conditional branches • True nodes are not inspected at all SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 132
  117. 117. 133 Quickscorer algorithm SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 133 Tree traversal (BitMask Operations) Score computation
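A highly simplified, single-tree sketch of the QuickScorer bitvector idea, not the full interleaved, feature-ordered algorithm: it assumes leaves are numbered left-to-right starting at bit 0, that each internal node carries a precomputed mask whose zero bits mark the leaves of its left ("true") subtree, and it uses the GCC/Clang __builtin_ctzll intrinsic.

```cpp
#include <cstdint>
#include <vector>

// Traversal is replaced by bitmask operations: AND together the masks of all
// FALSE nodes (x[feature] > threshold); the exit leaf is the lowest surviving
// bit, whose value is the tree's contribution to the document score. The real
// algorithm processes all trees of the ensemble together, grouped by feature,
// scanning per-feature sorted threshold arrays.
struct QsNode { int feature; float threshold; uint64_t mask; };

float qs_score_one_tree(const std::vector<QsNode>& internal_nodes,
                        const std::vector<float>& leaf_values,   // one value per leaf
                        const std::vector<float>& x) {           // feature vector
    uint64_t v = ~0ULL;                          // all leaves still reachable
    for (const QsNode& n : internal_nodes)
        if (x[n.feature] > n.threshold)          // false node: document goes right
            v &= n.mask;                         // discard the leaves of its left subtree
    // The true exit leaf's bit is never cleared, so v has at least one set bit.
    int exit_leaf = __builtin_ctzll(v);          // leftmost surviving leaf
    return leaf_values[exit_leaf];
}
```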
  118. 134 Performance: per-document scoring time in microseconds and speedups [Table 2: per-document scoring time in µs of QS, VPred, If-Then-Else and Struct+ on the MSN-1 and Y!S1 datasets, for ensembles of 1,000 to 20,000 trees and Λ = 8 to 64 leaves; gain factors over QS in parentheses. QS is roughly 2x–6.5x faster than VPred, and up to ~26x and ~38x faster than If-Then-Else and Struct+ respectively. QS's interleaved traversal processes fewer nodes than a root-to-leaf visit, and QS and VPred incur far fewer branch mispredictions than If-Then-Else and Struct+; a block-wise variant (BWQS) further improves cache efficiency.] SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 134
  119. 119. Essential Bibliography 135 • Craig Macdonald, Rodrygo Santos and Iadh Ounis. The Whens and Hows of Learning to Rank. Information Retrieval 16(5):584-628. • Craig Macdonald, Rodrygo Santos and Iadh Ounis. On the Usefulness of Query Features for Learning to Rank. In Proceedings of CIKM 2012. • C Macdonald et al, About Learning Models with Multiple Query Dependent Features. (2012). TOIS 31(3). • N Fuhr. Optimal Polynomial Retrieval Functions Based on the Probability Ranking Principle. (1989). TOIS. 7(3), 1989. • S. Tyree, K.Q. Weinberger, K. Agrawal, J. Paykin Parallel boosted regression trees for web search ranking. WWW 2011. • N. Asadi, J. Lin, and A. P. de Vries. Runtime optimizations for tree-based machine learning models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281–2292, 2014. • C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Quickscorer: A fast algorithm to rank documents with additive ensembles of regression trees. In Proc. of the 38th ACM SIGIR Conference, pages 73–82, 2015. • D. Dato, C. Lucchese, F.M. Nardini, S. Orlando, R. Perego, N. Tonellotto, R. Venturini. Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Transactions on Information Systems, 2016. • C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles. In Proc. of the 39th ACM SIGIR Conference, pages 833-836, 2016. • F. Lettich, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. GPU-based Parallelization of QuickScorer to Speed-up Document Ranking with Tree Ensembles. In Proc. of the 7th IIR Workshop, 2016 SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  120. 120. QUERY EFFICIENCY PREDICTION
  121. 121. 1,000,000,000s of documents 1000s of documents Roadmap 137 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  122. 122. If we know how long a query will take, can we reconfigure the search engine's cascades? 1,000,000,000s of documents 1000s of documents 10 docs Simple Ranking: Identify a set most likely to contain relevant documents Re-Ranking: Try really hard to get the top of the ranking correct, using many signals (features) [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. Boolean: Do query terms occur? 138
  123. 123. Main Idea: Query Efficiency Prediction • Predict how long an unseen query will take to execute, before it has executed. • This facilitates three (or more) ways to make a search engine more efficient: 1. Reconfigure the cascades of the search engine, trading off a little effectiveness for efficiency 2. Apply more CPU cores to long-running queries 3. Decide how to plan the rewrites of a query, to reduce long-running queries In each case: increasing efficiency means increased server capacity and energy savings
  124. 124. What makes a query fast or slow? 141 • Length of the posting lists? • Length of the query (e.g. 2-term vs. 4-term queries)? • Full evaluation vs. MaxScore vs. WAND? • Co-occurrence of the query terms?
  125. 125. Query Efficiency Prediction • Can we predict response time for a query (query efficiency prediction)? • Properties: – Can be calculated online before retrieval commences – Easy (quick) to calculate – Based on single term statistics, previously calculated offline – Includes statistics on the posting list’s score distribution • A predictor feature = an aggregation function applied to a term-level statistic
  126. 126. Query Efficiency Prediction Framework • We combine statistics of the query terms in a regression to accurately predict the execution time of a query – Baseline: total number of postings – Full model: 42 inexpensive features
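A sketch of the feature-construction step under these assumptions: a single term-level statistic (posting-list length) is combined across the query terms with a few aggregation functions; the actual 42 features of the tutorial combine several term statistics with several aggregators, and the regression weights in the comment are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Aggregate an offline, per-term statistic over the terms of a query; a
// regression model trained on logged (features, response time) pairs then
// predicts the execution time of an unseen query from these features.
struct QepFeatures { double sum, max, min, mean; };

QepFeatures aggregate(const std::vector<uint64_t>& posting_list_lengths) {
    QepFeatures f{};
    f.sum  = std::accumulate(posting_list_lengths.begin(), posting_list_lengths.end(), 0.0);
    f.max  = static_cast<double>(*std::max_element(posting_list_lengths.begin(),
                                                   posting_list_lengths.end()));
    f.min  = static_cast<double>(*std::min_element(posting_list_lengths.begin(),
                                                   posting_list_lengths.end()));
    f.mean = f.sum / posting_list_lengths.size();
    return f;
}

// A trained linear regressor (hypothetical weights w0..w4) would then give:
//   predicted_ms = w0 + w1*f.sum + w2*f.max + w3*f.min + w4*f.mean;
```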
  127. 127. Accuracy of Query Efficiency Predictors 144 Findings: our learned predictor, with 42 features, can accurately predict the response time of the search engine for an unseen query
  128. 128. Three Applications of Query Efficiency Prediction
1. Selective retrieval – can we reduce the time taken in the earlier retrieval cascades, e.g. by retrieving fewer documents?
2. Selective parallelisation – apply more CPU cores to long-running queries
3. Selective query rewriting – can we decide how to rewrite a query on a per-query basis?
• Recall that, for each application, increasing efficiency means increased server capacity and energy savings
  129. 129. QEP APPLICATION (1): SELECTIVE RETRIEVAL 146
  130. 130. QEP Application (1): Selective Retrieval
• If we only need to return 10 documents to the user, do we need to re-score 1000s?
• i.e. can we configure the earlier cascades to be faster for longer queries, even if we potentially miss some relevant documents?
[Figure: the default cascade (Boolean → Simple Ranking → Re-Ranking, 1,000,000,000s of documents → 1000s of documents → 10 docs) alongside a selectively adjusted cascade that passes fewer documents between stages (e.g. 500,000,000 docs → 500s of docs → 10 docs)]
• Proposed method [Tonellotto, Macdonald, Ounis, WSDM 2013]:
– Predict the efficiency (response time) of a query
– Adjust <K, F> of WAND to be faster for predicted longer queries
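As a sketch of the selective retrieval decision described above: when the predicted response time exceeds a budget, the first-pass WAND retrieval is configured with a smaller candidate set size K and a more aggressive pruning factor F. The budget and the particular <K, F> settings are illustrative assumptions, not the values used in the WSDM 2013 paper.

```python
# Sketch: selectively adjusting the <K, F> settings of the first-pass WAND
# retrieval according to the predicted response time of the query.
TIME_BUDGET_MS = 150.0

def select_wand_settings(predicted_time_ms):
    """Return (K, F): candidate set size and WAND threshold over-estimation factor."""
    if predicted_time_ms <= TIME_BUDGET_MS:
        return 1000, 1.0   # predicted fast: default safe settings
    return 200, 1.2        # predicted slow: fewer candidates, more aggressive pruning

K, F = select_wand_settings(predicted_time_ms=320.0)
print(K, F)   # -> 200 1.2
```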
  131. 131. Selective Pruning Results
• Findings: by using our 42-feature query efficiency predictor to selectively adjust the retrieval cascades, we could reduce response time by 31%, while maintaining high effectiveness
• The capacity gained from the improved efficiency is equivalent to adding 44% more servers
  132. 132. QEP APPLICATION (2): SELECTIVE PARALLELISATION 152
  133. 133. QEP Application (2): Selective Parallelisation
• Industry web search deployments have to meet service level agreements
– e.g. 200ms for the 99th-percentile response time
– i.e. only 1 query in 100 may take longer than 200ms
• Some queries can take > 200ms
• Problem: how to identify and speed up those slow queries?
  134. 134. Parallelisation: Retrieval with Multiple Cores
• Parallelisation using N threads speeds up all queries, but decreases server capacity to 1/N
• Idea! Selective parallelisation – only parallelise those queries predicted to be too long
[Figure: a query predicted to be long (e.g. "world cup", whose lexicon entries show posting lists of 6.9B and 1.3B postings) has its inverted index traversal split across CPU 1 and CPU 2]
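A minimal sketch of the selective parallelisation decision: parallelise only queries predicted to exceed the latency target, so short queries keep using a single core and capacity is preserved. The thread counts, the 200ms target and the score_partition callback are assumptions for illustration.

```python
# Sketch: choosing the degree of intra-query parallelism from the predicted time.
from concurrent.futures import ThreadPoolExecutor

SLA_TARGET_MS = 200.0

def choose_threads(predicted_time_ms, max_threads=8):
    """Use one thread for predicted-fast queries, several for predicted-slow ones."""
    return 1 if predicted_time_ms <= SLA_TARGET_MS else max_threads

def process_query(query, partitions, score_partition, predicted_time_ms):
    """Score index partitions (e.g. document ranges) with the chosen parallelism."""
    n_threads = choose_threads(predicted_time_ms)
    if n_threads == 1:
        return [score_partition(query, p) for p in partitions]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda p: score_partition(query, p), partitions))
```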
  135. 135. As used by Bing
• Jeon et al. report that this approach reduces the 99th-percentile response time by 50%, from 200ms to 100ms, compared to other state-of-the-art approaches that do not consider predicted execution time
• In doing so, it increases server capacity by more than 50%. This potentially saves one-third of production servers, constituting a significant cost reduction
• Selective parallelisation, using query efficiency predictors, is now deployed by the industrial search engine Bing, across hundreds of thousands of servers
[Predictive parallelization: Taming tail latencies in web search. M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox, S Rixner. Proceedings of ACM SIGIR 2014]
  136. 136. QEP APPLICATION (3) : SELECTIVE QUERY REWRITING 156
  137. 137. Query Rewriting Plans
• Remember our notion of a query plan, including rewriting the query
– e.g. Context-Sensitive Stemming [1] or MRF Proximity [2]
• All these rewritings use complex operators, e.g. #syn(), #1(), #uwN():
– #syn: documents containing any of the words
– #1: documents containing the exact phrase
– #uwN: documents containing the words in a window of size N
• Key question: how to rewrite a query, to maximise efficiency AND effectiveness
[1] F. Peng et al. Context Sensitive Stemming for Web Search. SIGIR 2007.
[2] D. Metzler, W.B. Croft. A Markov Random Field Model for Term Dependencies. SIGIR 2005.
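To make the operators concrete, here is a sketch of how candidate rewriting plans could be generated for a query using an Indri/Terrier-style query language. The specific rewrites (a #syn-based stemming plan and an MRF-style proximity plan) and the syntax details are illustrative assumptions, not the exact rewrites of [1] and [2].

```python
# Sketch: generating candidate query rewriting plans with complex operators.

def plan_original(terms):
    return " ".join(terms)

def plan_stemmed(terms, variants):
    # variants maps a term to its morphological variants, e.g. {"costume": ["costume", "costumes"]}
    return " ".join(
        "#syn(" + " ".join(variants[t]) + ")" if t in variants else t
        for t in terms
    )

def plan_proximity(terms, window=8):
    # MRF-style sequential dependence: unigrams, exact phrases, and unordered windows
    pairs = list(zip(terms, terms[1:]))
    phrases = " ".join(f"#1({a} {b})" for a, b in pairs)
    windows = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return " ".join(part for part in (" ".join(terms), phrases, windows) if part)

terms = ["halloween", "costume"]
print(plan_original(terms))
print(plan_stemmed(terms, {"costume": ["costume", "costumes"]}))
print(plan_proximity(terms))
```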
  138. 138.
• Different queries can be rewritten in different ways
• Different queries (and rewritings) can take varying durations to execute
• We describe a framework that plans how to execute each query (e.g. which rewriting to apply), taking into account:
– How long each plan may take to execute
– The expected effectiveness of each plan
  139. 139. A Selective Rewriting Approach
• The selective mechanism is based around two ideas:
1. We can predict the efficiency of a given query plan (QEP, including for complex query terms)… and hence eliminate plans that will exceed the target response time
2. We can predict the effectiveness of a given query plan (QPP, or an expectation of plan effectiveness)… and where possible execute the most effective
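A minimal sketch of the online selection step combining both ideas: among the candidate plans whose predicted response time fits the target, execute the one with the highest predicted effectiveness, falling back to the cheapest plan if none fits. The plan records, the numbers and the fallback rule are illustrative assumptions.

```python
# Sketch: online plan selection using efficiency (QEP) and effectiveness (QPP) predictions.

def select_plan(candidate_plans, time_budget_ms):
    """Each plan is a dict with 'query', 'pred_time_ms' and 'pred_effectiveness'."""
    feasible = [p for p in candidate_plans if p["pred_time_ms"] <= time_budget_ms]
    if not feasible:
        # No plan fits the budget: fall back to the cheapest one.
        return min(candidate_plans, key=lambda p: p["pred_time_ms"])
    return max(feasible, key=lambda p: p["pred_effectiveness"])

plans = [
    {"query": "halloween costume",                "pred_time_ms": 40,  "pred_effectiveness": 0.61},
    {"query": "halloween #syn(costume costumes)", "pred_time_ms": 95,  "pred_effectiveness": 0.66},
    {"query": "halloween costume #1(halloween costume) #uw8(halloween costume)",
                                                  "pred_time_ms": 260, "pred_effectiveness": 0.70},
]
print(select_plan(plans, time_budget_ms=200))   # -> the stemmed plan
```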
  140. 140. Complex Terms and Ephemeral Posting Lists
• Recall, QEP essentially estimates pruning, based on posting list lengths
• Challenge: obtaining posting list length estimates for queries involving complex terms
[Figure: the unigram posting lists for car and cars, and the ephemeral posting list of #syn(car, cars) formed by merging them on the fly]
  141. 141. Observations on Ephemeral Posting Lists
• Observation: statistics for ephemeral posting lists can be bounded…
– e.g. #syn is disjunctive, so the length of its posting list cannot be greater than the sum of the posting list lengths of its constituent terms
– e.g. #1 is conjunctive, so the length of its posting list cannot be greater than the minimum of the posting list lengths of its constituent terms
– e.g. for #1, the maximum number of matches for a phrase cannot exceed the minimum of the maximum term frequencies of the constituent terms
• These allow:
a) upper-bound statistics (necessary for doing dynamic pruning with complex terms)
b) efficiency predictions for queries with complex terms
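A sketch of the bounds listed above, used to estimate statistics of an ephemeral posting list without materialising it; the dictionary layout and example numbers are assumptions for illustration.

```python
# Sketch: bounding statistics of ephemeral posting lists from unigram statistics.

def syn_postings_upper_bound(term_stats):
    """#syn is disjunctive: its posting list is no longer than the sum of its terms' lists."""
    return sum(s["postings"] for s in term_stats)

def phrase_postings_upper_bound(term_stats):
    """#1 is conjunctive: its posting list is no longer than its shortest term's list."""
    return min(s["postings"] for s in term_stats)

def phrase_max_tf_upper_bound(term_stats):
    """Phrase matches in a document cannot exceed the smallest maximum term frequency."""
    return min(s["max_tf"] for s in term_stats)

car, cars = {"postings": 3, "max_tf": 5}, {"postings": 2, "max_tf": 2}
print(syn_postings_upper_bound([car, cars]))     # bound for #syn(car, cars) -> 5
print(phrase_postings_upper_bound([car, cars]))  # bound for #1(car cars)    -> 2
print(phrase_max_tf_upper_bound([car, cars]))    # bound for #1(car cars)    -> 2
```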
  142. 142. Putting It All Together: Complex QEP & Online Plan Selection
• 49% decrease in mean response time, and 62% decrease in tail response time, compared to the Default plan, without significant loss in NDCG
  143. 143. Looking Forward
• The query plans we showed here demonstrate only two possible rewritings:
– Proximity and stemming
– They don't consider the (predicted) effectiveness of a plan
• We highlight Rosset et al., who have proposed the use of reinforcement learning to identify the most appropriate plan per-query and per-index block
Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al., Proceedings of SIGIR 2018.
  144. 144. QEP Bibliography
• Predictive Parallelization: Taming Tail Latencies in Web Search. M. Jeon, S. Kim, S. Hwang, Y. He, S. Elnikety, A. L. Cox, S. Rixner. In Proceedings of ACM SIGIR 2014.
• Load-Sensitive Selective Pruning for Distributed Search. Nicola Tonellotto, Fabrizio Silvestri, Raffaele Perego, Daniele Broccolo, Salvatore Orlando, Craig Macdonald and Iadh Ounis. In Proceedings of CIKM 2013.
• Efficient and Effective Retrieval using Selective Pruning. Nicola Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings of WSDM 2013.
• Learning to Predict Response Times for Online Query Scheduling. Craig Macdonald, Nicola Tonellotto and Iadh Ounis. In Proceedings of SIGIR 2012.
• Efficient & Effective Selective Query Rewriting with Efficiency Predictions. Craig Macdonald, Nicola Tonellotto, Iadh Ounis. In Proceedings of SIGIR 2017.
  145. 145. CONCLUSIONS 165
  146. 146. Conclusions
• Information retrieval requires not just effective, but also efficient techniques
– These two form an inherent tradeoff
– Efficiency can be helped by applying more horsepower, but at significant cost!
• Hence efficiency is important to ensure the Green credentials of the IR community
• We have presented many techniques that aim to improve particular methods of retrieval
– They make assumptions about the IR process (e.g. we care about the top K documents)
• As new IR techniques and models are created, there will always be a need for enhancements to their efficiency, to make them easily deployable
  147. 147. Emerging Trends
• BitFunnel (SIGIR 2017 best paper)
– a "lossy" signature representation for inverted files
– now deployed by Bing
• Similarly, deep-learning-based index data structures, e.g. at Google
• Deep learning for matching
– but the efficiency costs are not yet well-researched
• And your future ideas…?
  148. 148. Acknowledgements
• Some of the slides presented here have appeared in presentations which we have co-authored with:
– Iadh Ounis
– Richard McCreadie
– Matteo Catena
– Raffaele Perego
– Rossano Venturini
– Salvatore Orlando
– Franco Maria Nardini
– Claudio Lucchese
  149. 149. Feedback: http://bit.ly/2ucH86b
