A Billion Queries Per Day
Michael Busch
@michibusch
michael@twitter.com
buschmi@apache.org




                            1
A Billion Queries Per Day

 Agenda

 ‣ Introduction

 - Posting list format and early query termination

 - Index memory model

 - Lock-free algorithms and data structures

 - Roadmap




                                                     2
Introduction




               3
Introduction

• Search @ Twitter today:


      • >1,000 TPS (tweets/sec) = >80,000,000 tweets/day

      • >12,000 QPS (queries/sec) = >1,000,000,000 queries/day

      • <10s latency (entire pipeline)

      • <1s latency (indexer)


• Performance goal: Ability to scale to support Twitter’s continued growth




                                                                             4
Introduction

• 1st gen search engine based on MySQL


• Next-gen engine based on (partially rewritten) Lucene Java


• Significantly more performant (orders of magnitude)


• Much easier to develop new features on the new platform - stay tuned!




                                                                          5
A Billion Queries Per Day

 Agenda

 - Introduction

 ‣ Posting list format and early query termination

 - Index memory model

 - Lock-free algorithms and data structures

 - Roadmap




                                                     6
Posting list format and early
     query termination



                                7
Objectives

• Evaluate as few documents as possible before terminating the query


• Rank documents in reverse time order (newest documents first)




                                                                            8
Inverted Index 101

 1   The old night keeper keeps the keep in the town

 2   In the big old house in the big old gown.

 3   The house in the town had the big old keep

 4   Where the old night keeper never did sleep.

 5   The night keeper keeps the keep in the night

 6   And keeps in the dark and sleeps in the light.


Table with 6 documents



Example from:
Justin Zobel, Alistair Moffat:
Inverted files for text search engines.
ACM Computing Surveys (CSUR),
vol. 38, no. 2, article 6, 2006




                                                       9
Inverted Index 101

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>
                                                        keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
                                                                                                   10
Inverted Index 101                                                          Query: keeper

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>
                                                        keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
                                                                                                   11
Early query termination                                                     Query: old

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>
                                                        keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
                                                                                                   13
Early query termination                                                     Query: old

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>
                                                         keep       3    <1> <3> <5>         Walk posting list in reverse order
                                                         keeper      3    <1> <4> <5>
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
                                                                                                             14
Early query termination                                                     Query: old

1   The old night keeper keeps the keep in the town      term     freq
2   In the big old house in the big old gown.             and       1    <6>
                                                           big      2    <2> <3>
3   The house in the town had the big old keep
                                                         dark       1    <6>
4   Where the old night keeper never did sleep.
                                                           did      1    <4>
5   The night keeper keeps the keep in the night         gown       1    <2>
6   And keeps in the dark and sleeps in the light.        had       1    <3>
                                                        house       2    <2> <3>
Table with 6 documents                                      in      5    <1> <2> <3> <5> <6>   E.g. 2 results are requested:
                                                          keep       3    <1> <3> <5>          Optimal termination after exactly
                                                         keeper      3    <1> <4> <5>          two postings were read
                                                        keeps       3    <1> <5> <6>
                                                          light     1    <6>
                                                         never      1    <4>
                                                         night      3    <1> <4> <5>
                                                           old      4    <1> <2> <3> <4>
                                                         sleep      1    <4>
                                                        sleeps      1    <6>
                                                           the      6    <1> <2> <3> <4> <5> <6>
                                                         town       2    <1> <3>
                                                        where       1    <4>
                                                      Dictionary and posting lists
                                                                                                           15
Early query termination

• Possible if time-ordered results are desired (which is often the case in realtime
  search applications)


• Posting list requirements:


   • Pointer from dictionary to last posting in a list (must always be up to date)


   • Possibility to iterate lists in reverse order


   • Allow efficient skipping




                                                                                      16
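Taken together, these requirements allow a scan like the following minimal sketch (assuming a posting list represented as an int[] of docIDs in increasing time order, with the dictionary's last-posting pointer as the entry point; names are illustrative, not Twitter's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: walk a posting list newest-to-oldest and stop as
// soon as enough results have been collected (early query termination).
public final class EarlyTermination {

    // `lastPosting` is the dictionary's pointer to the newest posting.
    static List<Integer> newestFirst(int[] postings, int lastPosting, int numResults) {
        List<Integer> hits = new ArrayList<>();
        for (int i = lastPosting; i >= 0 && hits.size() < numResults; i--) {
            hits.add(postings[i]); // older postings are never touched
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] old = {1, 2, 3, 4}; // posting list for "old" from the example
        // 2 results requested -> exactly two postings read, as on the slide
        System.out.println(newestFirst(old, old.length - 1, 2)); // [4, 3]
    }
}
```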
Posting list encodings

Doc IDs to encode: 5, 15, 9000, 9002
Delta encoding: 5, 10, 8985, 2


                                      Saves space, if appropriate
                                 compression techniques are used, e.g.
                                    variable-length integers (VInt)


• Lucene uses a combination of delta encoding and VInt compression


• VInts are expensive to decode


• Problem 1: How to traverse posting lists backwards?


• Problem 2: How to write a posting atomically, if it is not a primitive Java type?

                                                                               17
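The delta + VInt scheme can be sketched as follows (a simplified version of the idea, not Lucene's actual code): each gap is written as 7-bit groups, low group first, with the high bit of a byte set while more groups follow.

```java
import java.io.ByteArrayOutputStream;

// Sketch of delta encoding followed by variable-length int (VInt) compression.
public final class VIntDeltas {
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : docIds) {
            int delta = id - prev; // store gaps, not absolute IDs
            prev = id;
            while ((delta & ~0x7F) != 0) {  // more than 7 bits left?
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // final group, continuation bit clear
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 5, 15, 9000, 9002 -> deltas 5, 10, 8985, 2 -> 5 bytes instead of 16
        System.out.println(encode(new int[] {5, 15, 9000, 9002}).length); // 5
    }
}
```

The small deltas each fit in one byte; only 8985 needs two, which is where the space savings come from.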
Posting list format

int (32 bits)




                      docID                      textPosition
                     24 bits                        8 bits
                    max. 16.7M                     max. 255




• Tweet text can only have 140 chars


• Decoding speed significantly improved compared to delta and VInt decoding
  (early experiments suggest 5x improvement compared to vanilla Lucene with
  FSDirectory)


                                                                              18
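The 32-bit layout above can be sketched like this (field and method names are assumptions, not Twitter's actual code): the docID occupies the high 24 bits, the text position the low 8 bits.

```java
// Hypothetical sketch of the 32-bit posting: 24-bit docID, 8-bit position.
public final class Posting {
    static int encode(int docId, int textPosition) {
        assert docId >= 0 && docId < (1 << 24);       // max ~16.7M
        assert textPosition >= 0 && textPosition <= 255; // tweets are short
        return (docId << 8) | textPosition;
    }

    static int docId(int posting)        { return posting >>> 8; }
    static int textPosition(int posting) { return posting & 0xFF; }

    public static void main(String[] args) {
        int p = encode(9000, 42);
        System.out.println(docId(p) + " " + textPosition(p)); // 9000 42
    }
}
```

Decoding is two cheap bit operations per posting, compared to the branchy byte-at-a-time loop that VInt decoding requires.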
Posting list format

• ints can be written atomically in Java


• Backwards traversal easy on absolute docIDs (not deltas)


• Every posting is a possible entry point for a searcher


• Skipping can be done via binary search without additional data structures,
  though better approaches should be explored


• On tweet indexes we need about 30% more storage for docIDs compared to
  delta+VInt, but that’s compensated by savings on position encoding!


• Max. segment size: 2^24 = 16.7M tweets


                                                                              19
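Because the postings store absolute docIDs in increasing order (in the int's high 24 bits), a searcher can skip with plain binary search and no extra skip structures; a minimal sketch (illustrative names, not Twitter's code):

```java
// Hypothetical sketch: binary-search skipping over int-packed postings.
public final class PostingSkip {

    // Index of the first posting with docID >= target, or postings.length.
    static int advance(int[] postings, int target) {
        int lo = 0, hi = postings.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int docId = postings[mid] >>> 8; // strip the 8-bit text position
            if (docId < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        // docIDs 1, 4, 9 with arbitrary text positions
        int[] postings = {(1 << 8) | 3, (4 << 8) | 7, (9 << 8) | 1};
        System.out.println(advance(postings, 5)); // 2 (the posting with docID 9)
    }
}
```

With delta encoding this would be impossible, since decoding any posting would require reading all of its predecessors.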
Objectives

• Evaluate as few documents as possible before terminating the query


• Rank documents in reverse time order (newest documents first)




                                                                            20
A Billion Queries Per Day

 Agenda

 - Introduction

 - Posting list format and early query termination

 ‣ Index memory model

 - Lock-free algorithms and data structures

 - Roadmap




                                                     21
Index memory model




                     22
Objectives

• Store a large number of linked lists of a priori unknown length efficiently in
  memory


• Allow traversal of lists in reverse order (backwards linked)


• Allow efficient garbage collection




                                                                                 23
Inverted Index

1   The old night keeper keeps the keep in the town      term       freq
2   In the big old house in the big old gown.             and         1 <6>
                                                           big        2 <2> <3>
3   The house in the town had the big old keep
                                                         dark         1 <6>            Per term we store different
4   Where the old night keeper never did sleep.
                                                           did        1 <4>            kinds of metadata: text pointer,
5   The night keeper keeps the keep in the night          gown        1 <2>            frequency, postings pointer, etc.
6   And keeps in the dark and sleeps in the light.        had         1 <3>
                                                        house         2 <2> <3>
Table with 6 documents                                      in        5 <1> <2> <3> <5> <6>
                                                         keep         3 <1> <3> <5>
                                                        keeper        3 <1> <4> <5>
                                                        keeps         3 <1> <5> <6>
                                                          light       1 <6>
                                                         never        1 <4>
                                                         night        3 <1> <4> <5>
                                                           old        4 <1> <2> <3> <4>
                                                         sleep        1 <4>
                                                        sleeps        1 <6>
                                                           the        6 <1> <2> <3> <4> <5> <6>
                                                         town         2 <1> <3>
                                                        where         1 <4>
                                                      Dictionary and posting lists
                                                                                                      24
LUCENE-2329: Parallel posting arrays
class PostingList

int textPointer;
int postingsPointer;
int frequency;
...


• Term hashtable is an array of these objects: PostingList[] termsHash


• For each unique term in a segment we need an instance; this results in a very
  large number of objects that are long-living, i.e. the garbage collector can’t
  remove them quickly (they need to stay in memory until the segment is
  flushed)


• With a searchable RAM buffer we want to flush much less often and allow
  DocumentsWriter to fill up the available memory

                                                                                   25
LUCENE-2329: Parallel posting arrays
class PostingList

int textPointer;
int postingsPointer;
int frequency;
...


• Having a large number of long-living objects is very expensive in Java,
  especially when the default mark-and-sweep garbage collector is used


• The mark phase of GC becomes very expensive, because all long-living
  objects in memory have to be checked


• We need to reduce the number of objects to improve GC performance!
  -> Parallel posting arrays



                                                                            26
LUCENE-2329: Parallel posting arrays

PostingList[]




                  int textPointer;
                  int postingsPointer;
                  int frequency;




                                         27
LUCENE-2329: Parallel posting arrays
termID
int[]      textPointer;   postingsPointer;   frequency;
              int[]            int[]           int[]
            0
            1
            2
            3
            4
            5
            6




                                                          28
LUCENE-2329: Parallel posting arrays
termID
int[]      textPointer;   postingsPointer;   frequency;
              int[]            int[]           int[]
            0   t0               p0             f0
            1
            2
            3
            4
            5
            6
  0




                                                          29
LUCENE-2329: Parallel posting arrays
termID
int[]      textPointer;   postingsPointer;   frequency;
              int[]            int[]           int[]
            0   t0               p0             f0
  1
            1   t1               p1             f1
            2
            3
            4
            5
            6
  0




                                                          30
LUCENE-2329: Parallel posting arrays
termID
int[]       textPointer;         postingsPointer;            frequency;
               int[]                  int[]                    int[]
              0    t0                     p0                       f0
  1
              1    t1                     p1                       f1
              2    t2                     p2                       f2
              3
              4
              5
              6
         • Total number of objects is now greatly reduced and is
           constant and independent of number of unique terms

         • With parallel arrays we save 28 bytes per unique term
           -> 41% savings compared to PostingList object

                                                                          31
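The parallel-arrays layout can be sketched roughly like this (field and method names are assumptions, not Lucene's actual LUCENE-2329 code):

```java
// Hypothetical sketch of parallel posting arrays: one int[] per field,
// indexed by termID, instead of one PostingList object per unique term.
// Three array objects replace millions of small long-living objects.
public final class ParallelPostingArrays {
    final int[] textPointer;
    final int[] postingsPointer;
    final int[] frequency;

    ParallelPostingArrays(int numTerms) {
        textPointer = new int[numTerms];
        postingsPointer = new int[numTerms];
        frequency = new int[numTerms];
    }

    // Record one more occurrence of a term; `posting` points at the
    // most recently indexed posting for that term.
    void addOccurrence(int termID, int posting) {
        postingsPointer[termID] = posting;
        frequency[termID]++;
    }

    public static void main(String[] args) {
        ParallelPostingArrays arrays = new ParallelPostingArrays(8);
        arrays.addOccurrence(3, 42);
        arrays.addOccurrence(3, 43);
        System.out.println(arrays.frequency[3] + " " + arrays.postingsPointer[3]); // 2 43
    }
}
```

The term hashtable then stores termIDs (plain ints) rather than object references, which is what makes the object count independent of the number of unique terms.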
LUCENE-2329: Parallel posting arrays -
Performance

• Performance experiments: Index 1M wikipedia docs


        1) -Xmx2048M, indexWriter.setRAMBufferSizeMB(200)


          4.3% improvement


        2) -Xmx256M, indexWriter.setRAMBufferSizeMB(200)


          86.5% improvement




                                                            32
LUCENE-2329: Parallel posting arrays -
Performance

• With large heap there is a small improvement due to per-term memory
  savings


• With small heap the garbage collector is invoked much more often - huge
  improvement due to smaller number of objects (depending on doc sizes we
  have seen improvements of up to 400%!)


• With searchable RAM buffers we want to utilize all the RAM we have; with
  parallel arrays we can maintain high indexing performance even if we get
  close to the max heap size




                                                                             33
Inverted index components

         Posting list storage



                                    ?
  Dictionary      Parallel arrays



                                        pointer to the most recently
                                        indexed posting for a term




                                                                       34
Posting lists storage - Objectives

• Store many singly-linked lists of different lengths space-efficiently


• The number of java objects should be independent of the number of lists or
  number of items in the lists


• Every item should be a possible entry point into the lists for iterators, i.e.
  items should not be dependent on other items (e.g. no delta encoding)


• Append and read possible by multiple threads in a lock-free fashion (single
  append thread, multiple reader threads)


• Traversal in backwards order




                                                                                   35
Memory management

4 int[]
 pools




                  = 32K int[]




                                36
Memory management

4 int[]
 pools



                                 Each pool can
                                   be grown
                  = 32K int[]   individually by
                                  adding 32K
                                     blocks




                                                  37
Memory management

4 int[]
 pools




  • For simplicity we can forget about the blocks for now and think of the pools
    as continuous, unbounded int[] arrays


  • Small total number of Java objects (each 32K block is one object)




                                                                                   38
Memory management
 slice size
     2^11
     2^7
     2^4
     2^1


• Slices can be allocated in each pool


• Each pool has a different, but fixed slice size




                                                   39
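The allocation pattern can be sketched as follows (a hypothetical reconstruction of the scheme, with illustrative names): a new list starts in the smallest pool, and each subsequent slice comes from the next-larger pool until the largest pool is reached.

```java
// Hypothetical sketch of the four slice pools (slice sizes 2^1, 2^4,
// 2^7, 2^11 ints). The n-th slice of a list is allocated on level
// min(n, 3); beyond level 3 a list simply keeps taking 2^11-int slices.
public final class SlicePools {
    static final int[] SLICE_SIZES = {1 << 1, 1 << 4, 1 << 7, 1 << 11};

    // Pool level used for the n-th slice (0-based) of a list.
    static int levelForSlice(int n) {
        return Math.min(n, SLICE_SIZES.length - 1);
    }

    public static void main(String[] args) {
        // slices 0..4 get 2, 16, 128, 2048, 2048 ints
        for (int n = 0; n < 5; n++) {
            System.out.println("slice " + n + " -> " + SLICE_SIZES[levelForSlice(n)] + " ints");
        }
    }
}
```

Short lists (rare terms) waste at most a couple of ints, while long lists (frequent terms) grow in large chunks, keeping allocation overhead low at both extremes.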
Adding and appending to a list
 slice size
     2^11
     2^7                          available
     2^4                          allocated
     2^1                          current list




                                                40
Adding and appending to a list
 slice size
     2^11
     2^7                                available
     2^4                                allocated
     2^1                                current list



                Store first two
              postings in this slice




                                                      41
Adding and appending to a list
 slice size
     2^11
     2^7                                                        available
     2^4                                                        allocated
     2^1                                                        current list



When first slice is full, allocate another one in second pool




                                                                              42
Adding and appending to a list
 slice size
     2^11
     2^7                                                  available
     2^4                                                  allocated
     2^1                                                  current list



         Allocate a slice on each level as list grows




                                                                       43
Adding and appending to a list
 slice size
     2^11
     2^7                                                           available
     2^4                                                           allocated
     2^1                                                           current list




                   On the uppermost level one list can own multiple slices




                                                                                44
Posting list format

int (32 bits)




                      docID                      textPosition
                     24 bits                        8 bits
                    max. 16.7M                     max. 255




• Tweet text can only have 140 chars


• Decoding speed significantly improved compared to delta and VInt decoding
  (early experiments suggest 5x improvement compared to vanilla Lucene with
  FSDirectory)


                                                                              45
Addressing items

• Use 32 bit (int) pointers to address any item in any list unambiguously:



     int (32 bits)




                 poolIndex         sliceIndex           offset in slice
                  2 bits           19-29 bits              1-11 bits
                    0-3          depends on pool        depends on pool



• Nice symmetry: Postings and address pointers both fit into a 32 bit int




                                                                             46
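Decoding such an address pointer can be sketched like this (assumed layout, matching the bit widths above: 2-bit pool index in the top bits, then the slice index, then an in-slice offset whose width is log2 of that pool's slice size, i.e. 1, 4, 7, or 11 bits; names are illustrative):

```java
// Hypothetical sketch: unpack poolIndex / sliceIndex / offset from a
// 32-bit address into the four slice pools.
public final class SliceAddress {
    static final int[] OFFSET_BITS = {1, 4, 7, 11}; // log2 of each pool's slice size

    static int poolIndex(int addr) { return addr >>> 30; }

    static int offset(int addr) {
        return addr & ((1 << OFFSET_BITS[poolIndex(addr)]) - 1);
    }

    static int sliceIndex(int addr) {
        int bits = OFFSET_BITS[poolIndex(addr)];
        // 30 bits remain below the pool index; the offset takes `bits` of them
        return (addr >>> bits) & ((1 << (30 - bits)) - 1);
    }

    public static void main(String[] args) {
        int addr = (3 << 30) | (5 << 11) | 7; // pool 3, slice 5, offset 7
        System.out.println(poolIndex(addr) + " " + sliceIndex(addr) + " " + offset(addr)); // 3 5 7
    }
}
```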
Linking the slices
 slice size
     2^11
     2^7               available
     2^4               allocated
     2^1               current list




                                    47
Linking the slices
 slice size
     2^11
     2^7                                                                       available
     2^4                                                                       allocated
     2^1                                                                       current list


   Dictionary   Parallel arrays




                                  pointer to the last posting indexed for a term




                                                                                             48
Objectives

• Store a large number of linked lists of a priori unknown length efficiently in
  memory


• Allow traversal of lists in reverse order (backwards linked)


• Allow efficient garbage collection




                                                                                 49
A Billion Queries Per Day

 Agenda

 - Introduction

 - Posting list format and early query termination

 - Index memory model

 ‣ Lock-free algorithms and data structures

 - Roadmap




                                                     50
Lock-free algorithms and
    data structures



                           51
Objectives

• Support continuously high indexing rate and concurrent search rate


• Both should be independent, i.e. indexing rate should not decrease when
  search rate increases and vice versa.




                                                                            52
Definitions

• Pessimistic locking


     • A thread holds an exclusive lock on a resource, while an action is
       performed [mutual exclusion]


     • Usually used when conflicts are expected to be likely


• Optimistic locking


      • Operations are attempted atomically without holding a lock; conflicts
        can be detected, and retry logic is often used when they occur


     • Usually used when conflicts are expected to be the exception



                                                                                 53
Definitions

• Non-blocking algorithm


 Ensures that threads competing for shared resources do not have their
 execution indefinitely postponed by mutual exclusion.


• Lock-free algorithm


 A non-blocking algorithm is lock-free if there is guaranteed system-wide
 progress.


• Wait-free algorithm


 A non-blocking algorithm is wait-free if there is guaranteed per-thread
 progress.

                                                                   * Source: Wikipedia
                                                                                         54
Concurrency

• Having a single writer thread simplifies our problem: no locks have to be used
  to protect data structures from corruption (only one thread modifies data)


• But: we have to make sure that all readers always see a consistent state of
  all data structures -> this is much harder than it sounds!


• In Java, it is not guaranteed that one thread will see changes that another
  thread makes in program execution order, unless the same memory barrier is
  crossed by both threads -> safe publication


• Safe publication can be achieved in different, subtle ways. Read the great
  book “Java Concurrency in Practice” by Brian Goetz for more information!




                                                                                  55
Java Memory Model

• Program order rule


 Each action in a thread happens-before every action in that thread that comes
 later in the program order.


• Volatile variable rule


 A write to a volatile field happens-before every subsequent read of that same
 field.


• Transitivity


 If A happens-before B, and B happens-before C, then A happens-before C.


                                               * Source: Brian Goetz: Java Concurrency in Practice
                                                                                                56
Concurrency

                            Thread A   Thread B


Cache


RAM              0   time



        int x;




                                                  57
Concurrency
                     Thread A writes x=5 to cache
                                                    Thread A   Thread B


Cache                 5                               x = 5;




RAM              0                         time



        int x;




                                                                          58
Concurrency

                                Thread A             Thread B


Cache                5            x = 5;


                                                     while(x != 5);
RAM              0       time



        int x;


                                 This condition will likely
                                   never become false!




                                                                      59
Concurrency

                            Thread A   Thread B


Cache


RAM              0   time



        int x;




                                                  60
Concurrency
                     Thread A writes b=1 to RAM,
                         because b is volatile
                                                   Thread A   Thread B


Cache                  5                             x = 5;

                                                     b = 1;

RAM              0     1                   time



        int x;

                 volatile int b;




                                                                         61
Concurrency

                                           Thread A     Thread B


Cache                  5                       x = 5;

                                              b = 1;

RAM              0     1           time
                                                        int dummy = b;
                                                        while(x != 5);

        int x;
                                   Read volatile b
                 volatile int b;




                                                                         62
Concurrency

                                          Thread A             Thread B


Cache                  5                    x = 5;

                                            b = 1;

RAM              0     1           time
                                                               int dummy = b;
                                                               while(x != 5);

        int x;

                 volatile int b;          happens-before



• Program order rule: Each action in a thread happens-before every action in
  that thread that comes later in the program order.


                                                                                63
Concurrency

                                            Thread A               Thread B


Cache                  5                      x = 5;

                                              b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;

                 volatile int b;             happens-before



• Volatile variable rule: A write to a volatile field happens-before every
  subsequent read of that same field.


                                                                                   64
Concurrency

                                          Thread A              Thread B


Cache                  5                    x = 5;

                                            b = 1;

RAM              0     1           time
                                                               int dummy = b;
                                                               while(x != 5);

        int x;

                 volatile int b;           happens-before



• Transitivity: If A happens-before B, and B happens-before C, then A
  happens-before C.


                                                                                65
Concurrency

                                            Thread A               Thread B


Cache                  5                       x = 5;

                                              b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;

                 volatile int b;
                                                This condition will be
                                                   false, i.e. x==5

• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.


                                                                                    66
Concurrency

                                             Thread A              Thread B


Cache                  5                       x = 5;

                                               b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;                              Memory barrier
                 volatile int b;



• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.


                                                                                    67
Demo




       68
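The walkthrough above can be reproduced as a small runnable program (a sketch, not the original demo code): the writer thread stores the plain field x and then the volatile flag b; the reading thread spins on b, so by the program order rule, the volatile variable rule, and transitivity it is guaranteed to observe x == 5.

```java
class SafePublicationDemo {
    static int x;                 // plain field: many of these may exist
    static volatile int b;        // a single volatile field acts as the barrier

    static int run() {
        Thread writer = new Thread(() -> {
            x = 5;                // program order rule: happens-before b = 1
            b = 1;                // volatile write: crosses the memory barrier
        });
        writer.start();
        while (b != 1) {          // volatile read: crosses the same barrier
            Thread.onSpinWait();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return x;                 // guaranteed to be 5 by transitivity
    }
}
```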
Concurrency

                                             Thread A              Thread B


Cache                  5                       x = 5;

                                               b = 1;

RAM              0     1             time
                                                                  int dummy = b;
                                                                  while(x != 5);

        int x;                              Memory barrier
                 volatile int b;



• Note: x itself doesn’t have to be volatile. There can be many variables like x,
  but we need only a single volatile field.


                                                                                    69
Concurrency

                   IndexWriter       IndexReader


                   write 100 docs

                   maxDoc = 100

            time
                   write more docs   in IR.open(): read maxDoc
                                     search upto maxDoc

   maxDoc is volatile




                                                                 70
Concurrency

                       IndexWriter          IndexReader


                       write 100 docs

                       maxDoc = 100

                time
                       write more docs      in IR.open(): read maxDoc
                                            search upto maxDoc

       maxDoc is volatile

                                           happens-before



• Only maxDoc is volatile. All other fields that IW writes to and IR reads from
  don’t need to be!


                                                                                 71
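The pattern in the diagram can be sketched as follows (a hypothetical, heavily simplified class; the real IndexWriter/IndexReader are far richer): the single writer publishes each document by writing the volatile maxDoc last, and a reader snapshots maxDoc once when it is “opened”, searching only up to that point.

```java
class RamIndex {
    int[] docs = new int[1024];   // plain fields, written by the single writer
                                  // (growth omitted in this sketch)
    volatile int maxDoc;          // the only volatile field: published last

    // writer thread
    void addDoc(int docId) {
        docs[maxDoc] = docId;     // happens-before the volatile write below
        maxDoc = maxDoc + 1;      // memory barrier: publishes the new doc
    }

    // reader threads: snapshot maxDoc once, like IndexReader.open()
    int openSnapshot() {
        return maxDoc;            // everything up to this point is visible
    }

    // returns the position of the first match below the snapshot, or -1
    int search(int snapshot, int query) {
        for (int i = 0; i < snapshot; i++) {
            if (docs[i] == query) return i;
        }
        return -1;
    }
}
```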
So far...

• Single writer/multi reader approach


• High performance can be achieved by avoiding many volatile/atomic fields


• Safe publication can be achieved by ensuring that memory barriers are
  crossed


• Opening an in-memory reader is extremely cheap - can be done for every
  query (at Twitter we open a billion IndexReaders per day!)


• Problem still unsolved: how to update term->postinglist pointers to ensure
  correct search results?




                                                                               72
Updating posting pointers   written after memory barrier was
                                          crossed
 slice size
     211
    27
    24
    21

                                    posting
                                    pointer




                                                               73
Updating posting pointers                           written after memory barrier was
                                                                  crossed
 slice size
     211
    27
    24
    21                                                                updated
                                                                       pointer
                                                           previous
                                                            pointer

• Safe publication guarantees that everything written before a memory barrier
  is visible to a thread once it has crossed that barrier


• But: a thread might already see a newer state even if the next memory barrier
  wasn’t crossed yet




                                                                                       74
Updating posting pointers                            written after memory barrier was
                                                                   crossed
 slice size
     211
    27
    24
    21                                                                 updated
                                                                        pointer
                                                            previous
                                                             pointer

• This means: a reader might get a postings pointer addressing an int[] block
  that is not allocated yet!


• We have to handle this case carefully




                                                                                        75
Updating posting pointers

• Basic idea: keep two most recent pointers


• Writer toggles active pointer whenever memory barrier is crossed


• Reader can detect if a pointer is safe to follow


• Optimistic approach: Reader always considers the latest pointer first - only if
  that one isn’t safe does it fall back to the previous pointer




                                                                                    76
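The two-pointer scheme above can be sketched like this (illustrative names and a simplified safety check; the real implementation detects whether a pointer addresses an already-allocated slice rather than comparing against a numeric bound): the writer fills the inactive slot and then toggles, while the reader optimistically tries the latest pointer and falls back to the previous, always-safe one.

```java
class DualPointer {
    private final int[] pointers = new int[2];
    private volatile int current;      // index of the most recently published slot

    DualPointer(int initial) {
        pointers[0] = initial;
        pointers[1] = initial;
    }

    // writer: store the new pointer in the inactive slot, then toggle.
    // The volatile write publishes everything written before it.
    void advance(int newPointer) {
        int next = 1 - current;
        pointers[next] = newPointer;
        current = next;               // memory barrier: publish the toggle
    }

    // reader: optimistic - try the latest pointer first, fall back if it
    // points beyond what this reader can safely address.
    int read(int maxSafePointer) {
        int c = current;
        int latest = pointers[c];
        if (latest <= maxSafePointer) {
            return latest;            // latest pointer is safe to follow
        }
        return pointers[1 - c];       // previous pointer is always safe
    }
}
```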
Wait-free

• Not a single exclusive lock


• Writer thread can always make progress


• Optimistic locking (retry-logic) in a few places for searcher thread


• Retry logic very simple and guaranteed to always make progress - one of the
  two available posting pointers is always safe to use




                                                                                77
Objectives

• Support continuously high indexing rate and concurrent search rate


• Both should be independent, i.e. indexing rate should not decrease when
  search rate increases and vice versa.




                                                                            78
A Billion Queries Per Day

 Agenda

 - Introduction

 - Posting list format and early query termination

 - Index memory model

 - Lock-free algorithms and data structures

 ‣ Roadmap




                                                     79
Roadmap




          80
Roadmap

• LUCENE-2329: Parallel posting arrays


• LUCENE-2324: Per-thread DocumentsWriter and sequence IDs


• LUCENE-2346: Change in-memory postinglist format


• LUCENE-2312: Search on DocumentsWriters RAM buffer


• IndexReader, that can switch from RAM buffer to flushed segment on-the-fly


• Sorted term dictionary (wildcards, numeric queries)


• Stored fields, TermVectors, Payloads (Attributes)


                                                                             81
Questions?

A Billion Queries Per Day
Michael Busch
@michibusch
michael@twitter.com
buschmi@apache.org




                                   82

Contenu connexe

En vedette

Gaiety Hotel - full version
Gaiety Hotel - full versionGaiety Hotel - full version
Gaiety Hotel - full versiondummypackages
 
Updated: Getting Ready for Due-Diligence
Updated:  Getting Ready for Due-DiligenceUpdated:  Getting Ready for Due-Diligence
Updated: Getting Ready for Due-DiligenceMarty Kaszubowski
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
Searching The United States Code with Solr/Lucene
Searching The United States Code with Solr/LuceneSearching The United States Code with Solr/Lucene
Searching The United States Code with Solr/LuceneLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Descritores de linguagem
Descritores de linguagemDescritores de linguagem
Descritores de linguagemgindri
 
Windows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作るWindows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作る彰 村地
 
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14Marty Kaszubowski
 
Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Marty Kaszubowski
 
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...Lucidworks (Archived)
 
Kelly Clarkson
Kelly ClarksonKelly Clarkson
Kelly Clarksontanica
 
Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条彰 村地
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombsstanica
 
Davis mark advanced search analytics in 20 minutes
Davis mark   advanced search analytics in 20 minutesDavis mark   advanced search analytics in 20 minutes
Davis mark advanced search analytics in 20 minutesLucidworks (Archived)
 

En vedette (20)

Com camp2014
Com camp2014Com camp2014
Com camp2014
 
Gaiety Hotel - full version
Gaiety Hotel - full versionGaiety Hotel - full version
Gaiety Hotel - full version
 
How To Get The Justin Bieber Smile
How To Get The Justin Bieber SmileHow To Get The Justin Bieber Smile
How To Get The Justin Bieber Smile
 
корея
кореякорея
корея
 
Updated: Getting Ready for Due-Diligence
Updated:  Getting Ready for Due-DiligenceUpdated:  Getting Ready for Due-Diligence
Updated: Getting Ready for Due-Diligence
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
Searching The United States Code with Solr/Lucene
Searching The United States Code with Solr/LuceneSearching The United States Code with Solr/Lucene
Searching The United States Code with Solr/Lucene
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Descritores de linguagem
Descritores de linguagemDescritores de linguagem
Descritores de linguagem
 
Windows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作るWindows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作る
 
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14
Center for Enterprise Innovation (CEI) Summary for HREDA, 9-25-14
 
Noche Estrellada
Noche EstrelladaNoche Estrellada
Noche Estrellada
 
Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14
 
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...
Guidelines for Managers: What Lucene and Solr Open Source Search can do for E...
 
Kelly Clarkson
Kelly ClarksonKelly Clarkson
Kelly Clarkson
 
Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombss
 
Davis mark advanced search analytics in 20 minutes
Davis mark   advanced search analytics in 20 minutesDavis mark   advanced search analytics in 20 minutes
Davis mark advanced search analytics in 20 minutes
 
Ashe
AsheAshe
Ashe
 

Plus de Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarLucidworks (Archived)
 

Plus de Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 

Lucene rev preso busch realtime search lr1010

  • 1. A Billion Queries Per Day Michael Busch @michibusch michael@twitter.com buschmi@apache.org 1
  • 2. A Billion Queries Per Day Agenda ‣ Introduction - Posting list format and early query termination - Index memory model - Lock-free algorithms and data structures - Roadmap 2
  • 4. Introduction • Search @ Twitter today: • >1,000 TPS (tweets/sec) = >80,000,000 tweets/day • >12,000 QPS (queries/sec) = >1,000,000,000 queries/day • <10s latency (entire pipeline) • <1s latency (indexer) • Performance goal: Ability to scale to support Twitter’s continued growth 4
  • 5. Introduction • 1st gen search engine based on MySQL • Next-gen engine based on (partially rewritten) Lucene Java • Significantly more performant (orders of magnitude) • Much easier to develop new features on the new platform - stay tuned! 5
  • 6. A Billion Queries Per Day Agenda - Introduction ‣ Posting list format and early query termination - Index memory model - Lock-free algorithms and data structures - Roadmap 6
  • 7. Posting list format and early query termination 7
  • 8. Objectives • Only evaluate as few documents as possible before terminating the query • Rank documents in reverse time order (newest documents first) 8
  • 9. Inverted Index 101 1 The old night keeper keeps the keep in the town 2 In the big old house in the big old gown. 3 The house in the town had the big old keep 4 Where the old night keeper never did sleep. 5 The night keeper keeps the keep in the night 6 And keeps in the dark and sleeps in the light. Table with 6 documents Example from: Justin Zobel , Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006 9
  • 10. Inverted Index 101 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists 10
  • 11. Inverted Index 101 Query: keeper 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists 11
  • 12. Inverted Index 101 Query: keeper 1 The old night keeper keeps the keep in the town term freq 2 In the big old house in the big old gown. and 1 <6> big 2 <2> <3> 3 The house in the town had the big old keep dark 1 <6> 4 Where the old night keeper never did sleep. did 1 <4> 5 The night keeper keeps the keep in the night gown 1 <2> 6 And keeps in the dark and sleeps in the light. had 1 <3> house 2 <2> <3> Table with 6 documents in 5 <1> <2> <3> <5> <6> keep 3 <1> <3> <5> keeper 3 <1> <4> <5> keeps 3 <1> <5> <6> light 1 <6> never 1 <4> night 3 <1> <4> <5> old 4 <1> <2> <3> <4> sleep 1 <4> sleeps 1 <6> the 6 <1> <2> <3> <4> <5> <6> town 2 <1> <3> where 1 <4> Dictionary and posting lists 12
Early query termination

Query: old

(Same dictionary and posting lists as before; the relevant entry is old 4 <1> <2> <3> <4>.)

13
Early query termination

Query: old

Walk the posting list in reverse order

14
Early query termination

Query: old

E.g. 2 results are requested: optimal termination after exactly two postings were read

15
Early query termination

• Possible if time-ordered results are desired (which is often the case in realtime search applications)
• Posting list requirements:
  • Pointer from the dictionary to the last posting in a list (must always be up to date)
  • Possibility to iterate lists in reverse order
  • Allow efficient skipping

16
Posting list encodings

Doc IDs to encode: 5, 15, 9000, 9002
Delta encoding: 5, 10, 8985, 2

Saves space if appropriate compression techniques are used, e.g. variable-length integers (VInt)

• Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically, if it is not a primitive Java type?

17
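To make the space/decoding trade-off concrete, here is a minimal sketch of delta encoding plus VInt encoding for the doc IDs on this slide. This is an illustration of the general technique, not Lucene's actual implementation; the class and method names are ours.

```java
import java.util.Arrays;

// Sketch of delta + VInt encoding for the docIDs 5, 15, 9000, 9002.
public class DeltaVInt {

  // Delta-encode: store each docID as the difference to its predecessor.
  static int[] deltas(int[] docIDs) {
    int[] out = new int[docIDs.length];
    int prev = 0;
    for (int i = 0; i < docIDs.length; i++) {
      out[i] = docIDs[i] - prev;
      prev = docIDs[i];
    }
    return out;
  }

  // VInt-encode one value: 7 data bits per byte, high bit = "more bytes follow".
  static byte[] vint(int value) {
    byte[] buf = new byte[5];
    int n = 0;
    while ((value & ~0x7F) != 0) {
      buf[n++] = (byte) ((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    buf[n++] = (byte) value;
    return Arrays.copyOf(buf, n);
  }

  public static void main(String[] args) {
    int[] docIDs = {5, 15, 9000, 9002};
    System.out.println(Arrays.toString(deltas(docIDs))); // [5, 10, 8985, 2]
    // Small deltas fit in one byte, 8985 needs two: that is the space saving.
    System.out.println(vint(5).length + " " + vint(8985).length); // 1 2
  }
}
```

The decode side has to undo both steps per posting (byte-by-byte branching, plus summing the deltas), which is exactly why VInts are expensive to decode and why deltas cannot be traversed backwards.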
Posting list format

int (32 bits) = docID (24 bits, max. 16.7M) + textPosition (8 bits, max. 255)

• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest a 5x improvement compared to vanilla Lucene with FSDirectory)

18
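The 24 + 8 bit layout above can be sketched with plain shifts and masks; the field and method names here are ours, only the bit widths come from the slide.

```java
// Sketch of the 32-bit posting: 24 bits docID, 8 bits text position.
public class Posting {
  static final int POSITION_BITS = 8;
  static final int POSITION_MASK = (1 << POSITION_BITS) - 1; // 0xFF

  // docID must fit in 24 bits (max ~16.7M), textPosition in 8 bits (max 255).
  static int encode(int docID, int textPosition) {
    return (docID << POSITION_BITS) | textPosition;
  }

  static int docID(int posting) {
    return posting >>> POSITION_BITS;
  }

  static int textPosition(int posting) {
    return posting & POSITION_MASK;
  }

  public static void main(String[] args) {
    int p = encode(16_700_000, 139);
    System.out.println(docID(p) + " " + textPosition(p)); // 16700000 139
  }
}
```

Decoding is two bit operations per posting, with no branching and no dependency on the previous posting, which is where the speedup over delta + VInt comes from.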
Posting list format

• ints can be written atomically in Java
• Backwards traversal is easy on absolute docIDs (not deltas)
• Every posting is a possible entry point for a searcher
• Skipping can be done with a binary search, without additional data structures; there are better approaches which should be explored
• On tweet indexes we need about 30% more storage for docIDs compared to delta+VInts, but that’s compensated by savings on position encoding!
• Max. segment size: 2^24 = 16.7M tweets

19
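Because absolute docIDs sit in the high 24 bits, packed postings sort by docID and a plain binary search can locate an entry point. A minimal sketch, assuming the packed layout from the previous slide (names are ours):

```java
// Skipping sketch: binary search over packed postings, no skip lists needed.
public class PostingSearch {

  // Index of the last posting whose docID is <= target, or -1 if none.
  // From this entry point a searcher can walk the list backwards.
  static int entryPoint(int[] postings, int targetDocID) {
    int lo = 0, hi = postings.length - 1, result = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int docID = postings[mid] >>> 8; // strip the 8 position bits
      if (docID <= targetDocID) {
        result = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] postings = {(1 << 8) | 3, (4 << 8), (9 << 8) | 7}; // docIDs 1, 4, 9
    System.out.println(entryPoint(postings, 5)); // 1 (the posting for docID 4)
  }
}
```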
Objectives

• Only evaluate as few documents as possible before terminating the query
• Rank documents in reverse time order (newest documents first)

20
A Billion Queries Per Day

Agenda

- Introduction
- Posting list format and early query termination
‣ Index memory model
- Lock-free algorithms and data structures
- Roadmap

21
Objectives

• Store a large number of linked lists of a priori unknown lengths efficiently in memory
• Allow traversal of lists in reverse order (backwards linked)
• Allow efficient garbage collection

23
Inverted Index

Per term we store different kinds of metadata: text pointer, frequency, postings pointer, etc.

(Same table with 6 documents, dictionary and posting lists as before.)

24
LUCENE-2329: Parallel posting arrays

class PostingList {
  int textPointer;
  int postingsPointer;
  int frequency;
  ...
}

• The term hashtable is an array of these objects: PostingList[] termsHash
• For each unique term in a segment we need an instance; this results in a very large number of objects that are long-living, i.e. the garbage collector can’t remove them quickly (they need to stay in memory until the segment is flushed)
• With a searchable RAM buffer we want to flush much less often and allow DocumentsWriter to fill up the available memory

25
LUCENE-2329: Parallel posting arrays

class PostingList {
  int textPointer;
  int postingsPointer;
  int frequency;
  ...
}

• Having a large number of long-living objects is very expensive in Java, especially when the default mark-and-sweep garbage collector is used
• The mark phase of GC becomes very expensive, because all long-living objects in memory have to be checked
• We need to reduce the number of objects to improve GC performance -> parallel posting arrays

26
LUCENE-2329: Parallel posting arrays

(Diagram: a PostingList[] array; each element is an object holding textPointer, postingsPointer, and frequency.)

27
LUCENE-2329: Parallel posting arrays

(Diagram: instead of objects, three parallel int[] arrays — textPointer, postingsPointer, and frequency — indexed by termID 0..6.)

28
LUCENE-2329: Parallel posting arrays

(Diagram: the first term gets termID 0; its values t0, p0, f0 are written at index 0 of the parallel arrays.)

29
LUCENE-2329: Parallel posting arrays

(Diagram: the second term gets termID 1; its values t1, p1, f1 are written at index 1.)

30
LUCENE-2329: Parallel posting arrays

(Diagram: a third term is added at termID 2.)

• The total number of objects is now greatly reduced; it is constant and independent of the number of unique terms
• With parallel arrays we save 28 bytes per unique term -> 41% savings compared to the PostingList object

31
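The parallel-arrays idea can be sketched as below: one int slot per field per term, and a handful of array objects total regardless of term count. This is a simplification of what LUCENE-2329 does — the real term hashtable is not a HashMap, and the doubling growth policy here is ours; only the three array names come from the slides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: per-term metadata in parallel int[] arrays instead of one
// PostingList object per term.
public class ParallelPostingArrays {
  private final Map<String, Integer> termToID = new HashMap<>(); // simplification
  int[] textPointer = new int[8];
  int[] postingsPointer = new int[8];
  int[] frequency = new int[8];
  private int nextTermID = 0;

  // Returns the termID for a term, assigning the next free slot if it is new.
  int termID(String term) {
    Integer id = termToID.get(term);
    if (id != null) return id;
    if (nextTermID == frequency.length) grow();
    termToID.put(term, nextTermID);
    return nextTermID++;
  }

  // Growing touches three array objects, not millions of per-term objects.
  private void grow() {
    int newSize = frequency.length * 2;
    textPointer = Arrays.copyOf(textPointer, newSize);
    postingsPointer = Arrays.copyOf(postingsPointer, newSize);
    frequency = Arrays.copyOf(frequency, newSize);
  }

  public static void main(String[] args) {
    ParallelPostingArrays arrays = new ParallelPostingArrays();
    int id = arrays.termID("keeper");
    arrays.frequency[id]++; // update a field via plain array access
    System.out.println(id + " " + arrays.frequency[id]); // 0 1
  }
}
```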
LUCENE-2329: Parallel posting arrays - Performance

• Performance experiments: index 1M Wikipedia docs
  1) -Xmx2048M, indexWriter.setMaxBufferSizeMB(200): 4.3% improvement
  2) -Xmx256M, indexWriter.setMaxBufferSizeMB(200): 86.5% improvement

32
LUCENE-2329: Parallel posting arrays - Performance

• With a large heap there is a small improvement due to the per-term memory savings
• With a small heap the garbage collector is invoked much more often; the smaller number of objects then yields a huge improvement (depending on doc sizes we have seen improvements of up to 400%!)
• With searchable RAM buffers we want to utilize all the RAM we have; with parallel arrays we can maintain high indexing performance even as we get close to the max heap size

33
Inverted index components

(Diagram: the dictionary — parallel arrays holding a pointer to the most recently indexed posting for a term — points into the posting list storage, which is still an open question.)

34
Posting lists storage - Objectives

• Store many singly-linked lists of different lengths space-efficiently
• The number of Java objects should be independent of the number of lists or the number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not depend on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order

35
Memory management

4 int[] pools

(Diagram: each pool is made of 32K int[] blocks.)

36
Memory management

4 int[] pools

Each pool can be grown individually by adding 32K blocks

37
Memory management

4 int[] pools

• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)

38
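One pool from the slide can be sketched as a list of 32K int[] blocks addressed as if it were one continuous, unbounded int[]. The class and method names are ours:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one pool: 32K int[] blocks behind a flat-index interface.
public class IntBlockPool {
  static final int BLOCK_SIZE = 1 << 15; // 32K ints per block
  private final List<int[]> blocks = new ArrayList<>();

  private int[] blockFor(int index) {
    int blockIndex = index / BLOCK_SIZE;
    while (blocks.size() <= blockIndex) {
      blocks.add(new int[BLOCK_SIZE]); // grow by one 32K block = one object
    }
    return blocks.get(blockIndex);
  }

  void set(int index, int value) {
    blockFor(index)[index % BLOCK_SIZE] = value;
  }

  int get(int index) {
    return blockFor(index)[index % BLOCK_SIZE];
  }

  public static void main(String[] args) {
    IntBlockPool pool = new IntBlockPool();
    pool.set(70_000, 42);                 // lands in the third block
    System.out.println(pool.get(70_000)); // 42
  }
}
```

A million postings live in ~31 block objects rather than a million nodes, which is what keeps the GC mark phase cheap.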
Memory management

slice sizes: 2^11, 2^7, 2^4, 2^1

• Slices can be allocated in each pool
• Each pool has a different, but fixed, slice size

39
Adding and appending to a list

(Diagram: pools with slice sizes 2^11, 2^7, 2^4, 2^1; slices are marked available or allocated; the current list is highlighted.)

40
Adding and appending to a list

Store the first two postings in this slice (in the pool with the smallest slice size, 2^1)

41
Adding and appending to a list

When the first slice is full, allocate another one in the second pool

42
Adding and appending to a list

Allocate a slice on each level as the list grows

43
Adding and appending to a list

On the uppermost level one list can own multiple slices

44
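The growth pattern on these slides can be sketched as a slice-size progression: the first slice of a list holds 2^1 ints, the next 2^4, then 2^7, and every slice after that 2^11 (only the top level is repeated). That a list allocates exactly one slice on each level before staying on the top level is our reading of the diagrams:

```java
// Sketch of the slice-size progression 2^1 -> 2^4 -> 2^7 -> 2^11 -> 2^11 -> ...
public class SliceSizes {
  static final int[] SLICE_SIZE = {1 << 1, 1 << 4, 1 << 7, 1 << 11};

  // Size of the n-th slice (0-based) allocated for one list.
  static int sliceSize(int n) {
    return SLICE_SIZE[Math.min(n, SLICE_SIZE.length - 1)];
  }

  // Total capacity of a list after its first n slices.
  static int capacity(int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += sliceSize(i);
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sliceSize(0) + " " + sliceSize(1) + " " + sliceSize(5)); // 2 16 2048
    System.out.println(capacity(4)); // 2 + 16 + 128 + 2048 = 2194
  }
}
```

Short lists (the common case for rare terms) waste at most a couple of ints, while long lists amortize into large 2^11 slices.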
Posting list format

int (32 bits) = docID (24 bits, max. 16.7M) + textPosition (8 bits, max. 255)

• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest a 5x improvement compared to vanilla Lucene with FSDirectory)

45
Addressing items

• Use 32-bit (int) pointers to address any item in any list unambiguously:

int (32 bits) = poolIndex (2 bits, values 0-3) + sliceIndex (19-29 bits, depends on pool) + offset in slice (1-11 bits, depends on pool)

• Nice symmetry: postings and address pointers both fit into a 32-bit int

46
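A sketch of this address layout: 2 bits of pool index, then a slice index, then an offset whose width matches the pool's slice size (2^1, 2^4, 2^7, 2^11 ints, hence 1, 4, 7, or 11 offset bits, leaving 29 down to 19 bits for the slice index). Which pool gets which index is our assumption; the slide only gives the bit ranges.

```java
// Sketch of the 32-bit slice address: poolIndex | sliceIndex | offset.
public class SliceAddress {
  // Offset bits per pool, matching slice sizes 2^1, 2^4, 2^7, 2^11.
  static final int[] OFFSET_BITS = {1, 4, 7, 11};

  static int encode(int poolIndex, int sliceIndex, int offset) {
    int offsetBits = OFFSET_BITS[poolIndex];
    return (poolIndex << 30) | (sliceIndex << offsetBits) | offset;
  }

  static int poolIndex(int address) {
    return address >>> 30;
  }

  static int sliceIndex(int address) {
    int offsetBits = OFFSET_BITS[poolIndex(address)];
    return (address & 0x3FFFFFFF) >>> offsetBits; // strip the 2 pool bits
  }

  static int offset(int address) {
    return address & ((1 << OFFSET_BITS[poolIndex(address)]) - 1);
  }

  public static void main(String[] args) {
    int addr = encode(3, 12345, 2047); // pool 3: 2^11-sized slices
    System.out.println(poolIndex(addr) + " " + sliceIndex(addr) + " " + offset(addr));
    // 3 12345 2047
  }
}
```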
Linking the slices

(Diagram: pools with slice sizes 2^11, 2^7, 2^4, 2^1; the slices owned by the current list are linked to each other.)

47
Linking the slices

(Diagram: as before, plus the dictionary: parallel arrays hold a pointer to the last posting indexed for a term.)

48
Objectives

• Store a large number of linked lists of a priori unknown lengths efficiently in memory
• Allow traversal of lists in reverse order (backwards linked)
• Allow efficient garbage collection

49
A Billion Queries Per Day

Agenda

- Introduction
- Posting list format and early query termination
- Index memory model
‣ Lock-free algorithms and data structures
- Roadmap

50
Lock-free algorithms and data structures

51
Objectives

• Support a continuously high indexing rate and a concurrent search rate
• Both should be independent, i.e. the indexing rate should not decrease when the search rate increases, and vice versa

52
Definitions

• Pessimistic locking
  • A thread holds an exclusive lock on a resource while an action is performed [mutual exclusion]
  • Usually used when conflicts are expected to be likely
• Optimistic locking
  • Operations are attempted atomically without holding a lock; conflicts can be detected; retry logic is often used in case of conflicts
  • Usually used when conflicts are expected to be the exception

53
Definitions

• Non-blocking algorithm
  Ensures that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
  A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
  A non-blocking algorithm is wait-free if there is guaranteed per-thread progress.

* Source: Wikipedia

54
Concurrency

• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)
• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!
• In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication
• Safe publication can be achieved in different, subtle ways. Read the great book “Java Concurrency in Practice” by Brian Goetz for more information!

55
Java Memory Model

• Program order rule
  Each action in a thread happens-before every action in that thread that comes later in the program order.
• Volatile variable rule
  A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity
  If A happens-before B, and B happens-before C, then A happens-before C.

* Source: Brian Goetz: Java Concurrency in Practice

56
Concurrency

(Diagram: Thread A, Thread B, the cache, and RAM holding x = 0; time advances downwards.)

int x;

57
Concurrency

Thread A writes x = 5 to the cache

(Diagram: Thread A executes x = 5; the cache holds 5, RAM still holds 0.)

int x;

58
Concurrency

(Diagram: Thread A executes x = 5; Thread B executes while(x != 5); the cache holds 5, RAM still holds 0.)

int x;

This condition will likely never become false!

59
Concurrency

(Diagram resets: cache empty, RAM holds x = 0.)

int x;

60
Concurrency

Thread A writes b = 1 to RAM, because b is volatile

(Diagram: Thread A executes x = 5; b = 1; the cache holds x = 5, RAM holds b = 1.)

int x;
volatile int b;

61
Concurrency

(Diagram: Thread A executes x = 5; b = 1. Thread B executes int dummy = b; while(x != 5); — i.e. it reads the volatile b first.)

int x;
volatile int b;

62
Concurrency

(Diagram: same as before; happens-before arrow between Thread A’s x = 5 and b = 1.)

int x;
volatile int b;

• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.

63
Concurrency

(Diagram: same as before; happens-before arrow between Thread A’s write b = 1 and Thread B’s read int dummy = b.)

int x;
volatile int b;

• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.

64
Concurrency

(Diagram: same as before; happens-before arrow from Thread A’s x = 5 all the way to Thread B’s while(x != 5).)

int x;
volatile int b;

• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.

65
Concurrency

(Diagram: Thread A executes x = 5; b = 1. Thread B executes int dummy = b; while(x != 5);)

int x;
volatile int b;

This condition will be false, i.e. x == 5

• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.

66
Concurrency

(Diagram: same as before; the volatile write of b and the volatile read of b mark the memory barrier.)

int x;
volatile int b;

• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.

67
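The slides above can be condensed into a runnable sketch: x is a plain field, b is the single volatile field, and the volatile write/read pair forms the memory barrier that safely publishes x. The class and method names are ours.

```java
// Safe publication sketch: the volatile b publishes the plain x.
public class SafePublication {
  static int x;          // plain field: no visibility guarantees on its own
  static volatile int b; // the single volatile field

  // Starts a reader that spins on the volatile b, then writes x and b.
  // Returns the value of x the reader observed; the JMM guarantees 5:
  // program order (x=5 hb b=1) + volatile rule (b=1 hb read of b) + transitivity.
  static int publishAndObserve() {
    final int[] observed = new int[1];
    Thread reader = new Thread(() -> {
      while (b != 1) { } // volatile read: spin until the writer's barrier is seen
      observed[0] = x;   // guaranteed to see x == 5 here
    });
    reader.start();
    x = 5; // plain write, published by the volatile write below
    b = 1; // volatile write: crosses the memory barrier
    try {
      reader.join(); // join also establishes happens-before for observed[0]
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    return observed[0];
  }

  public static void main(String[] args) {
    System.out.println("reader saw x = " + publishAndObserve()); // reader saw x = 5
  }
}
```

Without the volatile b, the reader’s spin on x could legally loop forever, which is exactly the broken variant shown earlier.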
Demo

68
Concurrency

(Diagram, time advancing downwards:)

IndexWriter                 IndexReader
write 100 docs
maxDoc = 100
                            in IR.open(): read maxDoc
write more docs
                            search up to maxDoc

maxDoc is volatile

70
Concurrency

(Diagram: same as before; happens-before arrow from the IndexWriter’s write of maxDoc to the IndexReader’s read of maxDoc in IR.open().)

maxDoc is volatile

• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!

71
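A minimal sketch of this writer/reader handshake: the writer publishes its progress only through the volatile maxDoc, and each reader snapshots it once on open and searches only up to that snapshot. This is not Lucene's actual IndexWriter/IndexReader; the class and method names here (RamIndex, addDoc, openReader) are ours.

```java
// Sketch: publish indexing progress through a single volatile maxDoc.
public class RamIndex {
  private final int[] docs = new int[1024]; // plain array, not volatile
  private volatile int maxDoc = 0;          // the only volatile field

  // Writer thread: fill in the doc first, then publish via the volatile write.
  void addDoc(int value) {
    docs[maxDoc] = value; // plain write, published by the volatile write below
    maxDoc = maxDoc + 1;  // volatile write: memory barrier (single writer only)
  }

  // Reader: snapshot maxDoc once; everything below it is safely published.
  Reader openReader() {
    return new Reader(docs, maxDoc); // volatile read: memory barrier
  }

  static class Reader {
    final int[] docs;
    final int snapshot;
    Reader(int[] docs, int snapshot) { this.docs = docs; this.snapshot = snapshot; }
    int count() { return snapshot; } // search only up to the snapshot
  }

  public static void main(String[] args) {
    RamIndex index = new RamIndex();
    for (int i = 0; i < 100; i++) index.addDoc(i);
    RamIndex.Reader reader = index.openReader();
    index.addDoc(100); // written after the reader opened; the reader ignores it
    System.out.println(reader.count()); // 100
  }
}
```

Opening a reader costs one volatile read, which is why opening an IndexReader per query is affordable.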
So far...

• Single writer / multi reader approach
• High performance can be achieved by avoiding many volatile/atomic fields
• Safe publication can be achieved by ensuring that memory barriers are crossed
• Opening an in-memory reader is extremely cheap; it can be done for every query (at Twitter we open a billion IndexReaders per day!)
• Problem still unsolved: how to update term -> posting list pointers to ensure correct search results?

72
Updating posting pointers

(Diagram: pools with slice sizes 2^11, 2^7, 2^4, 2^1; the postings written after the last memory barrier was crossed are highlighted; the posting pointer is shown.)

73
Updating posting pointers

(Diagram: as before, now showing both the updated pointer and the previous pointer.)

• Safe publication guarantees that everything written before a memory barrier is visible to a thread once it crosses that barrier
• But: a thread might already see a newer state even if the next memory barrier wasn’t crossed yet

74
Updating posting pointers

• This means: a reader might get a postings pointer addressing an int[] block that is not allocated yet!
• We have to handle this case carefully

75
Updating posting pointers

• Basic idea: keep the two most recent pointers
• The writer toggles the active pointer whenever a memory barrier is crossed
• The reader can detect if a pointer is safe to follow
• Optimistic approach: the reader always considers the latest pointer first; only if that one isn’t safe does it fall back to the previous pointer

76
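A rough sketch of the two-pointer idea, under heavy simplification: the "is this pointer safe?" check here is reduced to "does it point at allocated storage", the toggle is a volatile write standing in for the memory barrier, and all names are ours — the actual safety detection in the real index is more involved.

```java
// Sketch: keep the two most recent posting pointers; toggle on each barrier.
public class PostingPointerPair {
  private final int[] pointers = new int[2]; // two most recent pointers
  private volatile int active = 0;           // toggled at each memory barrier
  private int allocatedUpTo = 0;             // end of safely allocated storage

  // Writer: update the non-active slot, then toggle. The volatile write of
  // 'active' doubles as the memory barrier that publishes the new pointer.
  void advance(int newPointer, int newAllocatedUpTo) {
    int next = 1 - active;
    allocatedUpTo = newAllocatedUpTo; // published by the volatile write below
    pointers[next] = newPointer;
    active = next; // volatile write: memory barrier
  }

  // Reader: optimistic. Try the latest pointer; fall back if it isn't safe.
  int safePointer() {
    int a = active; // volatile read
    int candidate = pointers[a];
    if (candidate <= allocatedUpTo) return candidate;
    return pointers[1 - a]; // the previous pointer is always safe to use
  }

  public static void main(String[] args) {
    PostingPointerPair pair = new PostingPointerPair();
    pair.advance(10, 16);
    pair.advance(20, 32);
    System.out.println(pair.safePointer()); // 20
  }
}
```

The fallback branch is the entire retry logic: it always succeeds in one step, which is what makes the scheme wait-free for readers.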
Wait-free

• Not a single exclusive lock
• The writer thread can always make progress
• Optimistic locking (retry logic) in a few places for the searcher thread
• The retry logic is very simple and guaranteed to always make progress: one of the two available posting pointers is always safe to use

77
Objectives

• Support a continuously high indexing rate and a concurrent search rate
• Both should be independent, i.e. the indexing rate should not decrease when the search rate increases, and vice versa

78
A Billion Queries Per Day

Agenda

- Introduction
- Posting list format and early query termination
- Index memory model
- Lock-free algorithms and data structures
‣ Roadmap

79
Roadmap

80
Roadmap

• LUCENE-2329: Parallel posting arrays
• LUCENE-2324: Per-thread DocumentsWriter and sequence IDs
• LUCENE-2346: Change in-memory postinglist format
• LUCENE-2312: Search on DocumentsWriter’s RAM buffer
• IndexReader that can switch from RAM buffer to flushed segment on-the-fly
• Sorted term dictionary (wildcards, numeric queries)
• Stored fields, TermVectors, Payloads (Attributes)

81
Questions?

A Billion Queries Per Day
Michael Busch
@michibusch
michael@twitter.com
buschmi@apache.org

82