x86/x64 Optimization Study Group #4 (x86/x64最適化勉強会#4)
An x86-optimized rank/select dictionary for bit sequences
2012/6/16
Takeshi Yamamuro




What Is a Succinct Data Structure?




SDS: Succinct Data Structure
        • Recently gaining popularity in some areas
              – Both research and engineering

        • Not a data structure, but a data representation
              – A compressed representation of other data structures
              – e.g., alphabets, trees, and graphs

        • Transparent operations without explicit unpacking
              – e.g., succinct LZ77 compression*1

*1 Kreft, S. and Navarro, G.: LZ77-Like Compression with Fast Random Access, In Proceedings of DCC, 2010
More Details
• SDS = Succinct Data + Succinct Index

• Succinct Data
  – Compact representation of the target data
  – Close to the information-theoretic lower bound
               e.g., if there are N possible patterns, the lower bound is log N bits

• Succinct Index
  – O(1) operations on the target data
  – o(N) space cost: asymptotically negligible
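
As a small worked example of the lower bound above, the sketch below computes how many whole bits are needed to tell N patterns apart. The helper name `lower_bound_bits` and the two sample pattern counts are illustrative assumptions, and base-2 logarithms are assumed.

```python
import math

def lower_bound_bits(num_patterns: int) -> int:
    """Smallest whole number of bits that can distinguish num_patterns objects,
    i.e. ceil(log2(num_patterns)), computed exactly with integer arithmetic."""
    if num_patterns < 1:
        raise ValueError("num_patterns must be positive")
    return (num_patterns - 1).bit_length()

# All 16-bit sequences: 2**16 patterns, so 16 bits are needed.
print(lower_bound_bits(2 ** 16))
# 16-bit sequences with exactly four 1s: C(16, 4) = 1820 patterns, so 11 bits suffice.
print(lower_bound_bits(math.comb(16, 4)))
```

A succinct representation aims to stay close to this many bits while still answering queries directly.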




More Details

   If you need more information, ...




                  cited from: http://goo.gl/rkQ5z
A rank/select dictionary for SDS




Rank/Select Operations
• SDS is composed of rank/select operations
  – rank/select are called many times internally

• Rank/Select for succinct bit sequences: B[i]
  – rank_x(n, B): the number of x's in B[0...n]
  – select_x(n, B): the position of the n-th x in B[]

        i   0    1     2    3   4    5   6     7   8
     B[i]   1    0     1    1   0    0   1     1   0
                     rank1(5, B)=3   select1(4, B)=6
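
The two definitions can be checked against the example with a naive linear scan. This is only a reference sketch, not the optimized dictionary: the function names are illustrative, rank is taken as inclusive of position n, and select counts occurrences from 1 while returning a 0-based position, which is the reading that matches the example.

```python
def rank(bits, n, x=1):
    """Number of occurrences of x in bits[0..n], including position n."""
    return sum(1 for b in bits[: n + 1] if b == x)

def select(bits, n, x=1):
    """0-based position of the n-th (counting from 1) occurrence of x."""
    seen = 0
    for pos, b in enumerate(bits):
        if b == x:
            seen += 1
            if seen == n:
                return pos
    raise ValueError("bits contains fewer than n occurrences of x")

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(rank(B, 5), select(B, 4))  # 3 6
```

Both functions take O(N) time here; the rest of the talk is about getting rank down to O(1) with a small index.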


Rank/Select Operations
• Available rank/select implementations
  – ux-trie: http://code.google.com/p/ux-trie/
  – rx: http://code.google.com/p/mozc/
  – marisa-trie: http://code.google.com/p/marisa-trie/


• Today's contribution
  – x86-optimized rank/select
  – https://github.com/maropu/dbitv




Performance Results
        • Performance benchmark setup*1
              – Generate a random sequence of bits with 50% density
              – Issue random rank/select queries over the bits
              – CPU: Intel Core-i5 U470@1.33GHz

        • Latency observed
              – Median latency over 11 trials

*1 Reference: http://d.hatena.ne.jp/s-yata/20111216/1324032373
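
The setup above could be sketched roughly as follows. This is not the benchmark harness used for the talk: the function names, the seeds, the query count, and the naive rank used as a stand-in are all assumptions for illustration.

```python
import random
import statistics
import time

def make_bits(length, density=0.5, seed=0):
    """Random bit sequence in which each bit is 1 with the given probability."""
    rng = random.Random(seed)
    return [1 if rng.random() < density else 0 for _ in range(length)]

def naive_rank1(bits, n):
    """Stand-in query: number of 1s in bits[0..n] by linear scan."""
    return sum(bits[: n + 1])

def median_latency_ns(query, bits, num_queries=200, trials=11, seed=1):
    """Median over `trials` runs of the mean per-query latency in nanoseconds."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(bits)) for _ in range(num_queries)]
    per_query = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        for p in positions:
            query(bits, p)
        per_query.append((time.perf_counter_ns() - start) / num_queries)
    return statistics.median(per_query)

bits = make_bits(1 << 12)
print(median_latency_ns(naive_rank1, bits))
```

Taking the median of 11 trials keeps a single noisy run from skewing the reported latency.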
Performance Results: Rank
[Figure: averaged rank latency (ns, log scale, 1.E+00 to 1.E+03) vs. bit length, for ux, rx, marisa, and opt]

Performance Results: Select
[Figure: averaged select latency (ns, log scale, 1.E+00 to 1.E+04) vs. bit length, for ux, rx, marisa, and opt]

Implementation Details




Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (N bits)

Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...

• Split B[] into fixed-length blocks of log^2 N bits
• Total counts pre-computed in L[]

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor \log^2 N} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)}
  + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{x} B[i]}_{O(\log^2 N)}
\]
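
A minimal sketch of this one-level scheme, assuming base-2 logarithms, 0-based positions, and rank inclusive of x. The names `build_L` and `rank1` are illustrative and not taken from any of the libraries mentioned.

```python
import math

def build_L(bits, block):
    """L[k] = number of 1s in bits[0 : k*block], the count at each block boundary."""
    L = [0]
    for start in range(0, len(bits), block):
        L.append(L[-1] + sum(bits[start:start + block]))
    return L

def rank1(bits, L, block, x):
    """Number of 1s in bits[0..x]: one L[] lookup plus a scan of fewer than `block` bits."""
    prefix = x + 1            # number of bits covered by the query
    k = prefix // block       # full blocks contained in that prefix
    return L[k] + sum(bits[k * block: prefix])

bits = [1, 0, 1, 1, 0, 0, 1, 1, 0] * 8        # N = 72
block = int(math.log2(len(bits))) ** 2        # floor(log2 72)^2 = 36
L = build_L(bits, block)
print(rank1(bits, L, block, 40))
```

The remaining scan is what still costs O(log^2 N) at this stage.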
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...

• L[]: o(N) space costs

\[
\frac{N}{\log^2 N} \cdot \log N = O\!\left(\frac{N}{\log N}\right) = o(N)
\]

Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...
  S[] = | s1 | s2 | ...          (sub-blocks of 1/2 log N bits)

• Split each block again into fixed-length sub-blocks of 1/2 log N bits
• Total counts pre-computed in S[]

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor \log^2 N} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)}
  + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N} B[i]}_{S[\lfloor x/(\frac{1}{2}\log N) \rfloor]:\ O(1)}
  + \underbrace{\sum_{i=\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N + 1}^{x} B[i]}_{O(\log N)}
\]
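
The two-level scheme could look roughly like this. Several things here are assumptions for the sake of a short sketch: the block size is a multiple of the sub-block size, S[] stores counts relative to the start of the enclosing block, positions are 0-based, and rank is inclusive of x. The names are illustrative.

```python
import math

def build_index(bits, block, sub):
    """L[k]: 1s in bits[0 : k*block].
    S[j]: 1s from the start of the enclosing block up to bit j*sub."""
    if block % sub != 0:
        raise ValueError("this sketch assumes block is a multiple of sub")
    prefix = [0]                      # build-time helper only, not part of the index
    for b in bits:
        prefix.append(prefix[-1] + b)
    n = len(bits)
    L = [prefix[k * block] for k in range(n // block + 1)]
    S = [prefix[j * sub] - prefix[(j * sub) // block * block]
         for j in range(n // sub + 1)]
    return L, S

def rank1(bits, L, S, block, sub, x):
    """Number of 1s in bits[0..x]: two lookups plus a scan of fewer than `sub` bits."""
    p = x + 1                         # number of bits covered by the query
    j = p // sub                      # full sub-blocks contained in that prefix
    return L[p // block] + S[j] + sum(bits[j * sub: p])

bits = [1, 0, 1, 1, 0, 0, 1, 1, 0] * 8             # N = 72
log_n = int(math.log2(len(bits)))                  # 6
block, sub = log_n ** 2, max(1, log_n // 2)        # 36 and 3
L, S = build_index(bits, block, sub)
print(rank1(bits, L, S, block, sub, 40))
```

Because each S[] entry is relative to its block, it only needs enough bits to count up to the block size, which is what keeps S[] small.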
Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...
  S[] = | s1 | s2 | ...          (sub-blocks of 1/2 log N bits)

• S[]: o(N) space costs

\[
\frac{N}{\frac{1}{2}\log N} \cdot \log(\log^2 N) = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)
\]

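Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...
  S[] = | s1 | s2 | ...          (sub-blocks of 1/2 log N bits)

• O(1) popcount/table lookup in the last term: O(log N) -> O(1)

\[
\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i]
  = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor \log^2 N} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)}
  + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N} B[i]}_{S[\lfloor x/(\frac{1}{2}\log N) \rfloor]:\ O(1)}
  + \underbrace{\sum_{i=\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N + 1}^{x} B[i]}_{\text{popcount/table lookup}:\ O(1)}
\]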
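
The table-lookup variant of the last term can be sketched as below. The bit order (first bit of the sub-block stored in the least significant position) and the function names are assumptions. On x86 the table would typically be replaced by the POPCNT instruction.

```python
def build_popcount_table(width):
    """table[v] = number of 1 bits in the width-bit value v (2**width entries)."""
    table = [0] * (1 << width)
    for v in range(1, 1 << width):
        table[v] = table[v >> 1] + (v & 1)
    return table

def rank_in_word(word, r, table):
    """Number of 1s among the first r bits of word (bit 0 = first bit)."""
    mask = (1 << r) - 1               # keep only the first r bits
    return table[word & mask]

table = build_popcount_table(8)
# Bits 1,0,1,1,0,0,1,1 stored least-significant-first give the value 205.
word = sum(b << i for i, b in enumerate([1, 0, 1, 1, 0, 0, 1, 1]))
print(rank_in_word(word, 6, table))   # 3
```

With sub-blocks of 1/2 log N bits the table has only sqrt(N) entries, so it fits within the o(N) budget.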
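Implementation: Method of Four Russians
• Rule: O(1) operation costs with o(N) space

  B[] = | A sequence of bits |   (blocks of log^2 N bits)
  L[] = | l1 | l2 | ...
  S[] = | s1 | s2 | ...          (sub-blocks of 1/2 log N bits)

• As a result, o(N) space costs

\[
\underbrace{\frac{N}{\log N}}_{L[]\ \text{size}} + \underbrace{\frac{4N\log\log N}{\log N}}_{S[]\ \text{size}}
  = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)
\]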
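
Plugging concrete sizes into the two terms shows how slowly the o(N) bound takes effect. This is plain arithmetic on the formula above, assuming base-2 logarithms; `overhead_ratio` is an illustrative name.

```python
import math

def overhead_ratio(n):
    """(L[] size + S[] size) / N, using the two terms of the space formula."""
    lg = math.log2(n)
    l_bits = n / lg                          # N / log N
    s_bits = 4 * n * math.log2(lg) / lg      # 4 N log log N / log N
    return (l_bits + s_bits) / n

for exponent in (16, 32, 64):
    print(exponent, overhead_ratio(2 ** exponent))
```

For N = 2^32 the index is still about 66% of the size of B[] itself, which is one reason practical implementations use fixed block sizes instead of the asymptotic ones.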



Implementation: Practice
• Low computation costs & high cache penalties
   – 3 cache/TLB misses per rank: one each in B[], L[], and S[]

   ex. rank1(402 = 256*1 + 32*4 + 18, B), with 256-bit blocks and 32-bit sub-blocks

   B[]: 01..000000....101......0 0110....001...............0 0000100 ...    Miss! (popcount the remaining left bits)
   L[]: 18 21 …                                                             Miss!
   S[]: 1 3 4 6 7 9 10 13 2 5 7 9 12 13 18 19 1 3 7 …                       Miss!
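
The decomposition of the query position in the example is two integer divisions. The constants follow the 256-bit and 32-bit sizes from the example; the function name is illustrative.

```python
BLOCK_BITS = 256   # bits covered by one L[] entry
SUB_BITS = 32      # bits covered by one S[] entry

def decompose(x):
    """Split x into (block index, sub-block index within the block, leftover bits)."""
    block, within = divmod(x, BLOCK_BITS)
    sub, rest = divmod(within, SUB_BITS)
    return block, sub, rest

print(decompose(402))  # (1, 4, 18)
```

Each of the three parts drives one memory access: the block index into L[], the sub-block index into S[], and the leftover bits into B[] for the final popcount. When the three arrays are stored separately, those accesses land on three different cache lines.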




Implementation: Practice
• Packing the required data into a single cache line

   [Figure: a 56B chunk placed in a 64B cache line; labeled fields of 4B, 1B, and 32B (the bits 0110....001..........0), with 12B padding and further padding]
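
One way to picture the packing is a fixed 64-byte record that holds everything a rank query needs. The field sizes of 4B, 1B, and 32B come from the slide, but the rest of this sketch is an assumption: eight 1-byte sub-block counts, the placement of the padding, the bit order, and all names are hypothetical rather than the layout used in dbitv.

```python
import struct

# Assumed layout: 4B cumulative count, eight 1B sub-block counts,
# 32B of bits (256 bits), zero padding up to 64 bytes.
CHUNK = struct.Struct("<I8B32s20x")

def pack_chunk(ones_before, bits256):
    """Pack 256 bits plus their counts into one 64-byte record."""
    data = bytearray(32)
    sub_counts = []
    running = 0                       # 1s seen so far inside this chunk
    for i, b in enumerate(bits256):
        if i % 32 == 0:
            sub_counts.append(running)
        if b:
            data[i // 8] |= 1 << (i % 8)
            running += 1
    return CHUNK.pack(ones_before, *sub_counts, bytes(data))

def rank1_in_chunk(chunk, r):
    """1s before the chunk plus 1s among its first r bits (0 <= r < 256)."""
    fields = CHUNK.unpack(chunk)
    ones_before, sub_counts, data = fields[0], fields[1:9], fields[9]
    sub, rest = divmod(r, 32)
    word = int.from_bytes(data[sub * 4: sub * 4 + 4], "little")
    return ones_before + sub_counts[sub] + bin(word & ((1 << rest) - 1)).count("1")

bits256 = [1, 0, 1, 1, 0, 0, 1, 1] * 32
chunk = pack_chunk(100, bits256)
print(len(chunk), rank1_in_chunk(chunk, 146))
```

Since the counts and the bits sit in the same 64 bytes, a rank query touches one cache line instead of three.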




Implementation: Practice
• BTW, where is select?
  – Omitted due to time constraints
  – Please see the code ...


• Two implementation approaches
  – O(log N) complexity
     • ux-trie, rx, and marisa-trie
     • Binary search using rank
     • Suffers many cache/TLB misses


  – O(1) complexity
     • My implementation, designed to minimize these penalties
     • 1 rank, 1 SIMD comparison, and an O(1) bsf
     • Only 2 cache/TLB misses
     • Not implemented yet ...
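
The O(log N) approach, binary search on top of rank, can be sketched as follows, along with a pure-Python stand-in for what bsf computes. This illustrates the general technique only and is not code from ux-trie, rx, marisa-trie, or dbitv. The function names are illustrative, and rank is assumed to be inclusive of its position argument.

```python
def select1_by_rank(rank1, n_bits, k):
    """Smallest position p with rank1(p) >= k, where rank1(p) counts 1s in B[0..p]."""
    if n_bits < 1 or k < 1 or rank1(n_bits - 1) < k:
        raise ValueError("B[] contains fewer than k ones")
    lo, hi = 0, n_bits - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank1(mid) >= k:
            hi = mid                  # the k-th 1 is at mid or to its left
        else:
            lo = mid + 1              # the k-th 1 is to the right of mid
    return lo

def lowest_set_bit(word):
    """Index of the least significant 1 bit, which is what bsf returns for word != 0."""
    return (word & -word).bit_length() - 1

B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(select1_by_rank(lambda p: sum(B[: p + 1]), len(B), 4))  # 6
print(lowest_set_bit(0b101000))                                # 3
```

Every probe of the binary search is a separate rank call, and so a separate set of memory accesses, which is where the many cache/TLB misses come from.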

                                                       28

Contenu connexe

Tendances

Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkDB Tsai
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDAnilsvanvelzen
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Ed Dodds
 
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...Yusuf Bhujwalla
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC MeetupDavid Smiley
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015David Smiley
 
STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...Luuk Brederode
 
The status of the GeoServer WPS
The status of the GeoServer WPSThe status of the GeoServer WPS
The status of the GeoServer WPSGeoSolutions
 
Reduced ordered binary decision diagram
Reduced ordered binary decision diagramReduced ordered binary decision diagram
Reduced ordered binary decision diagramTeam-VLSI-ITMU
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model佳蓉 倪
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexityashishtinku
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAdelina Ahadova
 
Final presentation optical flow estimation with DL
Final presentation  optical flow estimation with DLFinal presentation  optical flow estimation with DL
Final presentation optical flow estimation with DLLeapMind Inc
 

Tendances (18)

Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Data assimilation with OpenDA
Data assimilation with OpenDAData assimilation with OpenDA
Data assimilation with OpenDA
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
 
4241
42414241
4241
 
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
The Impact of Smoothness on Model Class Selection in Nonlinear System Identif...
 
Binary decision diagrams
Binary decision diagramsBinary decision diagrams
Binary decision diagrams
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
An32272275
An32272275An32272275
An32272275
 
Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015Lucene/Solr spatial in 2015
Lucene/Solr spatial in 2015
 
STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...STAQ based Matrix estimation - initial concept (presented at hEART conference...
STAQ based Matrix estimation - initial concept (presented at hEART conference...
 
The status of the GeoServer WPS
The status of the GeoServer WPSThe status of the GeoServer WPS
The status of the GeoServer WPS
 
Reduced ordered binary decision diagram
Reduced ordered binary decision diagramReduced ordered binary decision diagram
Reduced ordered binary decision diagram
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexity
 
Algorithm Complexity and Main Concepts
Algorithm Complexity and Main ConceptsAlgorithm Complexity and Main Concepts
Algorithm Complexity and Main Concepts
 
Final presentation optical flow estimation with DL
Final presentation  optical flow estimation with DLFinal presentation  optical flow estimation with DL
Final presentation optical flow estimation with DL
 
MSc Presentation
MSc PresentationMSc Presentation
MSc Presentation
 

En vedette

Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介MITSUNARI Shigeo
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTtakesako
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜Ryoma Sin'ya
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算Norishige Fukushima
 
X86opti01 nothingcosmos
X86opti01 nothingcosmosX86opti01 nothingcosmos
X86opti01 nothingcosmosnothingcosmos
 

En vedette (6)

Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介Haswellサーベイと有限体クラスの紹介
Haswellサーベイと有限体クラスの紹介
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNT
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算
 
X86opti01 nothingcosmos
X86opti01 nothingcosmosX86opti01 nothingcosmos
X86opti01 nothingcosmos
 
明日使えないすごいビット演算
明日使えないすごいビット演算明日使えないすごいビット演算
明日使えないすごいビット演算
 

Similaire à A x86-optimized rank&select dictionary for bit sequences

Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsYu Liu
 
Threshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsThreshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsAleksandr Yampolskiy
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptxpallavidhade2
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorPTIHPA
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Leonid Zhukov
 
Ch01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonCh01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonshin
 
Generic parallelization strategies for data assimilation
Generic parallelization strategies for data assimilationGeneric parallelization strategies for data assimilation
Generic parallelization strategies for data assimilationnilsvanvelzen
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...Alex Pruden
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Sean Moran
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marksvvcetit
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesRakuten Group, Inc.
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler DesignKuppusamy P
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Matthew Lease
 
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-documentmaomao125
 
Selective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationSelective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationKodaira Tomonori
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Khaja Dileef
 

Similaire à A x86-optimized rank&select dictionary for bit sequences (20)

Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
 
Threshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random PermutationsThreshold and Proactive Pseudo-Random Permutations
Threshold and Proactive Pseudo-Random Permutations
 
Slide11 icc2015
Slide11 icc2015Slide11 icc2015
Slide11 icc2015
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
Mmclass5
Mmclass5Mmclass5
Mmclass5
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 
Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.Numerical Linear Algebra for Data and Link Analysis.
Numerical Linear Algebra for Data and Link Analysis.
 
Basic data structures part I
Basic data structures part IBasic data structures part I
Basic data structures part I
 
Ch01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonCh01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluiton
 
Generic parallelization strategies for data assimilation
Generic parallelization strategies for data assimilationGeneric parallelization strategies for data assimilation
Generic parallelization strategies for data assimilation
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
system software 16 marks
system software 16 markssystem software 16 marks
system software 16 marks
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler Design
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
 
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document15 06-0459-02-003c-cm-matlab-release-0-85-support-document
15 06-0459-02-003c-cm-matlab-release-0-85-support-document
 
Selective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarizationSelective encoding for abstractive sentence summarization
Selective encoding for abstractive sentence summarization
 
Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2Systemsoftwarenotes 100929171256-phpapp02 2
Systemsoftwarenotes 100929171256-phpapp02 2
 

Plus de Takeshi Yamamuro

LT: Spark 3.1 Feature Expectation
LT: Spark 3.1 Feature ExpectationLT: Spark 3.1 Feature Expectation
LT: Spark 3.1 Feature ExpectationTakeshi Yamamuro
 
Quick Overview of Upcoming Spark 3.0 + α
Quick Overview of Upcoming Spark 3.0 + αQuick Overview of Upcoming Spark 3.0 + α
Quick Overview of Upcoming Spark 3.0 + αTakeshi Yamamuro
 
MLflowによる機械学習モデルのライフサイクルの管理
MLflowによる機械学習モデルのライフサイクルの管理MLflowによる機械学習モデルのライフサイクルの管理
MLflowによる機械学習モデルのライフサイクルの管理Takeshi Yamamuro
 
Taming Distributed/Parallel Query Execution Engine of Apache Spark
Taming Distributed/Parallel Query Execution Engine of Apache SparkTaming Distributed/Parallel Query Execution Engine of Apache Spark
Taming Distributed/Parallel Query Execution Engine of Apache SparkTakeshi Yamamuro
 
LLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeLLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeTakeshi Yamamuro
 
20180417 hivemall meetup#4
20180417 hivemall meetup#420180417 hivemall meetup#4
20180417 hivemall meetup#4Takeshi Yamamuro
 
An Experimental Study of Bitmap Compression vs. Inverted List Compression
An Experimental Study of Bitmap Compression vs. Inverted List CompressionAn Experimental Study of Bitmap Compression vs. Inverted List Compression
An Experimental Study of Bitmap Compression vs. Inverted List CompressionTakeshi Yamamuro
 
Sparkのクエリ処理系と周辺の話題
Sparkのクエリ処理系と周辺の話題Sparkのクエリ処理系と周辺の話題
Sparkのクエリ処理系と周辺の話題Takeshi Yamamuro
 
VLDB2013 R1 Emerging Hardware
VLDB2013 R1 Emerging HardwareVLDB2013 R1 Emerging Hardware
VLDB2013 R1 Emerging HardwareTakeshi Yamamuro
 
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4浮動小数点(IEEE754)を圧縮したい@dsirnlp#4
浮動小数点(IEEE754)を圧縮したい@dsirnlp#4Takeshi Yamamuro
 
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)
LLVMで遊ぶ(整数圧縮とか、x86向けの自動ベクトル化とか)Takeshi Yamamuro
 
Introduction to Modern Analytical DB
Introduction to Modern Analytical DBIntroduction to Modern Analytical DB
Introduction to Modern Analytical DBTakeshi Yamamuro
 
SIGMOD’12勉強会 -Session 7-
SIGMOD’12勉強会 -Session 7-SIGMOD’12勉強会 -Session 7-
SIGMOD’12勉強会 -Session 7-Takeshi Yamamuro
 
VLDB’11勉強会 -Session 9-
VLDB’11勉強会 -Session 9-VLDB’11勉強会 -Session 9-
VLDB’11勉強会 -Session 9-Takeshi Yamamuro
 
研究動向から考えるx86/x64最適化手法
研究動向から考えるx86/x64最適化手法研究動向から考えるx86/x64最適化手法
研究動向から考えるx86/x64最適化手法Takeshi Yamamuro
 

A x86-optimized rank&select dictionary for bit sequences

  • 1. x86/x64最適化勉強会#4 (x86/x64 Optimization Study Group #4): A x86-optimized rank/select dictionary for bit sequences, 2012/6/16, Takeshi Yamamuro
  • 2. What’s a Succinct Data Structure?
  • 3. SDS: Succinct Data Structure
      • Recently getting popular in some areas
          – Research and engineering
      • Not a data structure, but a data representation
          – A compression method for other data structures
          – e.g., alphabets, trees, and graphs
      • Transparent operations without explicit unpacking
          – e.g., succinct LZ77 compression*1
      *1 Kreft, S. and Navarro, G.: LZ77-Like Compression with Fast Random Access, In Proceedings of DCC, 2010
  • 4. More Details
      • SDS = Succinct Data + Succinct Index
      • Succinct Data
          – Compact representation of the target data
          – Close to the information-theoretic lower bound
          – e.g., with N possible patterns, the lower bound is log N bits
      • Succinct Index
          – O(1) operations on the target data
          – o(N) space cost: asymptotically negligible
  • 5. More Details: If you need more information, ... (cited from: http://goo.gl/rkQ5z)
  • 7. Rank/Select Operations
      • SDS is composed of rank/select operations
          – Many calls to rank/select inside
      • Rank/select for succinct bit sequences: B[i]
          – rank_x(n, B): the number of x bits in B[0...n]
          – select_x(n, B): the position of the n-th x in B[]
      • Example:
            i     0  1  2  3  4  5  6  7  8
            B[i]  1  0  1  1  0  0  1  1  0
        rank_1(5, B) = 3, select_1(4, B) = 6
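The two definitions on slide 7 can be checked with a deliberately naive O(N) sketch. It assumes that the range B[0...n] is inclusive and that select counts occurrences from 1; both readings reproduce the slide's example but are not stated explicitly in the deck. The function names are illustrative only.

```python
def rank(bits, n, x=1):
    """Number of entries equal to x in bits[0..n], both ends inclusive (assumed)."""
    return sum(1 for b in bits[: n + 1] if b == x)


def select(bits, n, x=1):
    """Index of the n-th (1-based, assumed) occurrence of x, or -1 if there is none."""
    seen = 0
    for i, b in enumerate(bits):
        if b == x:
            seen += 1
            if seen == n:
                return i
    return -1


B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(rank(B, 5))    # 3, as on the slide
print(select(B, 4))  # 6, as on the slide
```

The rest of the deck is about answering the same queries in O(1) time instead of scanning.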
  • 8. Rank/Select Operations
      • Available rank/select implementations
          – ux-trie: http://code.google.com/p/ux-trie/
          – rx: http://code.google.com/p/mozc/
          – marisa-trie: http://code.google.com/p/marisa-trie/
      • Today's contribution
          – x86-optimized rank/select
          – https://github.com/maropu/dbitv
  • 9. Performance Results
      • Performance benchmark setup*1
          – Generate a random sequence of bits: 50% density
          – Random rank/select queries over the bits
          – CPU: Intel Core-i5 U470@1.33GHz
      • Latency observed
          – Median latency over 11 trials
      *1 Reference: http://d.hatena.ne.jp/s-yata/20111216/1324032373
  • 10. Performance Results: Rank
      [Figure: averaged rank latency (ns, log scale, 1.E+00 to 1.E+03) vs. bit length, for ux, rx, marisa, and opt]
  • 11. Performance Results: Select
      [Figure: averaged select latency (ns, log scale, 1.E+00 to 1.E+04) vs. bit length, for ux, rx, marisa, and opt]
  • 13. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits (N bits)
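  • 14. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (one entry per block of log²N bits)
      • Split B[] into fixed-length blocks of log²N bits
      • Total counts are pre-computed in L[]
        $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor} B[i]}_{L[\lfloor x/\log^2 N \rfloor]} + \sum_{i=\lfloor x/\log^2 N \rfloor + 1}^{x} B[i]$$
        (the floor expressions in the summation bounds denote block boundaries)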
  • 15. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (one entry per block of log²N bits)
      • Split B[] into fixed-length blocks of log²N bits
      • Total counts are pre-computed in L[]
        $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)} + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor + 1}^{x} B[i]}_{O(\log^2 N)}$$
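The one-level scheme on slides 14 and 15 can be sketched as follows. This is a minimal illustration, not the code from any of the libraries mentioned: bits are held in a plain Python list, the block length is rounded from log²N, and rank is taken over B[0..x] inclusive (an assumption). The lookup in L[] is O(1); the remaining scan touches at most one block.

```python
import math
import random


def build_large(bits, block):
    """L[j] = number of 1s in bits[0 : j * block]."""
    L = [0]
    for start in range(0, len(bits), block):
        L.append(L[-1] + sum(bits[start:start + block]))
    return L


def rank1(bits, L, block, x):
    """1s in bits[0..x] inclusive: one table lookup plus a scan inside one block."""
    k = x + 1            # number of leading bits to count
    j = k // block       # number of complete blocks among those k bits
    return L[j] + sum(bits[j * block:k])


random.seed(0)
bits = [random.randint(0, 1) for _ in range(1000)]
block = max(1, round(math.log2(len(bits)) ** 2))   # about log^2 N bits per block
L = build_large(bits, block)
print(all(rank1(bits, L, block, x) == sum(bits[:x + 1]) for x in range(len(bits))))  # True
```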
  • 16. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (one entry per block of log²N bits)
      • L[]: o(N) space cost
        $$\frac{N}{\log^2 N} \cdot \log N = O\!\left(\frac{N}{\log N}\right) = o(N)$$
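  • 17. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (blocks of log²N bits); S[] = s1, s2, ... (blocks of ½ log N bits)
      • Split again into fixed-length blocks of ½ log N bits
      • Total counts are pre-computed in S[]
        $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor} B[i]}_{L[\lfloor x/\log^2 N \rfloor]} + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor + 1}^{\lfloor x/\frac{1}{2}\log N \rfloor} B[i]}_{S[\lfloor x/\frac{1}{2}\log N \rfloor]} + \sum_{i=\lfloor x/\frac{1}{2}\log N \rfloor + 1}^{x} B[i]$$
        (the floor expressions in the summation bounds denote block boundaries)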
  • 18. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (blocks of log²N bits); S[] = s1, s2, ... (blocks of ½ log N bits)
      • Split again into fixed-length blocks of ½ log N bits
      • Total counts are pre-computed in S[]
        $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)} + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor + 1}^{\lfloor x/\frac{1}{2}\log N \rfloor} B[i]}_{S[\lfloor x/\frac{1}{2}\log N \rfloor]:\ O(1)} + \underbrace{\sum_{i=\lfloor x/\frac{1}{2}\log N \rfloor + 1}^{x} B[i]}_{O(\log N)}$$
  • 19. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (blocks of log²N bits); S[] = s1, s2, ... (blocks of ½ log N bits)
      • S[]: o(N) space cost
        $$\frac{N}{\frac{1}{2}\log N} \cdot \log(\log^2 N) = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)$$
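  • 20. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (blocks of log²N bits); S[] = s1, s2, ... (blocks of ½ log N bits)
      • O(1) popcount/table lookup for the last term: O(log N) -> O(1)
        $$\mathrm{rank}_1(x, B) = \sum_{i=1}^{x} B[i] = \underbrace{\sum_{i=1}^{\lfloor x/\log^2 N \rfloor} B[i]}_{L[\lfloor x/\log^2 N \rfloor]:\ O(1)} + \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor + 1}^{\lfloor x/\frac{1}{2}\log N \rfloor} B[i]}_{S[\lfloor x/\frac{1}{2}\log N \rfloor]:\ O(1)} + \underbrace{\sum_{i=\lfloor x/\frac{1}{2}\log N \rfloor + 1}^{x} B[i]}_{O(\log N)\ \to\ O(1)}$$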
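A minimal sketch of the full two-level lookup follows: one read of L[], one read of S[], and one masked popcount for the last term. The block sizes (256 and 32 bits), the function names, and the inclusive range B[0..x] are assumptions chosen for illustration; the popcount is emulated with `bin(...).count("1")`, whereas an x86 implementation would typically use a hardware popcount instruction or a lookup table.

```python
import random


def build(bits, large=256, small=32):
    """Return (words, L, S) for a list of 0/1 values.

    words[k]: the k-th small block packed into an int (bit i of the block
              is stored at bit position i of the int)
    L[j]:     number of 1s before large block j
    S[k]:     number of 1s between the start of the enclosing large block
              and the start of small block k
    """
    assert large % small == 0
    words, L, S = [], [], []
    total = in_large = 0
    for start in range(0, len(bits), small):
        if start % large == 0:
            L.append(total)
            in_large = 0
        S.append(in_large)
        chunk = bits[start:start + small]
        words.append(sum(b << i for i, b in enumerate(chunk)))
        ones = sum(chunk)
        total += ones
        in_large += ones
    return words, L, S


def rank1(index, x, large=256, small=32):
    """1s in B[0..x] inclusive: two table lookups plus one masked popcount."""
    words, L, S = index
    k = x // small
    mask = (1 << (x % small + 1)) - 1      # keep bits 0..(x mod small)
    return L[x // large] + S[k] + bin(words[k] & mask).count("1")


random.seed(1)
bits = [random.randint(0, 1) for _ in range(1000)]
index = build(bits)
print(all(rank1(index, x) == sum(bits[:x + 1]) for x in range(len(bits))))  # True
```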
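  • 21. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
      • B[] = a sequence of bits; L[] = l1, l2, ... (blocks of log²N bits); S[] = s1, s2, ... (blocks of ½ log N bits)
      • As a result, o(N) space cost
        $$\underbrace{\frac{N}{\log N}}_{L[]\ \text{size}} + \underbrace{\frac{4N\log\log N}{\log N}}_{S[]\ \text{size}} = O\!\left(N \cdot \frac{\log\log N}{\log N}\right) = o(N)$$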
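To get a feel for the size formulas on slides 16, 19, and 21, the sketch below evaluates them numerically for a few values of N. It is plain arithmetic on the slide's expressions (base-2 logarithms assumed); the function name is illustrative.

```python
import math


def overhead_ratio(n):
    """(bits for L[] + bits for S[]) / n, using the formulas from the slides."""
    log_n = math.log2(n)
    l_bits = n / log_n ** 2 * log_n                      # N / log^2 N entries of log N bits
    s_bits = n / (0.5 * log_n) * math.log2(log_n ** 2)   # N / (1/2 log N) entries of log(log^2 N) bits
    return (l_bits + s_bits) / n


for exponent in (16, 32, 64):
    print(exponent, overhead_ratio(2 ** exponent))
# 16 1.0625
# 32 0.65625
# 64 0.390625
```

The ratio does tend to zero, which is what o(N) promises, but it shrinks slowly for realistic N.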
  • 22. Implementation: Method of Four Russians
      • Rule: O(1) operation cost with o(N) space
  • 23. Implementation: Practice
      • Low computation costs and high cache penalties
          – 3 cache/TLB misses per rank
      • e.g., rank1(402 = 256*1 + 32*4 + 18, B), with 256-bit large blocks and 32-bit small blocks
          B[]: 01..000000....101......0 0110....001...............0 0000100 ... (popcount the remaining bits inside the 32-bit block)
          L[]: 18 21 …
          S[]: 1 3 4 6 7 9 10 13 2 5 7 9 12 13 18 19 1 3 7 …
  • 24. Implementation: Practice
      • Low computation costs and high cache penalties
          – 3 cache/TLB misses per rank
      • e.g., rank1(402 = 256*1 + 32*4 + 18, B), with 256-bit large blocks and 32-bit small blocks
          B[]: 01..000000....101......0 0110....001...............0 0000100 ... (popcount the remaining bits inside the 32-bit block): Miss!
          L[]: 18 21 …: Miss!
          S[]: 1 3 4 6 7 9 10 13 2 5 7 9 12 13 18 19 1 3 7 …: Miss!
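The index arithmetic behind the example rank1(402 = 256*1 + 32*4 + 18, B) can be written out as below. The constant and function names are illustrative; the 256-bit and 32-bit block sizes are the ones shown on the slide.

```python
LARGE, SMALL = 256, 32   # block sizes from the slide


def decompose(x):
    """Split a bit position into (large block, small block within it, bit offset)."""
    large_idx, rest = divmod(x, LARGE)
    small_idx, offset = divmod(rest, SMALL)
    return large_idx, small_idx, offset


print(decompose(402))  # (1, 4, 18)
```

With L[], S[], and B[] stored as three separate arrays, the query reads L[1], the S[] entry for small block 4 of large block 1, and the corresponding 32-bit word of B[]. Those are three unrelated memory locations, hence the three misses counted on the slide.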
  • 25. Implementation: Practice
      • Packing the required data into a single cache line
          [Figure: a 56B chunk within a 64B cache line, with fields labeled 4B, 1B, and 32B (bits 0110....001..........0) and 12B of padding]
  • 26. Implementation: Practice
      • Packing the required data into a single cache line
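The packing idea can be sketched as one fixed-size record per 256-bit block, so that a rank query reads a single 64-byte region instead of three arrays. The layout below is hypothetical: a 4-byte large count, eight 1-byte small counts, 32 bytes of bits, and zero padding up to 64 bytes. The field widths follow the labels that survive from the slide, but the exact order and padding used in dbitv are not recoverable from the transcript and may differ.

```python
import random
import struct

# Hypothetical record: 4-byte large count, eight 1-byte small counts,
# 32 bytes (256 bits) of data, zero padding up to 64 bytes.
RECORD = struct.Struct("<I8B32s20x")
assert RECORD.size == 64


def pack_block(ones_before, bits256):
    """Pack one 256-bit block (list of 0/1) and its counts into 64 bytes."""
    small, running = [], 0
    for start in range(0, 256, 32):
        small.append(running)                 # 1s before this 32-bit sub-block (at most 224)
        running += sum(bits256[start:start + 32])
    data = bytearray(32)
    for i, b in enumerate(bits256):
        data[i // 8] |= b << (i % 8)
    return RECORD.pack(ones_before, *small, bytes(data))


def rank1_in_record(record, offset):
    """1s up to and including bit `offset` (0..255), reading only this record."""
    fields = RECORD.unpack(record)
    large, small, data = fields[0], fields[1:9], fields[9]
    sub = offset // 32
    word = int.from_bytes(data[sub * 4:sub * 4 + 4], "little")
    mask = (1 << (offset % 32 + 1)) - 1
    return large + small[sub] + bin(word & mask).count("1")


random.seed(2)
block = [random.randint(0, 1) for _ in range(256)]
rec = pack_block(5, block)   # pretend 5 ones precede this block
print(len(rec))  # 64
```

Python's `struct` only models the byte layout here; the cache behavior the slide is concerned with would have to be measured in a compiled implementation.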
  • 27. Implementation: Practice
      • By the way, where is select?
          – Omitted due to time limits
          – Please see the code ...
      • Two ways to implement it
          – O(log N) complexity
              • ux-trie, rx, and marisa-trie
              • Binary search with rank
              • Suffers many cache/TLB misses
          – O(1) complexity
              • My implementation, designed to minimize these penalties
              • 1 rank, 1 SIMD comparison, and O(1) bsf
              • Only 2 cache/TLB misses
  • 28. Implementation: Practice
      • By the way, where is select?
          – Omitted due to time limits
          – Please see the code ...
      • Two ways to implement it
          – O(log N) complexity
              • ux-trie, rx, and marisa-trie
              • Binary search with rank
              • Suffers many cache/TLB misses
          – O(1) complexity
              • My implementation, designed to minimize these penalties
              • 1 rank, 1 SIMD comparison, and O(1) bsf
              • Only 2 cache/TLB misses
      • Not implemented yet ...
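The O(log N) approach listed above, select by binary search over rank, can be sketched as follows. This is a generic illustration rather than the code of ux-trie, rx, or marisa-trie: rank is supplied as a callable (here backed by a simple prefix-sum list), and n is counted from 1 as in the example on slide 7.

```python
from itertools import accumulate


def make_rank1(bits):
    """Return rank1(x) = number of 1s in bits[0..x], backed by a prefix-sum list."""
    prefix = list(accumulate(bits))
    return lambda x: prefix[x]


def select1(rank1, n_bits, n):
    """Smallest index i with rank1(i) >= n, or -1; uses O(log N) rank calls."""
    if n <= 0 or n_bits == 0 or rank1(n_bits - 1) < n:
        return -1
    lo, hi = 0, n_bits - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank1(mid) >= n:
            hi = mid          # the n-th 1 is at mid or to its left
        else:
            lo = mid + 1      # fewer than n ones up to mid
    return lo


B = [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(select1(make_rank1(B), len(B), 4))  # 6
```

Each probe of the binary search calls rank at a different position, which is consistent with the slide's remark that this approach suffers many cache/TLB misses.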