Remove Branches in
   BitVector Select Operations
         - marisa 0.2.2 -
                Susumu Yata
                  @s5yata
                 Brazil, Inc.

30 March 2013      Brazil, Inc.
Who I Am
Job
   Brazil, Inc. (groonga developer)
   We need R&D software engineers.


Personal research & development
   Tries
       darts-clone, marisa-trie, etc.
   Corpus
       Nihongo Web Corpus 2010 (NWC 2010)
Relationships between BitVector and Marisa.

  BitVector and Marisa

BitVector
What's BitVector?
   A sequence of bits


Operations
   BitVector::get(i)
   BitVector::rank(i)
   BitVector::select(i)


BitVector – Get Operations
Interface
   BitVector::get(i)


Description
   The i-th bit (“0” or “1”)

     0     1    2   …   i–1      i     i+1   …   n-2   n-1
     0     0    1   …    0       1      1    …   0     0


                              Get!
BitVector – Rank Operations
Interface
   BitVector::rank(i)


Description
   The number of “1”s up to the i-th bit

     0     1    2   …     i–1    i     i+1   …   n-2   n-1
     0     0    1   …     0      1      1    …   0     0


         How many “1”s?
BitVector – Select Operations
Interface
   BitVector::select(i)


Description
   The position of the i-th “1”

     0     1    2   …    …      …      …   …   n-2   n-1
     0     0    1   …    …      …      …   …   0     0


                Where is the i-th “1”?
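As a reference point, the three operations can be sketched naively. The name NaiveBitVector and the conventions used here are assumptions for illustration, since the slides do not fix them: positions are 0-based, rank(i) counts the 1s in positions [0, i), and select(i) takes a 0-based i.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical O(n) reference implementation, useful for checking faster versions.
struct NaiveBitVector {
  std::vector<bool> bits;

  // The i-th bit.
  bool get(std::size_t i) const { return bits[i]; }

  // Number of 1s in positions [0, i) (assumed convention).
  std::size_t rank(std::size_t i) const {
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) count += bits[k];
    return count;
  }

  // Position of the i-th 1, counting from i = 0; returns bits.size() if there is none.
  std::size_t select(std::size_t i) const {
    for (std::size_t k = 0; k < bits.size(); ++k) {
      if (bits[k] && i-- == 0) return k;
    }
    return bits.size();
  }
};
```

Under these conventions rank(select(i)) == i for every i smaller than the number of 1s.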
Marisa
 Who's Marisa?
   An ordinary human magician

 What's Marisa?
   A static and space-efficient dictionary

 Data structure
   Recursive LOUDS-based Patricia tries

 Site
   http://code.google.com/p/marisa-trie
Marisa – Patricia
Patricia is a labeled tree.
      Keys = Tree + Labels

 ID   Key
  0   “Argentina”
  1   “Armenia”
  2   “Brazil”
  3   “Canada”
  4   “Cyprus”

 Tree (parent: children)
  0: 1, 2, 3
  1: 4, 5
  3: 6, 7

 Node   Label
  1     “Ar”
  2     “Brazil”
  3     'C'
  4     “gentina”
  5     “menia”
  6     “anada”
  7     “yprus”

Marisa – Recursiveness
Unfortunately, this margin is too small…
   Keys = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels <– Reasonable
   Labels = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels
   …
Marisa – BitVector Usage
LOUDS
   Level-Order Unary Degree Sequence


Terminal flags
   A node is terminal (“1”) or not (“0”).


Link flags
   A node has a link to its multi-byte label
    (“1”) or has a built-in single-byte label (“0”).
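To make the LOUDS definition concrete, here is a minimal sketch that encodes a tree in level order. It assumes the common textbook formulation (a super-root "10" prefix, then one "1" per child and a closing "0" for each node); marisa's actual bit layout may differ, and louds_encode is a hypothetical name.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// children[v] lists the children of node v; node 0 is the root (children must be non-empty).
std::string louds_encode(const std::vector<std::vector<std::size_t>>& children) {
  std::string bits = "10";  // Super-root with a single child (the root).
  std::queue<std::size_t> level_order;
  level_order.push(0);
  while (!level_order.empty()) {
    const std::size_t v = level_order.front();
    level_order.pop();
    // Degree in unary: one '1' per child, then a terminating '0'.
    bits.append(children[v].size(), '1');
    bits.push_back('0');
    for (std::size_t c : children[v]) level_order.push(c);
  }
  return bits;
}
```

For the eight-node Patricia tree shown earlier (0: 1, 2, 3; 1: 4, 5; 3: 6, 7) this yields a 17-bit sequence, that is 2n + 1 bits for n nodes.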
Marisa – BitVector Usage
LOUDS
   BitVector::get(), select()


Terminal flags
   BitVector::get(), rank(), select()


Link flags
   BitVector::get(), rank()

How to implement Rank/Select operations.

  Implementations

Rank Dictionary
Index structures
   r_idx[x].abs = rank(512 * x)
       x = 0, 1, 2, …
   r_idx[x].rel[y] =
     rank(512 * x + 64 * y) - rank(512 * x)
       y = 1, 2, 3, …, 7


Calculation
   abs + rel + popcnt()
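The index and the abs + rel + popcnt() calculation can be sketched as follows. The struct names, the field widths, and the convention that rank(i) counts the 1s in positions [0, i) are assumptions for illustration; marisa packs its index more tightly than this.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical layout following the slide: one entry per 512-bit block.
struct RankEntry {
  std::uint32_t abs;     // rank(512 * x)
  std::uint16_t rel[8];  // rel[y] = rank(512 * x + 64 * y) - rank(512 * x); rel[0] = 0
};

struct RankIndex {
  std::vector<std::uint64_t> words;  // bit i is bit (i % 64) of words[i / 64]
  std::vector<RankEntry> r_idx;

  explicit RankIndex(std::vector<std::uint64_t> w) : words(std::move(w)) {
    std::uint32_t total = 0;  // number of 1s seen so far
    for (std::size_t k = 0; k < words.size(); ++k) {
      if (k % 8 == 0) r_idx.push_back(RankEntry{total, {}});
      r_idx.back().rel[k % 8] = static_cast<std::uint16_t>(total - r_idx.back().abs);
      total += static_cast<std::uint32_t>(__builtin_popcountll(words[k]));
    }
  }

  // Number of 1s in positions [0, i); requires i < 64 * words.size().
  std::size_t rank(std::size_t i) const {
    const RankEntry& e = r_idx[i / 512];
    // Keep only the bits of the current word that lie below position i.
    const std::uint64_t below = words[i / 64] & ((std::uint64_t{1} << (i % 64)) - 1);
    return e.abs + e.rel[(i / 64) % 8] +
           static_cast<std::size_t>(__builtin_popcountll(below));
  }
};
```

Each query touches one index entry and one data word, which is where the O(1) bound comes from.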
Rank Operations
Time complexity = O(1)
   512-bit blocks    → r_idx.abs
   64-bit sub-blocks → r_idx.rel
   Final 64-bit word → popcnt()
Select Dictionary
Index structure
   s_idx[x] = select(512 * x)
       x = 0, 1, 2, …


Calculation
   Limit the range by using s_idx.
   Limit the range by using r_idx[x].abs.
   Limit the range by using r_idx[x].rel[y].
   Find the i-th “1” in the range.
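The four steps can be sketched as below, under several simplifying assumptions: s_idx stores the index of the 512-bit block containing the (512 * x)-th 1 rather than a bit position, the r_idx[x].rel[y] lookup is replaced by a popcount scan over the words of the block, and the final round is a plain loop. SelectSketch and its member names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct SelectSketch {
  std::vector<std::uint64_t> words;    // bit i is bit (i % 64) of words[i / 64]
  std::vector<std::size_t> block_abs;  // 1s before each 512-bit block, plus the total at the end
  std::vector<std::size_t> s_idx;      // block holding the (512 * x)-th 1, plus a sentinel

  explicit SelectSketch(std::vector<std::uint64_t> w) : words(std::move(w)) {
    std::size_t total = 0;
    for (std::size_t k = 0; k < words.size(); ++k) {
      if (k % 8 == 0) block_abs.push_back(total);
      for (std::uint64_t word = words[k]; word != 0; word &= word - 1) {  // one pass per 1
        if (total % 512 == 0) s_idx.push_back(k / 8);
        ++total;
      }
    }
    block_abs.push_back(total);
    s_idx.push_back(words.empty() ? 0 : (words.size() - 1) / 8);  // sentinel: last block
  }

  // Position of the i-th 1 (0-based); requires i < total number of 1s.
  std::size_t select(std::size_t i) const {
    // 1. s_idx limits the candidate blocks to [lo, hi].
    std::size_t lo = s_idx[i / 512], hi = s_idx[i / 512 + 1];
    // 2. Binary search for the last block with block_abs[b] <= i.
    while (lo < hi) {
      const std::size_t mid = (lo + hi + 1) / 2;
      if (block_abs[mid] <= i) lo = mid; else hi = mid - 1;
    }
    // 3. Walk the 64-bit words of that block (stands in for r_idx[x].rel[y]).
    std::size_t k = lo * 8;
    std::size_t rest = i - block_abs[lo];
    for (;; ++k) {
      const std::size_t c = static_cast<std::size_t>(__builtin_popcountll(words[k]));
      if (rest < c) break;
      rest -= c;
    }
    // 4. Final round: clear the lowest `rest` 1s, then locate the next one.
    std::uint64_t word = words[k];
    while (rest-- > 0) word &= word - 1;
    return k * 64 + static_cast<std::size_t>(__builtin_ctzll(word));
  }
};
```

Step 4 is the part the rest of the deck optimizes.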
Select Operations
   s_idx       → a run of 512-bit blocks
   r_idx.abs   → one 512-bit block
   r_idx.rel   → one 64-bit word
   Final round → the i-th “1” in that 64-bit word
Select Final Round
Binary search & table lookup
   Three-level branches (if → if → if) select one of the eight 8-bit bytes
   Table lookup within the selected byte
How to remove the branches in the final round.

  Improvements

Original
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01;          // Tricky popcount
if (i < ((x >> 24) & 0xFF)) {     // The first-level branch
  if (i < ((x >> 8) & 0xFF)) {    // The second-level branch
    if (i < (x & 0xFF)) {         // The third-level branch
      // The first byte contains the i-th “1”.
    } else {
      // The second byte contains the i-th “1”.
    }
  }
  // …
}
// …
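The listing on the slide stops after the second leaf. A complete version of the same three-level decision tree might look like the sketch below. The comparisons for the third to eighth bytes are extrapolated from the pattern of the first two leaves, not taken from the source, and byte_with_ith_one is a hypothetical name.

```cpp
#include <cstdint>

// x holds per-byte prefix popcounts: byte k = number of 1s in bytes 0..k of the block.
// Returns the index (0..7) of the byte containing the i-th 1 (0-based i).
// Requires i < (x >> 56), i.e. the block contains more than i ones.
unsigned byte_with_ith_one(std::uint64_t x, std::uint64_t i) {
  if (i < ((x >> 24) & 0xFF)) {               // first level: bytes 0..3 or 4..7
    if (i < ((x >> 8) & 0xFF)) {              // second level: bytes 0..1 or 2..3
      return (i < (x & 0xFF)) ? 0 : 1;        // third level
    }
    return (i < ((x >> 16) & 0xFF)) ? 2 : 3;  // third level
  }
  if (i < ((x >> 40) & 0xFF)) {               // second level: bytes 4..5 or 6..7
    return (i < ((x >> 32) & 0xFF)) ? 4 : 5;  // third level
  }
  return (i < ((x >> 48) & 0xFF)) ? 6 : 7;    // third level
}
```

Every call takes three data-dependent decisions, which is what the following slides set out to remove.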
Tips – Tricky PopCount
   Example byte:    0 1 1 1 0 0 1 0

x = x - ((x >> 1) & MASK_55);
   2-bit counts:    1   2   0   1

x = (x & MASK_33) + ((x >> 2) & MASK_33);
   4-bit counts:    3       1

x = (x + (x >> 4)) & MASK_0F;
   8-bit count:     4
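The three steps can be tried in isolation. In this sketch the mask values are inferred from their names (the slides only spell out MASK_01), and byte_popcounts is a hypothetical name.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;

// Returns a word whose k-th byte is the number of 1s in the k-th byte of x.
std::uint64_t byte_popcounts(std::uint64_t x) {
  x = x - ((x >> 1) & MASK_55);              // 2-bit fields, each 0..2
  x = (x & MASK_33) + ((x >> 2) & MASK_33);  // 4-bit fields, each 0..4
  x = (x + (x >> 4)) & MASK_0F;              // 8-bit fields, each 0..8
  return x;
}
```

With the example byte 01110010 (0x72) the intermediate fields are 1 2 0 1, then 3 1, then 4.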

Tips – Tricky PopCount
// MASK_01 = 0x0101010101010101ULL;
// x = x + (x << 8) + (x << 16) + (x << 24) + …;
x *= MASK_01;

   Per-byte counts (byte 7 … byte 0):    4    1    3    5    2    6    3    4
   After the multiply (prefix sums):    28   24   23   20   15   13    7    4
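The multiply works because every byte of the input is at most 8, so no partial sum exceeds 64 and nothing carries into the next byte. A minimal sketch, with byte_prefix_sums as a hypothetical name:

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// counts: byte k holds the popcount of byte k (each 0..8).
// Result: byte k holds the number of 1s in bytes 0..k, so byte 7 holds the total.
std::uint64_t byte_prefix_sums(std::uint64_t counts) {
  return counts * MASK_01;  // same as counts + (counts << 8) + ... + (counts << 56)
}
```

The per-byte counts 4 1 3 5 2 6 3 4 (byte 7 down to byte 0) become 28 24 23 20 15 13 7 4.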



+ SSE2 (After PopCount)
// y[0 … 7] = i + 1;
__m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01);
__m128i z = _mm_cvtsi64_si128(x);

// Compare the 16 8-bit signed integers in y and z.
// y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
y = _mm_cmpgt_epi8(y, z);          // PCMPGTB

// The j-th byte contains the i-th “1”.
// TABLE is a 128-byte pre-computed table.
uint8_t j = TABLE[_mm_movemask_epi8(y)];
Tips – PCMPGTB
y = _mm_cvtsi64_si128((i + 1) * MASK_01);
      20        20   20     20      20     20     20     20


z = _mm_cvtsi64_si128(x);
      28        24   23     20       15     13     7      4


// y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
y = _mm_cmpgt_epi8(y, z);

     0x00   0x00     0x00   0x00   0xFF    0xFF   0xFF   0xFF
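A runnable sketch of the comparison step, assuming an x86-64 target (where SSE2 and _mm_cvtsi64_si128 are available). The slides do not list the contents of TABLE, so this sketch replaces TABLE[mask] with a popcount of the mask, which gives the same answer because the mask always has the form 2^k - 1. byte_index_sse2 is a hypothetical name.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// x: per-byte prefix popcounts (byte k = number of 1s in bytes 0..k).
// Returns the index of the byte containing the i-th 1; requires i < (x >> 56).
unsigned byte_index_sse2(std::uint64_t x, std::uint64_t i) {
  const std::uint64_t MASK_01 = 0x0101010101010101ULL;
  __m128i y = _mm_cvtsi64_si128(static_cast<long long>((i + 1) * MASK_01));
  const __m128i z = _mm_cvtsi64_si128(static_cast<long long>(x));
  y = _mm_cmpgt_epi8(y, z);                // 0xFF in every byte where i + 1 > prefix count
  const int mask = _mm_movemask_epi8(y);   // one bit per byte: the low k bits are set
  // Stand-in for TABLE[mask] (assumption, see above).
  return static_cast<unsigned>(__builtin_popcount(static_cast<unsigned>(mask)));
}
```

With the values on the slide (prefix counts 28 24 23 20 15 13 7 4 and i + 1 = 20) the mask is 0b1111 and the result is byte 4.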


+ Tricks (After Comparison)
uint64_t j = _mm_cvtsi128_si64(y);

// Calculation without TABLE
j = ((j & MASK_01) * MASK_01) >> 56;

// Calculation with BSR
j = (63 - __builtin_clzll(j + 1)) / 8;

// Calculation with popcnt (SSE4.2 or SSE4a)
j = __builtin_popcountll(j) / 8;
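The three alternatives are interchangeable as long as the comparison result has its low k bytes set to 0xFF with 0 <= k <= 7 (k = 8 would make j + 1 overflow to 0, for which __builtin_clzll is undefined). A sketch with hypothetical function names, using the GCC/Clang builtins from the slide:

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Without TABLE: keep one bit per 0xFF byte, then sum the bytes into the top byte.
unsigned count_by_multiply(std::uint64_t j) {
  return static_cast<unsigned>(((j & MASK_01) * MASK_01) >> 56);
}

// With BSR: j + 1 is a single bit at position 8 * k.
unsigned count_by_clz(std::uint64_t j) {
  return static_cast<unsigned>((63 - __builtin_clzll(j + 1)) / 8);
}

// With popcnt: each 0xFF byte contributes eight 1s.
unsigned count_by_popcount(std::uint64_t j) {
  return static_cast<unsigned>(__builtin_popcountll(j) / 8);
}
```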

– SSE2 (Simple and Fast)
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01;        // Tricky popcount

uint64_t y = (i + 1) * MASK_01;
uint64_t z = x | MASK_80;
// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;
uint8_t j = __builtin_ctzll(z) / 8;
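Put together as a function, the byte search needs no branches and no SIMD. The slide stops at the byte index j; the last step in this sketch (finding the bit inside byte j with a short loop) is a stand-in of my own for the table lookup mentioned earlier. The mask values other than MASK_01 are inferred from their names, and select_in_word is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
constexpr std::uint64_t MASK_80 = 0x8080808080808080ULL;

// Position (0..63) of the i-th 1 in word, counting from i = 0.
unsigned select_in_word(std::uint64_t word, unsigned i) {
  assert(i < static_cast<unsigned>(__builtin_popcountll(word)));
  std::uint64_t x = word;
  x = x - ((x >> 1) & MASK_55);
  x = (x & MASK_33) + ((x >> 2) & MASK_33);
  x = (x + (x >> 4)) & MASK_0F;
  x *= MASK_01;  // byte k = number of 1s in bytes 0..k

  const std::uint64_t y = (i + 1) * MASK_01;
  std::uint64_t z = x | MASK_80;
  z = (z - y) & MASK_80;  // bit 7 of byte k survives iff its prefix count > i
  const unsigned j = static_cast<unsigned>(__builtin_ctzll(z)) / 8;

  // Number of 1s in the bytes below byte j (0 when j == 0).
  const unsigned before = static_cast<unsigned>(((x << 8) >> (8 * j)) & 0xFF);
  unsigned byte = static_cast<unsigned>((word >> (8 * j)) & 0xFF);
  for (unsigned r = i - before; r > 0; --r) byte &= byte - 1;  // clear the lowest 1
  return 8 * j + static_cast<unsigned>(__builtin_ctz(byte));
}
```

The subtraction never borrows across bytes because every byte of z is at least 0x80 while every byte of y is at most 64.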
Tips – Comparison
uint64_t y = (i + 1) * MASK_01;
     0x14   0x14   0x14   0x14   0x14    0x14   0x14   0x14


uint64_t z = x | MASK_80;
     0x9C   0x98   0x97   0x94   0x8F    0x8D   0x87   0x84


// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;

     0x80   0x80   0x80   0x80   0x00    0x00   0x00   0x00


+ SSSE3 (For PopCount)
// Get lower nibbles and upper nibbles of x.
__m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
__m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
upper = _mm_srli_epi32(upper, 4);
// Use PSHUFB for counting “1”s in each nibble.
__m128i table =
   _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
lower = _mm_shuffle_epi8(table, lower);
upper = _mm_shuffle_epi8(table, upper);
// Merge the counts to get the number of “1”s in each byte.
x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
x *= MASK_01;
Tips – PSHUFB
lower = _mm_cvtsi64_si128(x & MASK_0F);
         12           8           7           4           15           13           7           4


table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, …);
     4        3   3       2   3       2   2       1   3        2   2        1   2       1   1       0


// Perform a parallel 16-way lookup.
lower = _mm_shuffle_epi8(table, lower);

         2            1           3           1           4            3            3           1
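For readers without an SSSE3 target at hand, the lookup can be modeled in scalar code. This is not the PSHUFB instruction itself, only a byte-by-byte emulation of what it computes on the low 8 bytes when every index is below 16 (PSHUFB also zeroes a byte whose index has its top bit set, which cannot happen here because the indices are masked nibbles). The function names are hypothetical.

```cpp
#include <cstdint>

// Scalar model of the PSHUFB step: replaces each byte (a 4-bit index) by its popcount.
std::uint64_t nibble_lookup(std::uint64_t indices) {
  static const std::uint8_t table[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                         1, 2, 2, 3, 2, 3, 3, 4};
  std::uint64_t result = 0;
  for (int k = 0; k < 8; ++k) {
    const unsigned index = static_cast<unsigned>((indices >> (8 * k)) & 0x0F);
    result |= static_cast<std::uint64_t>(table[index]) << (8 * k);
  }
  return result;
}

// Per-byte popcounts as the sum of the lower-nibble and upper-nibble lookups.
std::uint64_t byte_popcounts_by_nibbles(std::uint64_t x) {
  const std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
  return nibble_lookup(x & MASK_0F) + nibble_lookup((x >> 4) & MASK_0F);
}
```

The indices 12 8 7 4 15 13 7 4 from the slide map to 2 1 3 1 4 3 3 1.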


How effective the improvements are.

  Evaluation

Environment
OS
   Mac OS X 10.8.3 (64-bit)
CPU
   Core i7 3720QM – Ivy Bridge
   2.6GHz – up to 3.6GHz
Compiler
   Apple LLVM version 4.2 (clang-425.0.24)
    (based on LLVM 3.2svn)

Data
Source
   Japanese Wikipedia page titles
   gzip -cd jawiki-20130328-all-titles-in-ns0.gz | LC_ALL=C sort -R > data


Details
   Number of keys: 1,367,750
   Average length: 21.14 bytes
   Total length: 28,919,893 bytes
Binaries
marisa 0.2.1
   ./configure CXX=clang++ --enable-popcnt
   make
   tools/marisa-benchmark < data


marisa 0.2.2
   ./configure CXX=clang++ --enable-sse4
   make
   tools/marisa-benchmark < data
Results – marisa 0.2.1
Without improvements
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
              [KB]   [Kqps]   [Kqps]   [Kqps]   [Kqps]   [Kqps]
        1   11,811      724    1,105    1,223    1,038      711
        2    8,639      632      790      877      753      453
        3    8,001      621      750      816      708      406
        4    7,788      591      723      791      687      391
        5    7,701      590      712      781      680      384

   Baseline

Results – marisa 0.2.2
With improvements
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
              [KB]   [Kqps]   [Kqps]   [Kqps]   [Kqps]   [Kqps]
        1   11,811      757    1,198    1,359    1,115      772
        2    8,639      657      873    1,000      820      503
        3    8,001      621      817      924      770      453
        4    7,788      613      797      900      752      438
        5    7,701      610      787      884      737      427

   Same size
   Faster operations
Results – Improvements
Improvement ratios
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
               [%]      [%]      [%]      [%]      [%]      [%]
        1     0.00    +4.56    +8.42   +11.12    +7.42    +8.58
        2     0.00    +3.96   +10.52   +14.03    +8.90   +11.04
        3     0.00     0.00    +8.93   +13.24    +8.76   +11.58
        4     0.00    +3.72   +10.24   +13.78    +9.46   +12.02
        5     0.00    +3.39   +10.53   +13.19    +8.38   +11.20

   Same size
   Faster operations
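The ratios appear to be the relative change in throughput from marisa 0.2.1 to 0.2.2. That reading is an inference from the tables, not something the slides state, and improvement_percent is a hypothetical name.

```cpp
// Relative change in percent, e.g. from the 0.2.1 Kqps figure to the 0.2.2 one.
double improvement_percent(double before_kqps, double after_kqps) {
  return (after_kqps - before_kqps) / before_kqps * 100.0;
}
```

For one trie, Build goes from 724 to 757 Kqps, which gives 33 / 724, about +4.56%.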
Conclusion
   “Any sufficiently advanced technology
      is indistinguishable from magic.”


    “Any sufficiently advanced technique
      is indistinguishable from magic.”


                “You are a magician.”
                                           37
30 March 2013         Brazil, Inc.

Contenu connexe

Tendances

WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたMITSUNARI Shigeo
 
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)Toshihiko Yamakami
 
言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみたTakuya Koumura
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点Ichigaku Takigawa
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングToshihiro Kamishima
 
Boostのあるプログラミング生活
Boostのあるプログラミング生活Boostのあるプログラミング生活
Boostのあるプログラミング生活Akira Takahashi
 
ベータ分布の謎に迫る
ベータ分布の謎に迫るベータ分布の謎に迫る
ベータ分布の謎に迫るKen'ichi Matsui
 
パターン認識と機械学習 §6.2 カーネル関数の構成
パターン認識と機械学習 §6.2 カーネル関数の構成パターン認識と機械学習 §6.2 カーネル関数の構成
パターン認識と機械学習 §6.2 カーネル関数の構成Prunus 1350
 
DynamoDBを導入した話
DynamoDBを導入した話DynamoDBを導入した話
DynamoDBを導入した話dcubeio
 
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性Ichigaku Takigawa
 
L 05 bandit with causality-公開版
L 05 bandit with causality-公開版L 05 bandit with causality-公開版
L 05 bandit with causality-公開版Shota Yasui
 
Rユーザのためのspark入門
Rユーザのためのspark入門Rユーザのためのspark入門
Rユーザのためのspark入門Shintaro Fukushima
 
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)Jun Kashihara
 
統計的学習の基礎 4章 前半
統計的学習の基礎 4章 前半統計的学習の基礎 4章 前半
統計的学習の基礎 4章 前半Ken'ichi Matsui
 
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...Google Cloud Platform - Japan
 
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)Motoya Wakiyama
 
画像処理でのPythonの利用
画像処理でのPythonの利用画像処理でのPythonの利用
画像処理でのPythonの利用Yasutomo Kawanishi
 

Tendances (20)

WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみた
 
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)
査読付き論文を高速生産する1つのマインドセット、3つの方法 (in Japanese)
 
言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた言語表現モデルBERTで文章生成してみた
言語表現モデルBERTで文章生成してみた
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点合成変量とアンサンブル:回帰森と加法モデルの要点
合成変量とアンサンブル:回帰森と加法モデルの要点
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシング
 
Boostのあるプログラミング生活
Boostのあるプログラミング生活Boostのあるプログラミング生活
Boostのあるプログラミング生活
 
PRML第3章@京大PRML輪講
PRML第3章@京大PRML輪講PRML第3章@京大PRML輪講
PRML第3章@京大PRML輪講
 
ベータ分布の謎に迫る
ベータ分布の謎に迫るベータ分布の謎に迫る
ベータ分布の謎に迫る
 
パターン認識と機械学習 §6.2 カーネル関数の構成
パターン認識と機械学習 §6.2 カーネル関数の構成パターン認識と機械学習 §6.2 カーネル関数の構成
パターン認識と機械学習 §6.2 カーネル関数の構成
 
DynamoDBを導入した話
DynamoDBを導入した話DynamoDBを導入した話
DynamoDBを導入した話
 
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性
科学と機械学習のあいだ:変量の設計・変換・選択・交互作用・線形性
 
L 05 bandit with causality-公開版
L 05 bandit with causality-公開版L 05 bandit with causality-公開版
L 05 bandit with causality-公開版
 
Rユーザのためのspark入門
Rユーザのためのspark入門Rユーザのためのspark入門
Rユーザのためのspark入門
 
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)
うつ病罹患者に対する信念の構造とスティグマの低減方策 (日本心理学会2017小講演)
 
形態素解析
形態素解析形態素解析
形態素解析
 
統計的学習の基礎 4章 前半
統計的学習の基礎 4章 前半統計的学習の基礎 4章 前半
統計的学習の基礎 4章 前半
 
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...
[Cloud OnAir] 事例紹介: 株式会社オープンハウス 〜Google サービスを活用したオープンハウスの AI の取り組み〜 2020年11月1...
 
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
 
画像処理でのPythonの利用
画像処理でのPythonの利用画像処理でのPythonの利用
画像処理でのPythonの利用
 

Dernier

tools in IDTelated to first year vtu students is useful where they can refer ...
tools in IDTelated to first year vtu students is useful where they can refer ...tools in IDTelated to first year vtu students is useful where they can refer ...
tools in IDTelated to first year vtu students is useful where they can refer ...vinbld123
 
Crack JAG. Guidance program for entry to JAG Dept. & SSB interview
Crack JAG. Guidance program for entry to JAG Dept. & SSB interviewCrack JAG. Guidance program for entry to JAG Dept. & SSB interview
Crack JAG. Guidance program for entry to JAG Dept. & SSB interviewNilendra Kumar
 
LinkedIn Strategic Guidelines April 2024
LinkedIn Strategic Guidelines April 2024LinkedIn Strategic Guidelines April 2024
LinkedIn Strategic Guidelines April 2024Bruce Bennett
 
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一F La
 
办理哈珀亚当斯大学学院毕业证书文凭学位证书
办理哈珀亚当斯大学学院毕业证书文凭学位证书办理哈珀亚当斯大学学院毕业证书文凭学位证书
办理哈珀亚当斯大学学院毕业证书文凭学位证书saphesg8
 
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一fjjwgk
 
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证diploma001
 
Digital Marketing Training Institute in Mohali, India
Digital Marketing Training Institute in Mohali, IndiaDigital Marketing Training Institute in Mohali, India
Digital Marketing Training Institute in Mohali, IndiaDigital Discovery Institute
 
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCRdollysharma2066
 
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量sehgh15heh
 
do's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Jobdo's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of JobRemote DBA Services
 
Graduate Trainee Officer Job in Bank Al Habib 2024.docx
Graduate Trainee Officer Job in Bank Al Habib 2024.docxGraduate Trainee Officer Job in Bank Al Habib 2024.docx
Graduate Trainee Officer Job in Bank Al Habib 2024.docxJobs Finder Hub
 
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样umasea
 
办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书saphesg8
 
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证nhjeo1gg
 
Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713Riya Pathan
 
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一Fs sss
 
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607dollysharma2066
 

Dernier (20)

tools in IDTelated to first year vtu students is useful where they can refer ...
tools in IDTelated to first year vtu students is useful where they can refer ...tools in IDTelated to first year vtu students is useful where they can refer ...
tools in IDTelated to first year vtu students is useful where they can refer ...
 
Crack JAG. Guidance program for entry to JAG Dept. & SSB interview
Crack JAG. Guidance program for entry to JAG Dept. & SSB interviewCrack JAG. Guidance program for entry to JAG Dept. & SSB interview
Crack JAG. Guidance program for entry to JAG Dept. & SSB interview
 
LinkedIn Strategic Guidelines April 2024
LinkedIn Strategic Guidelines April 2024LinkedIn Strategic Guidelines April 2024
LinkedIn Strategic Guidelines April 2024
 
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一
办理(Hull毕业证书)英国赫尔大学毕业证成绩单原版一比一
 
办理哈珀亚当斯大学学院毕业证书文凭学位证书
办理哈珀亚当斯大学学院毕业证书文凭学位证书办理哈珀亚当斯大学学院毕业证书文凭学位证书
办理哈珀亚当斯大学学院毕业证书文凭学位证书
 
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort ServiceYoung Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
 
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一
定制(ECU毕业证书)埃迪斯科文大学毕业证毕业证成绩单原版一比一
 
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
 
Digital Marketing Training Institute in Mohali, India
Digital Marketing Training Institute in Mohali, IndiaDigital Marketing Training Institute in Mohali, India
Digital Marketing Training Institute in Mohali, India
 
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Pitampura Delhi NCR
 
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量
原版定制copy澳洲查尔斯达尔文大学毕业证CDU毕业证成绩单留信学历认证保障质量
 
do's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Jobdo's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Job
 
Graduate Trainee Officer Job in Bank Al Habib 2024.docx
Graduate Trainee Officer Job in Bank Al Habib 2024.docxGraduate Trainee Officer Job in Bank Al Habib 2024.docx
Graduate Trainee Officer Job in Bank Al Habib 2024.docx
 
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样
办理学位证(纽伦堡大学文凭证书)纽伦堡大学毕业证成绩单原版一模一样
 
办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书
 
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证
原版快速办理MQU毕业证麦考瑞大学毕业证成绩单留信学历认证
 
Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713
 
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
 
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607
Gurgaon Call Girls: Free Delivery 24x7 at Your Doorstep G.G.N = 8377087607
 
FULL ENJOY Call Girls In Gautam Nagar (Delhi) Call Us 9953056974
FULL ENJOY Call Girls In Gautam Nagar (Delhi) Call Us 9953056974FULL ENJOY Call Girls In Gautam Nagar (Delhi) Call Us 9953056974
FULL ENJOY Call Girls In Gautam Nagar (Delhi) Call Us 9953056974
 

X86opti 05 s5yata

  • 1. Remove Branches in BitVector Select Operations - marisa 0.2.2 - Susumu Yata @s5yata Brazil, Inc. 1 30 March 2013 Brazil, Inc.
  • 2. Who I Am Job Brazil, Inc. (groonga developer) We need R&D software engineers. Personal research & development Tries darts-clone, marisa-trie, etc. Corpus Nihongo Web Corpus 2010 (NWC 2010) 2 30 March 2013 Brazil, Inc.
  • 3. Relationships between BitVector and Marisa. BitVector and Marisa 3 30 March 2013 Brazil, Inc.
  • 4. BitVector What‟s BitVector? A sequence of bits Operations BitVector::get(i) BitVector::rank(i) BitVector::select(i) 4 30 March 2013 Brazil, Inc.
  • 5. BitVector – Get Operations Interface BitVector::get(i) Description The i-th bit (“0” or “1”) 0 1 2 … i–1 i i+1 … n-2 n-1 0 0 1 … 0 1 1 … 0 0 Get! 5 30 March 2013 Brazil, Inc.
  • 6. BitVector – Rank Operations Interface BitVector::rank(i) Description The number of “1”s up to the i-th bit 0 1 2 … i–1 i i+1 … n-2 n-1 0 0 1 … 0 1 1 … 0 0 How many “1”s? 6 30 March 2013 Brazil, Inc.
  • 7. BitVector – Select Operations Interface BitVector::select(i) Description The position of the i-th “1” 0 1 2 … … … … … n-2 n-1 0 0 1 … … … … … 0 0 Where is the i-th “1”? 7 30 March 2013 Brazil, Inc.
  • 8. Marisa  Who‟s Marisa? An ordinary human magician  What‟s Marisa? A static and space-efficient dictionary  Data structure Recursive LOUDS-based Patricia tries  Site http://code.google.com/p/marisa-trie 8 30 March 2013 Brazil, Inc.
  • 9. Marisa – Patricia Patricia is a labeled tree. Keys = Tree + Labels Node Label ID Key 1 “Ar” 4 0 “Argentina” 1 2 “Brazil” 1 “Armenia” 5 3 „C‟ 0 2 2 “Brazil” 4 “gentina” 6 3 “Canada” 3 5 “menia” 4 “Cyprus” 7 6 “anada” 7 “yprus” 9 30 March 2013 Brazil, Inc.
  • 10. Marisa – Recursiveness Unfortunately, this margin is too small… Keys = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels <– Reasonable Labels = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels … 10 30 March 2013 Brazil, Inc.
  • 11. Marisa – BitVector Usage LOUDS Level-Order Unary Degree Sequence Terminal flags A node is terminal (“1”) or not (“0”). Link flags A node has a link to its multi-byte label (“1”) or has a built-in single-byte label (“0”). 11 30 March 2013 Brazil, Inc.
  • 12. Marisa – BitVector Usage LOUDS BitVector::get(), select() Terminal flags BitVector::get(), rank(), select() Link flags BitVector::get(), rank() 12 30 March 2013 Brazil, Inc.
  • 13. How to implement Rank/Select operations. Implementations 13 30 March 2013 Brazil, Inc.
  • 14. Rank Dictionary Index structures r_idx[x].abs = rank(512・x) x = 0, 1, 2, … r_idx[x].rel[y] = rank(512・x + 64・y) – rank(512・x) Y = 1, 2, 3, … , 7 Calculation abs + rel + popcnt() 14 30 March 2013 Brazil, Inc.
  • 15. Rank Operations Time complexity = O(1) 512 512 512 512 512 r_idx.abs 64 64 64 64 64 64 64 64 r_idx.rel 64 popcnt() 15 30 March 2013 Brazil, Inc.
  • 16. Select Dictionary Index structure s_idx[x] = select(512・x) i = 0, 1, 2, … Calculation Limit the range by using s_idx. Limit the range by using r_idx[x].abs. Limit the range by using r_idx[x].rel[y]. Find the i-th “1” in the range. 16 30 March 2013 Brazil, Inc.
  • 17. Select Operations s_idx s_idx 512 512 512 512 512 512 512 r_idx.abs r_idx.abs 64 64 64 64 64 64 64 64 r_idx.rel r_idx.rel 64 Final round 17 30 March 2013 Brazil, Inc.
  • 18. Select Final Round Binary search & table lookup Three-level branches if if if if if if if 8 8 8 8 8 8 8 8 Table lookup 18 30 March 2013 Brazil, Inc.
  • 19. How to remove the branches in the final round. Improvements 19 30 March 2013 Brazil, Inc.
  • 20. Original // x is the final 64-bit block (uint64_t). x = x – ((x >> 1) & MASK_55); x = (x & MASK_33) + ((x >> 2) & MASK_33); x = (x + (x >> 4)) & MASK_0F; x *= MASK_01; // Tricky popcount if (i < ((x >> 24) & 0xFF)) { // The first-level branch if (i < ((x >> 8) & 0xFF)) { // The second-level branch if (i < (x & 0xFF)) { // The third-level branch // The first byte contains the i-th “1”. } else { // The second byte contains the i-th “1”. 20 30 March 2013 Brazil, Inc.
  • 21. Tips – Tricky PopCount 0 1 1 1 0 0 1 0 x = x – ((x >> 1) & MASK_55); 1 2 0 1 x = (x & MASK_33) + ((x >> 2) & MASK_33); 3 1 x = (x + (x >> 4)) & MASK_0F; 4 21 30 March 2013 Brazil, Inc.
  • 22. Tips – Tricky PopCount // MASK_01 = 0x0101010101010101ULL; // x = x | (x << 8) | (x << 16) | (x << 24) | …; x *= MASK_01; 4 1 3 5 2 6 3 4 28 23 15 7 24 20 13 4 22 30 March 2013 Brazil, Inc.
  • 23. + SSE2 (After PopCount) // y[0 … 7] = i + 1; __m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01); __m128i z = _mm_cvtsi64_si128(x); // Compare the 16 8-bit signed integers in y and z. // y[k] = (y[k] > z[k]) ? 0xFF : 0x00; y = _mm_cmpgt_epi8(y, z); // PCMPGTB // The j-th byte contains the i-th “1”. // TABLE is a 128-byte pre-computed table. uint8_t j = TABLE[_mm_movemask_epi8(y)]; 23 30 March 2013 Brazil, Inc.
  • 24. Tips – PCMPGTB y = _mm_cvtsi64_si128((i + 1) * MASK_01); 20 20 20 20 20 20 20 20 z = _mm_cvtsi64_si128(x); 28 24 23 20 15 13 7 4 // y[k] = (y[k] > z[k]) ? 0xFF : 0x00; y = _mm_cmpgt_epi8(y, z); 0x00 0x00 0x00 0x00 0xFF 0xFF 0xFF 0xFF 24 30 March 2013 Brazil, Inc.
  • 25. + Tricks (After Comparison)
      uint64_t j = _mm_cvtsi128_si64(y);
      // Calculation without TABLE
      j = ((j & MASK_01) * MASK_01) >> 56;
      // Calculation with BSR
      j = (63 - __builtin_clzll(j + 1)) / 8;
      // Calculation with popcnt (SSE4.2 or SSE4a)
      j = __builtin_popcountll(j) / 8;
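The three alternatives on this slide all turn the comparison mask into a byte index. The sketch below checks them against each other without any SSE intrinsics; the function names are hypothetical, and the mask is assumed to consist of k low bytes of 0xFF followed by zero bytes, with k at most 7, which is the shape the comparison produces.

```cpp
#include <cassert>
#include <cstdint>

const uint64_t MASK_01 = 0x0101010101010101ULL;

// Each function returns k for a mask whose k low bytes are 0xFF (k <= 7).

// Without TABLE: keep one bit per byte, then sum the bytes into the top byte.
unsigned index_by_multiply(uint64_t m) {
  return static_cast<unsigned>(((m & MASK_01) * MASK_01) >> 56);
}

// With BSR: m + 1 is a single bit at position 8 * k.
unsigned index_by_clz(uint64_t m) {
  assert(m != ~0ULL);  // __builtin_clzll(0) is undefined
  return static_cast<unsigned>(63 - __builtin_clzll(m + 1)) / 8;
}

// With popcnt: every 0xFF byte contributes eight 1s.
unsigned index_by_popcount(uint64_t m) {
  return static_cast<unsigned>(__builtin_popcountll(m)) / 8;
}
```

With the mask from the PCMPGTB example, 0x00000000FFFFFFFF, all three return 4. `__builtin_clzll` and `__builtin_popcountll` are GCC and Clang builtins, so this sketch assumes one of those compilers.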
  • 26. – SSE2 (Simple and Fast)
      // x is the final 64-bit block (uint64_t).
      x = x - ((x >> 1) & MASK_55);
      x = (x & MASK_33) + ((x >> 2) & MASK_33);
      x = (x + (x >> 4)) & MASK_0F;
      x *= MASK_01;  // Tricky popcount
      uint64_t y = (i + 1) * MASK_01;
      uint64_t z = x | MASK_80;
      // Compare the 8 7-bit unsigned integers in y and z.
      z = (z - y) & MASK_80;
      uint8_t j = __builtin_ctzll(z) / 8;
  • 27. Tips – Comparison
      uint64_t y = (i + 1) * MASK_01;
          0x14  0x14  0x14  0x14  0x14  0x14  0x14  0x14
      uint64_t z = x | MASK_80;
          0x9C  0x98  0x97  0x94  0x8F  0x8D  0x87  0x84
      // Compare the 8 7-bit unsigned integers in y and z.
      z = (z - y) & MASK_80;
          0x80  0x80  0x80  0x80  0x00  0x00  0x00  0x00
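The comparison above can be run on the slide's own numbers. In this sketch `select_byte` follows the code on the previous slide, while `ones_before_byte` is a hypothetical helper added here to show how the remaining rank inside the chosen byte could be obtained without a branch; both take the cumulative per-byte counts as input.

```cpp
#include <cassert>
#include <cstdint>

const uint64_t MASK_01 = 0x0101010101010101ULL;
const uint64_t MASK_80 = 0x8080808080808080ULL;

// counts holds cumulative per-byte popcounts (each at most 64).
// Returns the index of the byte that contains the i-th "1" (i is 0-based).
unsigned select_byte(uint64_t counts, unsigned i) {
  assert(i < (counts >> 56));  // the top byte is the total number of 1s
  uint64_t y = static_cast<uint64_t>(i + 1) * MASK_01;
  uint64_t z = counts | MASK_80;
  // Bit 7 of a byte survives the subtraction only if counts[k] >= i + 1.
  // No borrow crosses bytes: every byte of z is >= 0x80 and y's are <= 0x40.
  z = (z - y) & MASK_80;
  return static_cast<unsigned>(__builtin_ctzll(z)) / 8;
}

// Hypothetical helper: number of 1s in the bytes below byte j (0 for j == 0).
unsigned ones_before_byte(uint64_t counts, unsigned j) {
  return static_cast<unsigned>(((counts << 8) >> (8 * j)) & 0xFF);
}
```

With counts = 0x1C1817140F0D0704 and i = 19 (so i + 1 = 0x14), `select_byte` returns 4, and `ones_before_byte` reports 15 ones below that byte, leaving rank 4 to resolve inside the byte.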
  • 28. + SSSE3 (For PopCount)
      // Get lower nibbles and upper nibbles of x.
      __m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
      __m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
      upper = _mm_srli_epi32(upper, 4);
      // Use PSHUFB for counting “1”s in each nibble.
      __m128i table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
      lower = _mm_shuffle_epi8(table, lower);
      upper = _mm_shuffle_epi8(table, upper);
      // Merge the counts to get the number of “1”s in each byte.
      x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
      x *= MASK_01;
  • 29. Tips – PSHUFB
      lower = _mm_cvtsi64_si128(x & MASK_0F);
          12   8   7   4  15  13   7   4
      table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, …);
           4   3   3   2   3   2   2   1   3   2   2   1   2   1   1   0
      // Perform a parallel 16-way lookup.
      lower = _mm_shuffle_epi8(table, lower);
           2   1   3   1   4   3   3   1
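What PSHUFB does in one instruction can be mimicked with a scalar loop, which may help when reading the slide. The sketch below is only a model of the lookup, not a replacement for the SSSE3 code: `byte_counts_by_nibble_table` is a hypothetical name, and it visits the eight bytes one at a time instead of in parallel.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the PSHUFB popcount: one 16-entry lookup per nibble.
uint64_t byte_counts_by_nibble_table(uint64_t x) {
  // TABLE[n] is the number of 1s in the 4-bit value n
  // (the arguments of _mm_set_epi8 on the slide, listed from index 0 up).
  static const uint8_t TABLE[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                    1, 2, 2, 3, 2, 3, 3, 4};
  uint64_t counts = 0;
  for (unsigned k = 0; k < 8; ++k) {
    unsigned byte = static_cast<unsigned>((x >> (8 * k)) & 0xFF);
    uint64_t c = TABLE[byte & 0x0F] + TABLE[byte >> 4];  // lower + upper
    assert(c <= 8);
    counts |= c << (8 * k);
  }
  return counts;
}
```

For a word whose bytes are 12, 8, 7, 4, 15, 13, 7 and 4 (upper nibbles all zero), the result bytes are 2, 1, 3, 1, 4, 3, 3 and 1, matching the last row on the slide.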
  • 30. Evaluation
      How effective the improvements are.
  • 31. Environment
      OS
          Mac OS X 10.8.3 (64-bit)
      CPU
          Core i7 3720QM (Ivy Bridge), 2.6 GHz, up to 3.6 GHz
      Compiler
          Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
  • 32. Data Source Japanese Wikipedia page titles gzip –cd jawiki-20130328-all-titles-in- ns0.gz | LC_ALL=C sort –R > data Details Number of keys: 1,367,750 Average length: 21.14 bytes Total length: 28,919,893 bytes 32 30 March 2013 Brazil, Inc.
  • 33. Binaries
      marisa 0.2.1
          ./configure CXX=clang++ --enable-popcnt
          make
          tools/marisa-benchmark < data
      marisa 0.2.2
          ./configure CXX=clang++ --enable-sse4
          make
          tools/marisa-benchmark < data
  • 34. Results – marisa 0.2.1
      Without improvements (baseline)
      #Tries    Size   Build  Lookup  Reverse  Prefix  Predict
                [KB]  [Kqps]  [Kqps]   [Kqps]  [Kqps]   [Kqps]
           1  11,811     724   1,105    1,223   1,038      711
           2   8,639     632     790      877     753      453
           3   8,001     621     750      816     708      406
           4   7,788     591     723      791     687      391
           5   7,701     590     712      781     680      384
  • 35. Results – marisa 0.2.2
      With improvements (same size, faster operations)
      #Tries    Size   Build  Lookup  Reverse  Prefix  Predict
                [KB]  [Kqps]  [Kqps]   [Kqps]  [Kqps]   [Kqps]
           1  11,811     757   1,198    1,359   1,115      772
           2   8,639     657     873    1,000     820      503
           3   8,001     621     817      924     770      453
           4   7,788     613     797      900     752      438
           5   7,701     610     787      884     737      427
  • 36. Results – Improvements
      Improvement ratios (same size, faster operations)
      #Tries    Size   Build  Lookup  Reverse  Prefix  Predict
                 [%]     [%]     [%]      [%]     [%]      [%]
           1    0.00   +4.56   +8.42   +11.12   +7.42    +8.58
           2    0.00   +3.96  +10.52   +14.03   +8.90   +11.04
           3    0.00    0.00   +8.93   +13.24   +8.76   +11.58
           4    0.00   +3.72  +10.24   +13.78   +9.46   +12.02
           5    0.00   +3.39  +10.53   +13.19   +8.38   +11.20
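The improvement ratios appear to be the relative change in throughput between the two result tables. A minimal sketch of that arithmetic follows; the function name `improvement_percent` is made up here, and because the tables show rounded Kqps values, a ratio recomputed from them can differ from the slide in the last digit.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent, rounded to 2 decimals.
double improvement_percent(double before, double after) {
  assert(before > 0.0);
  double percent = (after - before) / before * 100.0;
  return std::round(percent * 100.0) / 100.0;
}
```

For one trie, `improvement_percent(1105, 1198)` gives the Lookup figure of about 8.42 and `improvement_percent(1223, 1359)` gives the Reverse figure of about 11.12.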
  • 37. Conclusion
      “Any sufficiently advanced technology is indistinguishable from magic.”
      “Any sufficiently advanced technique is indistinguishable from magic.”
      “You are a magician.”