X86opti 05 s5yata
1. Remove Branches in BitVector Select Operations
- marisa 0.2.2 -
Susumu Yata
@s5yata
Brazil, Inc.
30 March 2013 Brazil, Inc.
2. Who I Am
Job
Brazil, Inc. (groonga developer)
We need R&D software engineers.
Personal research & development
Tries
darts-clone, marisa-trie, etc.
Corpus
Nihongo Web Corpus 2010 (NWC 2010)
4. BitVector
What's BitVector?
A sequence of bits
Operations
BitVector::get(i)
BitVector::rank(i)
BitVector::select(i)
5. BitVector – Get Operations
Interface
BitVector::get(i)
Description
The i-th bit (“0” or “1”)
0 1 2 … i–1 i i+1 … n-2 n-1
0 0 1 … 0 1 1 … 0 0
Get!
6. BitVector – Rank Operations
Interface
BitVector::rank(i)
Description
The number of “1”s up to the i-th bit
0 1 2 … i–1 i i+1 … n-2 n-1
0 0 1 … 0 1 1 … 0 0
How many “1”s?
7. BitVector – Select Operations
Interface
BitVector::select(i)
Description
The position of the i-th “1”
0 1 2 … … … … … n-2 n-1
0 0 1 … … … … … 0 0
Where is the i-th “1”?
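The three operations can be pinned down with a naive reference version. This is only a sketch for checking faster implementations against, not marisa's code. Assumptions not stated on the slides: indices are 0-based, rank(i) counts the "1"s in positions [0, i), and select(i) counts "1"s from 0. The class name NaiveBitVector is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Naive reference bit vector: every operation is a linear scan.
// Assumed conventions (not from the slides): 0-based indices,
// rank(i) counts the "1"s in [0, i), select(i) counts "1"s from 0.
class NaiveBitVector {
 public:
  explicit NaiveBitVector(std::vector<bool> bits) : bits_(std::move(bits)) {}

  // The i-th bit.
  bool get(std::size_t i) const {
    assert(i < bits_.size());
    return bits_[i];
  }

  // The number of "1"s in positions [0, i).
  std::size_t rank(std::size_t i) const {
    assert(i <= bits_.size());
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) count += bits_[k] ? 1 : 0;
    return count;
  }

  // The position of the i-th "1" (0-based).
  std::size_t select(std::size_t i) const {
    for (std::size_t k = 0; k < bits_.size(); ++k) {
      if (bits_[k] && i-- == 0) return k;
    }
    assert(false && "the vector has fewer than i + 1 ones");
    return bits_.size();
  }

 private:
  std::vector<bool> bits_;
};
```

Under these conventions rank(select(i)) == i holds for every valid i, which makes a handy sanity check.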
8. Marisa
Who's Marisa?
An ordinary human magician
What's Marisa?
A static and space-efficient dictionary
Data structure
Recursive LOUDS-based Patricia tries
Site
http://code.google.com/p/marisa-trie
9. Marisa – Patricia
Patricia is a labeled tree.
Keys = Tree + Labels
Keys
ID  Key
0   “Argentina”
1   “Armenia”
2   “Brazil”
3   “Canada”
4   “Cyprus”
Labels
Node  Label
1     “Ar”
2     “Brazil”
3     'C'
4     “gentina”
5     “menia”
6     “anada”
7     “yprus”
Tree (node -> children)
0 -> 1, 2, 3
1 -> 4, 5
3 -> 6, 7
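"Keys = Tree + Labels" can be illustrated by restoring a key from a node: walk up to the root and concatenate the labels on the path. A minimal sketch, assuming the tree shape implied by the labels on this slide (root 0 with children 1, 2, 3; node 1 with children 4, 5; node 3 with children 6, 7). The parent array and the function name restore_key are hypothetical, and marisa itself stores the tree as LOUDS rather than as a parent array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// parent[v] is the parent of node v; the root (node 0) is its own parent.
// labels[v] is the label on node v; the root has an empty label.
std::string restore_key(const std::vector<std::size_t>& parent,
                        const std::vector<std::string>& labels,
                        std::size_t node) {
  std::string key;
  // Walk from the node up to the root, prepending each label.
  while (node != 0) {
    key = labels[node] + key;
    node = parent[node];
  }
  return key;
}
```

With parent = {0, 0, 0, 0, 1, 1, 3, 3} and the labels from the slide, node 4 restores to "Argentina" and node 7 to "Cyprus".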
10. Marisa – Recursiveness
Unfortunately, this margin is too small…
Keys = Tree + Labels
Labels = Tree + Labels
Labels = Tree + Labels <– Reasonable
Labels = Tree + Labels
Labels = Tree + Labels
Labels = Tree + Labels
Labels = Tree + Labels
…
11. Marisa – BitVector Usage
LOUDS
Level-Order Unary Degree Sequence
Terminal flags
A node is terminal (“1”) or not (“0”).
Link flags
A node has a link to its multi-byte label
(“1”) or has a built-in single-byte label (“0”).
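As a rough illustration of LOUDS, the sketch below encodes a tree in the textbook way: a "10" for a virtual super-root, then, in level order, one "1" per child of each node followed by a "0". This is an assumption about the general technique; marisa's exact bit layout may differ. The function name louds_encode and the children-list input are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// children[v] lists the children of node v; node 0 is the root.
// Returns the LOUDS bit sequence as a string of '0' and '1'.
std::string louds_encode(
    const std::vector<std::vector<std::size_t>>& children) {
  assert(!children.empty());
  std::string bits = "10";  // virtual super-root with one child (the root)
  std::queue<std::size_t> order;
  order.push(0);
  while (!order.empty()) {
    const std::size_t v = order.front();
    order.pop();
    for (std::size_t c : children[v]) {
      bits += '1';  // one "1" per child
      order.push(c);
    }
    bits += '0';  // terminator for this node
  }
  return bits;
}
```

A tree with n nodes yields 2n + 1 bits. Moving between a node and its children then reduces to rank and select on this sequence, which is why select speed matters.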
16. Select Dictionary
Index structure
s_idx[x] = select(512 * x)
x = 0, 1, 2, …
Calculation
Limit the range by using s_idx.
Limit the range by using r_idx[x].abs.
Limit the range by using r_idx[x].rel[y].
Find the i-th “1” in the range.
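A simplified sketch of the sampling idea: store the position of every 512th "1", jump there, then narrow down. The r_idx levels are not defined in this part of the deck, so this sketch swaps them for a plain word-by-word popcount scan. Further assumptions: bit b of word w is position 64 * w + b, select(i) is 0-based, and the class name SampledSelect is hypothetical. It relies on GCC/Clang builtins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class SampledSelect {
 public:
  explicit SampledSelect(std::vector<std::uint64_t> words)
      : words_(std::move(words)) {
    // s_idx_[x] = position of the (512 * x)-th "1".
    for (std::size_t w = 0; w < words_.size(); ++w) {
      for (unsigned b = 0; b < 64; ++b) {
        if ((words_[w] >> b) & 1) {
          if (num_ones_ % kSample == 0) s_idx_.push_back(w * 64 + b);
          ++num_ones_;
        }
      }
    }
  }

  std::size_t select(std::size_t i) const {
    assert(i < num_ones_);
    // Limit the range by using s_idx.
    const std::size_t pos = s_idx_[i / kSample];
    std::size_t w = pos / 64;
    std::uint64_t word = words_[w] & (~std::uint64_t{0} << (pos % 64));
    std::size_t remaining = i % kSample;
    // Stand-in for the r_idx steps: skip whole words by popcount.
    for (;;) {
      const std::size_t c =
          static_cast<std::size_t>(__builtin_popcountll(word));
      if (remaining < c) break;
      remaining -= c;
      word = words_[++w];
    }
    // Final round (naive here): drop the lowest "1" `remaining` times.
    for (; remaining > 0; --remaining) word &= word - 1;
    return w * 64 + static_cast<std::size_t>(__builtin_ctzll(word));
  }

 private:
  static constexpr std::size_t kSample = 512;
  std::vector<std::uint64_t> words_;
  std::vector<std::size_t> s_idx_;
  std::size_t num_ones_ = 0;
};
```

The last loop is the "final round" that the following slides make branch-free.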
18. Select Final Round
Binary search & table lookup
Three-level branches
if
if if
if if if if
8 8 8 8 8 8 8 8
Table lookup
19. Improvements
How to remove the branches in the final round.
20. Original
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01; // Tricky popcount
if (i < ((x >> 24) & 0xFF)) { // The first-level branch
  if (i < ((x >> 8) & 0xFF)) { // The second-level branch
    if (i < (x & 0xFF)) { // The third-level branch
      // The first byte contains the i-th "1".
    } else {
      // The second byte contains the i-th "1".
21. Tips – Tricky PopCount
0 1 1 1 0 0 1 0
x = x - ((x >> 1) & MASK_55);
1 2 0 1
x = (x & MASK_33) + ((x >> 2) & MASK_33);
3 1
x = (x + (x >> 4)) & MASK_0F;
4
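The three steps plus the final multiplication can be wrapped in one function. After "x *= MASK_01", byte k of the result holds the number of "1"s in bytes 0 through k, because the multiplication sums each byte with all lower bytes. A sketch, assuming the mask values suggested by their names (the constants are not defined in this part of the deck); the function name byte_prefix_popcounts is hypothetical.

```cpp
#include <cstdint>

// Assumed values for the masks named on the slides.
constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Byte k of the result = number of "1"s in bytes 0..k of x.
std::uint64_t byte_prefix_popcounts(std::uint64_t x) {
  x = x - ((x >> 1) & MASK_55);              // counts per 2-bit field
  x = (x & MASK_33) + ((x >> 2) & MASK_33);  // counts per nibble
  x = (x + (x >> 4)) & MASK_0F;              // counts per byte
  return x * MASK_01;                        // running sums per byte
}
```

For the byte on this slide (0 1 1 1 0 0 1 0, read most significant bit first, i.e. 0x72) the lowest byte of the result is 4. The running sums never exceed 64, so no byte overflows into its neighbor.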
23. + SSE2 (After PopCount)
// y[0 … 7] = i + 1;
__m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01);
__m128i z = _mm_cvtsi64_si128(x);
// Compare the 16 8-bit signed integers in y and z.
// y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
y = _mm_cmpgt_epi8(y, z); // PCMPGTB
// The j-th byte contains the i-th “1”.
// TABLE is a 128-byte pre-computed table.
uint8_t j = TABLE[_mm_movemask_epi8(y)];
25. + Tricks (After Comparison)
uint64_t j = _mm_cvtsi128_si64(y);
// Calculation without TABLE
j = ((j & MASK_01) * MASK_01) >> 56;
// Calculation with BSR
j = (63 - __builtin_clzll(j + 1)) / 8;
// Calculation with popcnt (SSE4.2 or SSE4a)
j = __builtin_popcountll(j) / 8;
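The three alternatives can be compared side by side without SSE2. After the comparison, j holds 0xFF in its k lowest bytes and 0x00 elsewhere, and each trick recovers k. A sketch, assuming MASK_01 = 0x0101010101010101 and GCC/Clang builtins; the function names and the low_bytes_mask helper are hypothetical. All three expect k <= 7: with k = 8 the clz variant would evaluate __builtin_clzll(0), which is undefined.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;  // assumed value

// Calculation without TABLE: keep one bit per byte, sum into the top byte.
unsigned count_with_multiply(std::uint64_t j) {
  return static_cast<unsigned>(((j & MASK_01) * MASK_01) >> 56);
}

// Calculation with BSR: j + 1 is 1 << (8 * k), so find its only set bit.
unsigned count_with_clz(std::uint64_t j) {
  return static_cast<unsigned>(63 - __builtin_clzll(j + 1)) / 8;
}

// Calculation with popcnt: each 0xFF byte contributes eight "1"s.
unsigned count_with_popcount(std::uint64_t j) {
  return static_cast<unsigned>(__builtin_popcountll(j)) / 8;
}

// Demo helper: 0xFF in the k lowest bytes, 0 <= k <= 7.
std::uint64_t low_bytes_mask(unsigned k) {
  return k == 0 ? 0 : ~std::uint64_t{0} >> (64 - 8 * k);
}
```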
26. – SSE2 (Simple and Fast)
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01; // Tricky popcount
uint64_t y = (i + 1) * MASK_01;
uint64_t z = x | MASK_80;
// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;
uint8_t j = __builtin_ctzll(z) / 8;
27. Tips – Comparison
uint64_t y = (i + 1) * MASK_01;
0x14 0x14 0x14 0x14 0x14 0x14 0x14 0x14
uint64_t z = x | MASK_80;
0x9C 0x98 0x97 0x94 0x8F 0x8D 0x87 0x84
// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;
0x80 0x80 0x80 0x80 0x00 0x00 0x00 0x00
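Put together, the branch-free byte search fits in one small function. Each byte of z starts as 0x80 plus a running count of at most 64, so subtracting i + 1 never borrows from the next byte, and the top bit survives exactly where the running count is at least i + 1. A sketch, assuming the mask values suggested by their names, a 0-based i, and GCC/Clang builtins. The slides finish inside the byte with a table lookup; select_in_word below uses a short loop instead, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for the masks named on the slides.
constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
constexpr std::uint64_t MASK_80 = 0x8080808080808080ULL;

// Index (0..7) of the byte of `bits` that contains the i-th "1".
unsigned select_byte(std::uint64_t bits, unsigned i) {
  assert(i < static_cast<unsigned>(__builtin_popcountll(bits)));
  std::uint64_t x = bits;
  x = x - ((x >> 1) & MASK_55);
  x = (x & MASK_33) + ((x >> 2) & MASK_33);
  x = (x + (x >> 4)) & MASK_0F;
  x *= MASK_01;  // byte k = number of "1"s in bytes 0..k
  const std::uint64_t y = (i + 1) * MASK_01;
  std::uint64_t z = x | MASK_80;
  z = (z - y) & MASK_80;  // top bit set where the count >= i + 1
  return static_cast<unsigned>(__builtin_ctzll(z)) / 8;
}

// Bit position (0..63) of the i-th "1" in `bits`.
unsigned select_in_word(std::uint64_t bits, unsigned i) {
  const unsigned j = select_byte(bits, i);
  const std::uint64_t below = bits & ((std::uint64_t{1} << (8 * j)) - 1);
  unsigned remaining = i - static_cast<unsigned>(__builtin_popcountll(below));
  std::uint64_t byte = (bits >> (8 * j)) & 0xFF;
  // Stand-in for the table lookup: drop the lowest "1" `remaining` times.
  for (; remaining > 0; --remaining) byte &= byte - 1;
  return 8 * j + static_cast<unsigned>(__builtin_ctzll(byte));
}
```

The word 0x0F01071F033F070F has the running counts shown on this slide (0x04, 0x07, 0x0D, 0x0F, 0x14, 0x17, 0x18, 0x1C from the lowest byte up), so select_byte with i = 19 (y = 0x14 per byte) returns 4.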
28. + SSSE3 (For PopCount)
// Get lower nibbles and upper nibbles of x.
__m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
__m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
upper = _mm_srli_epi32(upper, 4);
// Use PSHUFB for counting “1”s in each nibble.
__m128i table =
_mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
lower = _mm_shuffle_epi8(table, lower);
upper = _mm_shuffle_epi8(table, upper);
// Merge the counts to get the number of “1”s in each byte.
x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
x *= MASK_01;
37. Conclusion
“Any sufficiently advanced technology
is indistinguishable from magic.”
“Any sufficiently advanced technique
is indistinguishable from magic.”
“You are a magician.”