
PGroonga – Make PostgreSQL fast full text search platform for all languages!

PostgreSQL has a built-in full text search feature, but it supports only a limited set of languages. For example, it doesn't support Japanese. pg_trgm, which is bundled with PostgreSQL, supports all languages including Japanese, but it has performance problems with large documents.

This talk describes PGroonga, which resolves these problems.


  1. PGroonga - Make PostgreSQL fast full text search platform for all languages!
     Kouhei Sutou, ClearCode Inc.
     PGConf.ASIA 2016, 2016-12-03
     Powered by Rabbit 2.2.0
  2. PostgreSQL and me
     Some of my patches have been merged.
  3. Patches
     #13840: pg_dump generates unloadable SQL.
     #14160: DROP ACCESS METHOD IF EXISTS isn't implemented.
     Both problems were found while developing PGroonga.
  4. PGroonga dev style
     When there are problems in related projects, including PostgreSQL, we fix them in those projects instead of working around them in PGroonga.
  5. PostgreSQL and FTS
     PostgreSQL has a built-in full text search (FTS) feature.
     It has some problems...
     We fixed them by developing PGroonga instead of fixing PostgreSQL 😓
  6. Because...
     1. Our approach is different from PostgreSQL's approach.
     2. PG provides a plugin system. Implementing as a plugin is the PostgreSQL way!
  7. PG FTS problem
     Many languages aren't supported, e.g. Asian languages: Japanese, Chinese and more.
  8. FTS for Japanese 1
       SELECT
         to_tsvector('japanese', 'こんにちは');
       -- ERROR:  text search configuration
       --         "japanese" does not exist
       -- LINE 2:   to_tsvector('japanese',
       --                       ^
  9. FTS for Japanese 2
       CREATE EXTENSION pg_trgm;
       SELECT show_trgm('こんにちは');
       --  show_trgm
       -- -----------
       --  {}          ← Must not be empty!
       -- (1 row)
  10. Existing solution
      pg_bigm
  11. pg_bigm
      An extension, similar to pg_trgm.
      It provides an operator class for GIN.
  12. pg_bigm: Usage
        CREATE INDEX index ON table
          USING GIN (column gin_bigm_ops);
        --      ↑Use GIN    ↑Specify op class
  13. pg_bigm: Demerit
      Slow for large documents, because it needs "recheck".
      (Normally, we want to use FTS for large documents.)
  14. "recheck"
      An "exact" sequential search performed after a "loose" index search.
      The larger the text, the slower it is.
      text = doc size * N docs
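As a rough illustration of the "text = doc size * N docs" relation, the sketch below plugs in the figures quoted for the Japanese Wikipedia benchmark (average text size 6.7 KiB, 20389 hits). It assumes, purely for illustration, that every rechecked document has the average size and that the number of rechecked documents equals the number of hits; the function name `recheck_kib` is hypothetical.

```python
def recheck_kib(avg_doc_size_kib, n_docs):
    """Approximate amount of text re-read by "recheck", in KiB."""
    return avg_doc_size_kib * n_docs


# Assumed inputs: the benchmark's average text size and its largest hit count.
total_kib = recheck_kib(6.7, 20389)
print(f"{total_kib / 1024:.1f} MiB")  # prints "133.4 MiB"
```

Under these assumptions a single query re-reads on the order of a hundred MiB of text, which suggests why the recheck step can dominate the elapsed time.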
  15. Benchmark
      [Chart: elapsed time in seconds (lower is better) by number of hits (311, 14706, 20389) for pg_bigm; two results are annotated "Slow".]
      Data: Japanese Wikipedia (many records and large documents). N records: about 0.9 million. Average text size: 6.7 KiB.
  16. New solution
  17. PGroonga
      Pronunciation: píːzí:lúnɡά
      An extension.
      It provides its own index and operator classes. They are not operator classes for GIN.
  18. PGroonga layer
        Layer          | On GIN                       | PGroonga
        Operator class | textsearch, pg_trgm, pg_bigm | PGroonga
        Index          | GIN                          | PGroonga
  19. Benchmark
      [Chart: elapsed time in seconds (lower is better) by number of hits (311, 14706, 20389) for PGroonga and pg_bigm; two results are annotated "Fast".]
      Data: Japanese Wikipedia (many records and large documents). N records: about 0.9 million. Average text size: 6.7 KiB.
  20. Wrap up 1
      PostgreSQL doesn't support Asian languages.
      pg_bigm and PGroonga support all languages.
  21. Wrap up 2
      Many hits case: pg_bigm is slow, PGroonga is fast.
  22. Why is PGroonga fast?
      It doesn't need "recheck".
      Is "recheck" really slow? See one more benchmark result.
  23. Benchmark
      [Chart: elapsed time in seconds (lower is better) by number of hits (axis from 0 to 500000) for PGroonga and pg_bigm, annotated "Slow", "Slow" and "Fast for many hits!". Query: "日本".]
      Data: Japanese Wikipedia (many records and large documents). N records: about 0.9 million. Average text size: 6.7 KiB.
  24. Why is pg_bigm fast?
      The query is "日本". Point: 2 characters.
      pg_bigm doesn't need "recheck" for a 2-character query.
      It means that "recheck" is slow.
  25. N-gram and "recheck"
      The N-gram approach needs "phrase search" when the query has N+1 or more characters.
      N=2 for pg_bigm, N=3 for pg_trgm.
      GIN needs "recheck" for "phrase search".
  26. Phrase search
      Phrase search is "token search" plus "position check".
      Tokens must exist and appear in the same order.
      OK: "car at" for the "car at" query.
      NG: "at car" for the "car at" query.
  27. N-gram and phrase search
      1. Split text into tokens: "cat" → "ca", "at"
      2. Search all tokens: "ca" & "at" exist: Candidate!
      3. Check appearance positions: "ca" then "at": Found!
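The three steps of N-gram phrase search can be sketched in a few lines of Python. This is a minimal illustration over a single in-memory string, not how pg_bigm or PGroonga is implemented; the function names `bigrams` and `phrase_match` are hypothetical, and queries are assumed to have at least 2 characters.

```python
def bigrams(text):
    """Step 1: split text into overlapping 2-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def phrase_match(document, query):
    """Return True when the query's tokens appear consecutively, in order."""
    doc_tokens = bigrams(document)
    query_tokens = bigrams(query)

    # Step 2: every query token must exist somewhere in the document.
    if not all(token in doc_tokens for token in query_tokens):
        return False

    # Step 3: the tokens must also appear at consecutive positions.
    width = len(query_tokens)
    return any(doc_tokens[start:start + width] == query_tokens
               for start in range(len(doc_tokens) - width + 1))


print(bigrams("cat"))                 # ['ca', 'at']
print(phrase_match("cat", "cat"))     # True
print(phrase_match("at car", "cat"))  # False: both tokens exist, wrong order
```

The last call shows why step 2 alone is not enough: "at car" contains both "ca" and "at", so it is a candidate, but the position check rejects it.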
  28. N-gram and GIN: Create
      Documents, tokenized:
        ID | Text   | Tokens
        10 | cat    | "ca", "at"
        20 | at car | "at", "t ", " c", "ca", "ar"
      GIN:
        Token | Posting list
        "ca"  | 10, 20
        "at"  | 10, 20
        "t "  | 20
        " c"  | 20
        "ar"  | 20
  29. N-gram and GIN: Search
      Query "cat" → tokenize → "ca", "at"
      Search GIN with AND ("ca": 10, 20; "at": 10, 20) → candidates: 10, 20
      Appearance position check against the documents (10: cat, 20: at car). Point: this happens outside of GIN.
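The create and search flow above can be mimicked with a toy index that stores only document IDs. The sketch below is an assumption-laden model of the idea, not GIN's actual data structure: `build_doc_id_index` and `search_with_recheck` are hypothetical names, and the "recheck" is modeled as a plain substring test on the stored text.

```python
from collections import defaultdict

DOCUMENTS = {10: "cat", 20: "at car"}


def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_doc_id_index(documents):
    """Map each token to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in bigrams(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search_with_recheck(index, documents, query):
    """AND the posting lists, then re-read each candidate's text."""
    postings = [set(index.get(token, ())) for token in bigrams(query)]
    candidates = sorted(set.intersection(*postings)) if postings else []
    # "recheck": the index has no positions, so the text itself is scanned.
    hits = [doc_id for doc_id in candidates if query in documents[doc_id]]
    return candidates, hits


index = build_doc_id_index(DOCUMENTS)
candidates, hits = search_with_recheck(index, DOCUMENTS, "cat")
print(index["ca"], index["at"])  # [10, 20] [10, 20]
print(candidates)                # [10, 20]
print(hits)                      # [10]
```

The cost of the final filter grows with the size and number of candidate documents, which is the slowdown the talk attributes to "recheck".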
  30. GIN and phrase search
      Phrase search needs a position check.
      GIN doesn't support position checks.
      → GIN needs "recheck" → Slow!
  31. Why is PGroonga fast?
      PGroonga uses N-gram by default but doesn't need "recheck".
  32. Why no "recheck"?
      PGroonga uses a full inverted index.
  33. Full inverted index
      It includes position information.
  34. Inverted index diff
      Documents, tokenized (token positions in parentheses):
        ID | Text   | Tokens
        10 | cat    | "ca" (1), "at" (2)
        20 | at car | "at" (1), "t " (2), " c" (3), "ca" (4), "ar" (5)
      Posting lists:
        Token | Full: doc ID + pos | Not full: only doc ID
        "ca"  | 10:1, 20:4         | 10, 20
        "at"  | 10:2, 20:1         | 10, 20
        "t "  | 20:2               | 20
        " c"  | 20:3               | 20
        "ar"  | 20:5               | 20
  35. N-gram/PGroonga: Search
      Query "cat" → tokenize → "ca", "at"
      Search PGroonga with AND ("ca": 10:1, 20:4; "at": 10:2, 20:1), including the appearance position check. Point: the check happens in PGroonga.
      Result: 10
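A full inverted index can be sketched the same way by storing token positions next to document IDs. This is an illustrative model under simplifying assumptions, not Groonga's on-disk format; `build_full_index` and `phrase_search` are hypothetical names. Note that the search never touches the document text.

```python
from collections import defaultdict

DOCUMENTS = {10: "cat", 20: "at car"}


def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_full_index(documents):
    """Map token -> {doc_id: [1-based token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, token in enumerate(bigrams(text), start=1):
            index[token][doc_id].append(position)
    return index


def phrase_search(index, query):
    """Phrase search answered from the index alone (no recheck)."""
    tokens = bigrams(query)
    if not tokens:
        return []
    hits = []
    for doc_id, starts in index.get(tokens[0], {}).items():
        for start in starts:
            # Token k of the query must sit at position start + k.
            if all(start + offset in index.get(token, {}).get(doc_id, [])
                   for offset, token in enumerate(tokens)):
                hits.append(doc_id)
                break
    return sorted(hits)


index = build_full_index(DOCUMENTS)
print(dict(index["ca"]))            # {10: [1], 20: [4]}
print(dict(index["at"]))            # {10: [2], 20: [1]}
print(phrase_search(index, "cat"))  # [10]
```

Document 20 is rejected inside the index lookup because "at" is at position 1, not at position 5 right after "ca" at position 4.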
  36. Wrap up
      N-gram needs phrase search.
      A full inverted index provides fast phrase search.
      GIN isn't a full inverted index.
      PGroonga uses a full inverted index.
  37. FTS and English(*)
      Normally, N-gram isn't used for English FTS.
      N-gram is slower than the word-based approach (the textsearch approach).
      Stemming and stop words can't be used.
      (*) English ≒ alphabet-based languages
  38. PGroonga and English
      PGroonga uses N-gram by default.
      Is PGroonga slow for English? No. It is similar to textsearch.
  39. PGroonga: Search
      [Chart: elapsed time in ms (shorter is better) for the queries "PostgreSQL OR MySQL", "database" and "America", comparing PGroonga and textsearch.]
      Data: English Wikipedia (many records and large docs). N records: about 5.3 million. Average text size: 6.4 KiB.
  40. PGroonga's N-gram
      Variable-size N-gram.
      Continuous alphabetic characters are 1 token (= word-based approach): Hello → "Hello", not "He", "el", …
      Non-alphabetic characters are 2-gram: こんにちは → "こん", "んに", …
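The variable-size idea can be approximated with a short tokenizer. This is a simplified sketch, not Groonga's actual tokenizer: it assumes "alphabet" means ASCII letters, splits on whitespace, and treats every other run of characters as overlapping 2-grams; the real tokenizer may handle digits, symbols, normalization and the end of a run differently. The name `variable_ngram` is hypothetical.

```python
import re

# A run of ASCII letters, or a run of any other non-space characters.
_RUNS = re.compile(r"[A-Za-z]+|[^A-Za-z\s]+")


def variable_ngram(text):
    """Keep alphabetic runs whole; split other runs into 2-grams."""
    tokens = []
    for run in _RUNS.findall(text):
        if run[0].isascii() and run[0].isalpha():
            tokens.append(run)  # word-based: one token per run
        elif len(run) == 1:
            tokens.append(run)  # a lone character cannot form a 2-gram
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(variable_ngram("Hello"))
print(variable_ngram("こんにちは"))
print(variable_ngram("Hello こんにちは"))
```

With this scheme English text produces one token per word, so the index has far fewer tokens than a fixed 2-gram split, while Japanese text can still be searched without a dictionary.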
  41. Wrap up 1
      PGroonga's search is fast for all languages, including the case where alphabet-based languages and Asian languages are mixed.
      (textsearch doesn't support the mixed case.)
  42. Wrap up 2
      PGroonga makes PostgreSQL a fast full text search platform for all languages!
  43. More about PGroonga
      Performance
      Japanese specific feature
      JSON support
      Replication
  44. Performance
      Search and update
      Index only scan
      Direct Groonga search
      Index creation
  45. Search and update
      Search performance doesn't decrease while updating.
      That is a good characteristic for chat applications.
      Zulip supports PGroonga. Zulip: OSS chat app by Dropbox.
  46. Characteristics
      [Two plots of search throughput against update throughput.]
      PGroonga: keeps search performance during many updates.
      GIN: search performance decreases while updating.
  47. Update and lock
      Updates don't take read locks.
      Write locks are required.
  48. GIN: Read/Write
      Conn1: INSERT start
      Conn2: SELECT start
      Conn2: Blocked
      Conn1: INSERT finish
      Conn2: SELECT finish
      Slow down!
  49. PGroonga: Read/Write
      Conn1: INSERT start
      Conn2: SELECT start
      Conn1: INSERT finish
      Conn2: SELECT finish
      No slow down!
  50. Fast stably
      GIN has intermittent performance drops. For details, search for "GIN pending list".
      PGroonga keeps search fast.
      PGroonga keeps its index up to date.
  51. Index only scan
      GIN: not supported.
      PGroonga: supported.
  52. Even faster search
      Direct Groonga search is even faster.
      Groonga: the full text search engine that PGroonga uses.
  53. Direct Groonga search
      [Chart: elapsed time in ms (shorter is better) for the queries "PostgreSQL OR MySQL", "database" and "America", comparing PGroonga, Groonga and textsearch. Annotation: "Groonga is 30x faster than others".]
      Data: English Wikipedia (many records and large docs). N records: about 5.3 million. Average text size: 6.4 KiB.
  54. Index creation time
      [Chart: index creation elapsed time in hours (shorter is better) by module: PGroonga, textsearch and pg_trgm. Annotation: "2x faster than textsearch".]
      Data: English Wikipedia. Size: about 33 GiB. Max text size: 1 MiB.
  55. Performance: Wrap up
      Keeps search fast while updating.
      Supports index only scan.
      Direct Groonga search is even faster.
      Fast index creation.
  56. Japanese specific feature
      Completion by Romaji.
  57. Completion: Table
        CREATE TABLE stations (
          name text,
          readings text[]
          --       ↑Support N readings
        );
  58. Completion: Data
        INSERT INTO stations VALUES
          ('Tokyo', ARRAY['トウキョウ']),
          --              ↑In Katakana
          -- (...),
          ('Akihabara', ARRAY['アキハバラ', 'アキバ']);
  59. Completion: Index
        CREATE INDEX pgroonga_index
          ON stations
          USING pgroonga (
            -- ↓For prefix and prefix Romaji/Katakana search
            name pgroonga.text_term_search_ops_v2,
            -- ↓For prefix and prefix Romaji/Katakana search
            --  against array
            readings pgroonga.text_array_term_search_ops_v2);
  60. Completion: Search
        SELECT name, readings
          FROM stations
         WHERE name &^ 'tou' OR
         --         ↑Prefix search
               readings &^~> 'tou'
         --             ↑Prefix Romaji/Katakana search
         ORDER BY name LIMIT 10;
  61. Completion: Result
      Hit by prefix Romaji/Katakana search: "tou" (Romaji) → "トウ" (Katakana)
         name  |   readings
        -------+--------------
         Tokyo | {トウキョウ}
        (1 row)
  62. For Japanese: Wrap up
      Supports prefix Romaji/Kana search.
      Useful for implementing an auto-complete feature in a search box.
      Users don't need to convert Romaji to Kanji.
  63. JSON support
      Supports full text search.
      Target: all texts in JSON, not only the text at one path.
      (GIN supports only the single-path style.)
  64. JSON: FTS: Data
        CREATE TABLE logs (
          record jsonb
        );
        INSERT INTO logs (record)
          VALUES ('{"host": "app1"}'),
                 ('{"message": "app is down"}');
  65. JSON: FTS: Index
        CREATE INDEX message_index ON logs
          USING GIN ((record->>'message') gin_trgm_ops);
        -- {"message": "ONLY HERE IS SEARCHABLE"}
        CREATE INDEX record_index ON logs
          USING pgroonga (record);
        -- All string values are searchable
  66. JSON: FTS: GIN
        SELECT * FROM logs
          WHERE record->>'message' LIKE '%app%';
        --      ↑ {"host": "app1"} isn't target
        --            record
        -- ----------------------------
        --  {"message": "app is down"}
        -- (1 row)
  67. JSON: FTS: PGroonga
        SELECT * FROM logs
          WHERE record @@ 'string @ "app"';
        --      ↑ All string values are target
        --            record
        -- ----------------------------
        --  {"host": "app1"}
        --  {"message": "app is down"}
        -- (2 rows)
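The difference between searching one path and searching all string values can be modeled outside the database. The sketch below is only an analogy for the two queries above, not how GIN or PGroonga evaluates them: matching is a plain substring test, and the helper names `string_values`, `search_path` and `search_all_strings` are hypothetical.

```python
import json

RECORDS = ['{"host": "app1"}', '{"message": "app is down"}']


def string_values(value):
    """Yield every string value found anywhere in a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from string_values(child)
    elif isinstance(value, list):
        for child in value:
            yield from string_values(child)


def search_path(records, key, keyword):
    """Match only the string stored under one top-level key."""
    hits = []
    for record in records:
        value = json.loads(record).get(key)
        if isinstance(value, str) and keyword in value:
            hits.append(record)
    return hits


def search_all_strings(records, keyword):
    """Match any string value, whatever its path."""
    return [record for record in records
            if any(keyword in text
                   for text in string_values(json.loads(record)))]


print(search_path(RECORDS, "message", "app"))  # only the "message" record
print(search_all_strings(RECORDS, "app"))      # both records
```

Searching all string values is convenient for logs whose keys vary from record to record, since no per-path index has to be declared in advance.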
  68. JSON: Wrap up
      Supports full text search against all texts in JSON.
  69. Replication
      Supported with PG 9.6!
      PostgreSQL 9.6 ships "generic WAL".
      Third-party indexes can support WAL generation.
  70. Implementation
      1. Master: Encode action logs as MessagePack.
      2. Master: Write the action logs to WAL.
      3. Slaves: Read the action logs and apply them.
  71. Overview
      Master: INSERT → PostgreSQL → PGroonga, which updates the PGroonga DB and appends action logs to the index file via the generic WAL API.
      WAL: carries the action logs from master to slave.
      Slave: applies pending action logs on SELECT.
  72. Action log: "action"
        {
          "_action": ACTION_ID
        }
        # ACTION_ID: 0: INSERT
        # ACTION_ID: 1: CREATE_TABLE
        # ACTION_ID: 2: CREATE_COLUMN
        # ACTION_ID: 3: SET_SOURCES
  73. Action log: INSERT
        {
          "_action": 0,
          "_table": "TABLE_NAME",
          "ctid": PACKED_CTID_VALUE,
          "column1": COLUMN1_VALUE,
          ...
        }
  74. Action log: Logs
        {"_action": ACTION_ID, ...}
        {"_action": ACTION_ID, ...}
        {"_action": ACTION_ID, ...}
        ...
  75. Write action logs
      [Diagram: index file pages; a page holds a page header and action logs.]
  76. Apply action logs
      [Diagram: index file blocks 1 to 4 and the PGroonga DB. The applied offset (block# + offset) is (2,10) before applying and (2,50) after.]
  77. Action log: Why msgpack?
      Because MessagePack supports streaming unpack.
      This makes it possible to stop applying action logs when the WAL has been only partially applied on slaves.
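The "apply what is complete, stop at a partial log, resume later" behaviour can be sketched with the Python standard library. This is not PGroonga's code: JSON text is swapped in for MessagePack so the example needs no third-party package, the applied offset is a single integer rather than a (block#, offset) pair, and the function name `apply_action_logs` and the `name` column are hypothetical.

```python
import json

_DECODER = json.JSONDecoder()


def apply_action_logs(buffer, applied_offset, apply):
    """Apply every complete log after applied_offset; return the new offset.

    A trailing, partially written log is left for the next call.
    """
    offset = applied_offset
    while True:
        # Skip the whitespace that separates logs.
        while offset < len(buffer) and buffer[offset].isspace():
            offset += 1
        if offset >= len(buffer):
            return offset
        try:
            log, end = _DECODER.raw_decode(buffer, offset)
        except json.JSONDecodeError:
            return offset  # incomplete log: stop here and retry later
        apply(log)
        offset = end


stream = "\n".join([
    '{"_action": 0, "_table": "stations", "name": "Tokyo"}',
    '{"_action": 0, "_table": "stations", "name": "Akihabara"}',
]) + "\n"

applied = []
# First pass: only the first 60 characters have arrived.
offset = apply_action_logs(stream[:60], 0, applied.append)
print(len(applied))  # 1
# Second pass: the rest has arrived; resume from the saved offset.
offset = apply_action_logs(stream, offset, applied.append)
print(len(applied))  # 2
```

Because the saved offset always points at the start of the first log that has not been applied, no log is applied twice and none is skipped. One caveat of this simplified version is that a corrupt log is indistinguishable from an incomplete one.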
  78. Replication: Wrap up
      Supported with PG 9.6!
      Concept: action logs on WAL.
      It will be a useful pattern for indexes whose storage is outside of PostgreSQL.
  79. Wrap up 1
      PostgreSQL doesn't support FTS for all languages.
      PGroonga supports FTS for all languages.
  80. Wrap up 2
      PGroonga is stably fast.
      PGroonga supports FTS for all texts in JSON.
  81. Wrap up 3
      PGroonga supports replication.
      PostgreSQL 9.6 is required.
  82. Wrap up 4
      PGroonga makes PostgreSQL a fast full text search platform for all languages!
  83. See also
      https://pgroonga.github.io/
      Tutorial: /tutorial/
      Install: /install/
      Reference: /reference/ (includes the replication doc and benchmark docs)
      Community: /community/
