Introduction to Solr 6 (materials for the 18th Solr Study Group) (June 10, 2016)

Introduction to Solr 6
June 10, 2016
18th Solr Study Group

About me
➤ 株式会社ロンウイット (RONDHUIT Co., Ltd.)
➤ 西潟一生
➤ Consultant
➤ Apache Solr, Apache ManifoldCF
  ➤ Consulting
  ➤ Technical support
  ➤ Training instructor
  among other work

Agenda
➤ Changes since Solr 5
  ➤ Supported Java version
  ➤ Index compatibility
  ➤ How the schema is changed
  ➤ Score calculation
  ➤ Changed behavior of the replica and shard delete commands
  ➤ Removal of facet.date.*
➤ New features in Solr 6
  ➤ Parallel SQL
  ➤ Streaming Expressions
  ➤ Cross Data Center Replication
  ➤ Graph Query Parser
  and more

Java 8 is required
➤ Java 8 or later is required
➤ This also applies to the SolrJ client library

Index Format Changes
➤ Not compatible with indexes created by Solr 4.x or earlier
➤ To keep using a 4.x index, first upgrade it with the Lucene IndexUpgrader bundled with Solr 5.5
➤ Solr 6 might become able to read Solr 4.x indexes directly?
  ➤ https://issues.apache.org/jira/browse/SOLR-9051

Managed Schema is now the Default
➤ Managed Schema is now the default
➤ schema.xml is no longer used; schema settings are changed through the Schema API
➤ To keep using schema.xml as before, add the following to solrconfig.xml
<schemaFactory class="ClassicIndexSchemaFactory"/>
➤ Migrating from schema.xml to Managed Schema is easy
  ➤ Delete the managed-schema file in conf, place your existing schema.xml in conf, then start Solr
  ➤ A new managed-schema file is generated, and the schema.xml you placed is renamed to schema.xml.bak

Managed Schema is now the Default (Example)
➤ Add
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
    "name":"myNewTxtField",
    "class":"solr.TextField",
    "positionIncrementGap":"100",
    "analyzer":{
      "charFilters":[{
        "class":"solr.PatternReplaceCharFilterFactory",
        "replacement":"$1$1",
        "pattern":"([a-zA-Z])\\1+" }],
      "tokenizer":{
        "class":"solr.WhitespaceTokenizerFactory" },
      "filters":[{
        "class":"solr.WordDelimiterFilterFactory",
        "preserveOriginal":"0" }]}},
  "add-field" : {
    "name":"sell-by",
    "type":"myNewTxtField",
    "stored":true }
}' http://localhost:8983/solr/gettingstarted/schema
➤ Delete (the field is removed first, because a field type that is still in use cannot be deleted)
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "delete-field":{ "name":"sell-by" },
  "delete-field-type":{ "name":"myNewTxtField" }
}' http://localhost:8983/solr/gettingstarted/schema
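
The same Schema API call can be issued from Java. The following is a minimal sketch that uses only the JDK: `SchemaApiSketch`, `addFieldCommand`, and `post` are hypothetical names (they are not part of Solr or SolrJ), and the JSON is assembled by naive string concatenation, so it assumes field and type names that need no JSON escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Solr or SolrJ.
final class SchemaApiSketch {

    // Builds an "add-field" command body like the one in the curl example.
    // Assumption: name and type contain no characters that need JSON escaping.
    static String addFieldCommand(String name, String type, boolean stored) {
        return "{\"add-field\":{"
                + "\"name\":\"" + name + "\","
                + "\"type\":\"" + type + "\","
                + "\"stored\":" + stored + "}}";
    }

    // POSTs a JSON command to a collection's /schema endpoint and returns the HTTP status code.
    static int post(String schemaUrl, String json) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(schemaUrl).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-type", "application/json");
        try (OutputStream out = con.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String body = addFieldCommand("sell-by", "myNewTxtField", true);
        System.out.println(body);
        // Solr is contacted only when a schema URL is passed as the first argument,
        // e.g. http://localhost:8983/solr/gettingstarted/schema
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], body));
        }
    }
}
```

Running it without arguments only prints the request body, which makes it easy to compare against the curl payload before sending anything to a live collection.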

Default Similarity Changes
➤ The default scoring model changed from TF-IDF to Okapi BM25
➤ Ranking quality of search results generally improves
➤ Reference
  ➤ https://www.elastic.co/blog/found-bm-vs-lucene-default-similarity
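
To see why the change of similarity affects ranking, the sketch below compares the term-frequency component of classic TF-IDF (square root of the frequency) with the saturating BM25 component. It uses the textbook BM25 formula with the commonly cited defaults k1 = 1.2 and b = 0.75; `Bm25Sketch` is a hypothetical class, and Lucene's real implementation differs in details such as how document length is encoded.

```java
// Hypothetical illustration of the textbook formulas; not Lucene's actual code.
final class Bm25Sketch {
    static final double K1 = 1.2;   // term-frequency saturation parameter
    static final double B = 0.75;   // strength of document-length normalization

    // Classic TF-IDF term-frequency component: grows without bound.
    static double classicTf(int freq) {
        return Math.sqrt(freq);
    }

    // BM25 term-frequency component: approaches (K1 + 1) as freq grows,
    // and is reduced for documents longer than the average length.
    static double bm25Tf(int freq, double docLen, double avgDocLen) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return freq * (K1 + 1) / (freq + norm);
    }

    // BM25 inverse document frequency: rarer terms get a higher weight.
    static double bm25Idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 2, 5, 20, 100}) {
            System.out.printf("freq=%3d classicTf=%.3f bm25Tf=%.3f%n",
                    freq, classicTf(freq), bm25Tf(freq, 100, 100));
        }
    }
}
```

The printed table shows the practical difference: repeating a term many times keeps raising the classic score, while the BM25 contribution levels off.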

Replica & Shard Delete Command Changes
➤ The "DELETESHARD" and "DELETEREPLICA" commands now delete the following directories by default
  ➤ Instance directory
  ➤ Data directory
  ➤ Index directory
➤ To keep any of them, set the corresponding parameter to false
  ➤ deleteInstanceDir
  ➤ deleteDataDir
  ➤ deleteIndex
➤ Example
http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=test2&shard=shard2&replica=core_node3&deleteInstanceDir=false
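
When these calls are scripted, it helps to build the URL in one place so that the three "keep" flags are never forgotten. The sketch below assembles a DELETEREPLICA URL with the parameters named on this slide; `DeleteReplicaUrlSketch` and its `keepFilesOnDisk` switch are hypothetical, and the code only builds the string without sending a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it only builds the URL string.
final class DeleteReplicaUrlSketch {

    static String deleteReplicaUrl(String solrBaseUrl, String collection, String shard,
                                   String replica, boolean keepFilesOnDisk) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "DELETEREPLICA");
        params.put("collection", collection);
        params.put("shard", shard);
        params.put("replica", replica);
        if (keepFilesOnDisk) {
            // Opt out of the new delete-by-default behavior for all three directories.
            params.put("deleteInstanceDir", "false");
            params.put("deleteDataDir", "false");
            params.put("deleteIndex", "false");
        }
        StringBuilder url = new StringBuilder(solrBaseUrl).append("/admin/collections");
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator).append(e.getKey()).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    public static void main(String[] args) {
        System.out.println(deleteReplicaUrl("http://localhost:8983/solr", "test2", "shard2", "core_node3", true));
    }
}
```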

facet.date.* Parameters Removed
➤ The facet.date parameters, deprecated since Solr 3.x, have been removed completely
➤ Use facet.range instead
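
Because the facet.range options use the same suffixes (start, end, gap, and so on), migrating existing requests is mostly a matter of renaming parameters. The sketch below shows one way to rewrite a parameter map; `FacetDateMigrationSketch` is a hypothetical helper, and it assumes the parameter values themselves can be carried over unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it renames parameters and leaves values untouched.
final class FacetDateMigrationSketch {
    private static final String OLD_PREFIX = "facet.date.";
    private static final String OLD_INFIX = ".facet.date.";

    // Maps a facet.date parameter name to its facet.range counterpart.
    static String toRangeParam(String name) {
        if (name.equals("facet.date")) {
            return "facet.range";
        }
        if (name.startsWith(OLD_PREFIX)) {
            return "facet.range." + name.substring(OLD_PREFIX.length());
        }
        // Per-field overrides of the form f.<field>.facet.date.<option>
        int i = name.indexOf(OLD_INFIX);
        if (name.startsWith("f.") && i > 0) {
            return name.substring(0, i) + ".facet.range." + name.substring(i + OLD_INFIX.length());
        }
        return name; // unrelated parameters pass through unchanged
    }

    static Map<String, String> migrate(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            out.put(toRangeParam(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("q", "*:*");
        old.put("facet.date", "timestamp");
        old.put("facet.date.start", "NOW/DAY-5DAYS");
        old.put("facet.date.end", "NOW/DAY+1DAY");
        old.put("facet.date.gap", "+1DAY");
        System.out.println(migrate(old));
    }
}
```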

Doc Values
➤ DocValues are enabled by default for non-text fields
➤ Uses less heap memory, at the cost of a larger index on disk
➤ Keep DocValues enabled if you plan to use Parallel SQL (covered later)
➤ References
  ➤ http://blog.johtani.info/blog/2014/10/02/elasticsearch-1-4-0-beta-released-ja/
  ➤ https://lucidworks.com/blog/2013/04/02/fun-with-docvalues-in-solr-4-2/

Parallel SQL
➤ SQL can now be used with Solr
➤ Currently available only in SolrCloud

Example
➤ HTTP
curl --data-urlencode 'stmt=SELECT fieldA, count(*) FROM collection1 GROUP BY fieldA ORDER BY count(*) DESC LIMIT 10' 'http://localhost:8983/solr/collection1/sql?aggregationMode=facet'
➤ JDBC
// Requires the java.sql.* imports; zkHost is the ZooKeeper connection string of the SolrCloud cluster.
// try-with-resources closes rs, stmt, and con in reverse order, even when an exception is thrown.
try (Connection con = DriverManager.getConnection("jdbc:solr://" + zkHost + "?collection=collection1&aggregationMode=facet");
     Statement stmt = con.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT fieldA, count(*) FROM collection1 GROUP BY fieldA ORDER BY count(*) DESC LIMIT 10")) {
  while (rs.next()) {
    String a_s = rs.getString("fieldA");
  }
}

Parallel SQL Specs
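➤ Table name = collection name
  ➤ Case insensitive
➤ Supported clauses
  ➤ WHERE
  ➤ ORDER BY
  ➤ LIMIT
  ➤ DISTINCT
  ➤ GROUP BY
➤ Solr query syntax can be used in the WHERE clause
  ➤ OR search (the parentheses make it a term query; without them the value is treated as a phrase)
  WHERE fieldA = '(term1 term2)' → term1 OR term2 (when the default operator is OR)
  ➤ Range search
  WHERE fieldB = '[0 TO 100]'
➤ Requests can be sent through the JDBC driver or over HTTP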

Limitations, etc
➤ Available only in SolrCloud
➤ DELETE, INSERT, and UPDATE are not supported
➤ Every selected field must have docValues=true
➤ aggregationMode=map_reduce is faster when the field has many distinct values (high cardinality);
  otherwise aggregationMode=facet is faster
➤ Example specifying map_reduce
curl --data-urlencode 'stmt=SELECT fieldA FROM collection1 GROUP BY fieldA LIMIT 10' 'http://localhost:8983/solr/collection1/sql?aggregationMode=map_reduce'
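
For clients that cannot use the JDBC driver, the HTTP request from the curl example can be prepared with plain JDK classes. The sketch below builds the endpoint and the form-encoded body; `ParallelSqlRequestSketch` and `chooseMode` are hypothetical, and the cardinality threshold is something the caller has to pick from their own measurements, since the slides give no number.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; it prepares the request but does not send it.
final class ParallelSqlRequestSketch {

    // Rule of thumb from the slide: many distinct values favor map_reduce, few favor facet.
    // The threshold is an assumption supplied by the caller.
    static String chooseMode(long distinctValues, long threshold) {
        return distinctValues > threshold ? "map_reduce" : "facet";
    }

    static String sqlEndpoint(String solrBaseUrl, String collection, String aggregationMode) {
        return solrBaseUrl + "/" + collection + "/sql?aggregationMode=" + aggregationMode;
    }

    // Equivalent of curl's --data-urlencode 'stmt=...'
    static String formBody(String stmt) {
        try {
            return "stmt=" + URLEncoder.encode(stmt, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    public static void main(String[] args) {
        String mode = chooseMode(5_000_000L, 100_000L);
        System.out.println(sqlEndpoint("http://localhost:8983/solr", "collection1", mode));
        System.out.println(formBody("SELECT fieldA FROM collection1 GROUP BY fieldA LIMIT 10"));
    }
}
```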

Streaming Expressions
➤ Results of tasks executed in parallel can be combined
➤ Currently available only in SolrCloud
➤ Still experimental
➤ Stream Sources
  ➤ search
  ➤ jdbc
  ➤ facet
  ➤ stats
  ➤ topic
➤ Stream Decorators
  ➤ complement
  ➤ daemon
  ➤ innerJoin
  ➤ intersect
  ➤ hashJoin
  ➤ merge
  ➤ leftOuterJoin
  ➤ outerHashJoin
  ➤ parallel
  ➤ reduce
  ➤ rollup
  ➤ select
  ➤ top
  ➤ unique
  ➤ update

Streaming Expressions (Example)
➤ Merging search results from different collections (books.json and hd.xml from exampledocs are already indexed)
curl --data-urlencode 'expr=
merge(
  search(gettingstarted,q="*:*",fl="id,name",sort="id asc",qt="/export"),
  search(gettingstarted2,q="*:*",fl="id,name",sort="id asc",qt="/export"),
  on="id asc")
' 'localhost:8983/solr/gettingstarted/stream'
…
{"result-set":{"docs":[
{"name":["Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300"],"id":"6H500F0"},
{"name":["The Lightning Thief"],"id":"978-0641723445"},
{"name":["The Sea of Monsters"],"id":"978-1423103349"},
{"name":["Sophie's World : The Greek Philosophers"],"id":"978-1857995879"},
{"name":["Lucene in Action, Second Edition"],"id":"978-1933988177"},
{"name":["Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133"],"id":"SP2514N"},
{"EOF":true,"RESPONSE_TIME":17}]}}
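
The merge decorator relies on both inputs already being sorted on the merge key, which is why each search sorts by "id asc". The sketch below reproduces that idea in plain Java as a two-way merge of sorted lists. `MergeStreamSketch` is a hypothetical class, and the split of ids between the two lists (books in one, hard drives in the other) is an assumption based on the file names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a sorted two-way merge; not Solr's implementation.
final class MergeStreamSketch {

    // Both inputs must already be sorted ascending; the output is then sorted as well.
    static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));   // drain what is left of the first input
        }
        while (j < right.size()) {
            out.add(right.get(j++));  // drain what is left of the second input
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed contents of the two collections, each sorted by id.
        List<String> books = Arrays.asList(
                "978-0641723445", "978-1423103349", "978-1857995879", "978-1933988177");
        List<String> drives = Arrays.asList("6H500F0", "SP2514N");
        for (String id : merge(books, drives)) {
            System.out.println(id);
        }
    }
}
```

Each input is read once from front to back, so the merge never needs to hold or re-sort the full result set.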

Cross Data Center Replication
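➤ Supports replication across data centers
➤ Still experimental
➤ Runs in active/passive mode
➤ Replication is one-way, from the source to the target
  ➤ Changes made on the target are not propagated back to the source
➤ The target is eventually consistent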

Graph Query Parser
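➤ Relationships between Solr documents can be represented as a tree-like graph structure and searched
➤ Possible use cases
  ➤ Access control
    ➤ Traverse the users linked to a document
  ➤ Building a thesaurus
    ➤ Shown in a later example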

Graph Query Parser (Example)
➤ Index
curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_graph/update?commit=true' --data-binary '[
{"id":"A","foo": 7, "out_edge":["1","9"], "in_edge":["4","2"] },
{"id":"B","foo": 12, "out_edge":["3","6"], "in_edge":["1"] },
{"id":"C","foo": 10, "out_edge":["5","9"], "in_edge":["2"] },
{"id":"D","foo": 20, "out_edge":["4","7"], "in_edge":["3","5"] },
{"id":"E","foo": 17, "out_edge":[], "in_edge":["6"] },
{"id":"F","foo": 11, "out_edge":[], "in_edge":["7"] },
{"id":"G","foo": 7, "out_edge":["8"], "in_edge":[] },
{"id":"H","foo": 10, "out_edge":[], "in_edge":["8"] }
]'
➤ Search
http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_edge}id:A
...
"response":{"numFound":6,"start":0,"docs":[
{ "id":"A" },
{ "id":"B" },
{ "id":"C" },
{ "id":"D" },
{ "id":"E" },
{ "id":"F" } ]
}

Graph Query Parser (Example)
➤ Index
curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_graph/update?commit=true' --data-binary '[
{"id":"A","name": "fruit", "out_edge":["1","2","3"], "in_edge":[] },
{"id":"B","name": "apple", "out_edge":[], "in_edge":["1"] },
{"id":"C","name": "orange", "out_edge":[], "in_edge":["2"] },
{"id":"D","name": "grape", "out_edge":[], "in_edge":["3"] },
{"id":"E","name": "vegetable", "out_edge":["4","5"], "in_edge":[] },
{"id":"F","name": "strawberry", "out_edge":[], "in_edge":["4"] },
{"id":"G","name": "watermelon", "out_edge":[], "in_edge":["5"] },
{"id":"H","name": "rice", "out_edge":[], "in_edge":[] }
]'
➤ Search
http://localhost:8983/solr/my_graph/query?fl=name&q={!graph+from=in_edge+to=out_edge+returnRoot=false}name:fruit
...
"response":{"numFound":3,"start":0,"docs":[
{ "name":"orange" },
{ "name":"grape" },
{ "name":"apple" }
]
}
➤ Thesaurus represented by the data
fruit
  orange, grape, apple
vegetable
  strawberry, watermelon
rice
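
The traversal behind the thesaurus query can be sketched as a breadth-first search. The code below assumes that, with from=in_edge and to=out_edge, the parser collects the out_edge values of each visited document and then matches them against the in_edge values of other documents. `GraphTraversalSketch` is a hypothetical class that works on an in-memory copy of the edge data (ids only), not on a Solr index.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory illustration of the traversal; not Solr's implementation.
final class GraphTraversalSketch {

    static final class Doc {
        final String id;
        final List<String> outEdge;
        final List<String> inEdge;

        Doc(String id, List<String> outEdge, List<String> inEdge) {
            this.id = id;
            this.outEdge = outEdge;
            this.inEdge = inEdge;
        }
    }

    // Edge data of the thesaurus example (A is the "fruit" root, E the "vegetable" root).
    static List<Doc> thesaurus() {
        List<String> none = Collections.emptyList();
        return Arrays.asList(
                new Doc("A", Arrays.asList("1", "2", "3"), none),
                new Doc("B", none, Arrays.asList("1")),
                new Doc("C", none, Arrays.asList("2")),
                new Doc("D", none, Arrays.asList("3")),
                new Doc("E", Arrays.asList("4", "5"), none),
                new Doc("F", none, Arrays.asList("4")),
                new Doc("G", none, Arrays.asList("5")),
                new Doc("H", none, none));
    }

    // Breadth-first search: a document is reached when one of its in_edge values
    // equals an out_edge value of a document that was already visited.
    static Set<String> traverse(List<Doc> docs, String rootId, boolean returnRoot) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<Doc> frontier = new ArrayDeque<>();
        for (Doc d : docs) {
            if (d.id.equals(rootId)) {
                visited.add(d.id);
                frontier.add(d);
            }
        }
        while (!frontier.isEmpty()) {
            Doc current = frontier.poll();
            for (Doc candidate : docs) {
                boolean linked = !Collections.disjoint(current.outEdge, candidate.inEdge);
                if (linked && visited.add(candidate.id)) {
                    frontier.add(candidate);
                }
            }
        }
        if (!returnRoot) {
            visited.remove(rootId); // same effect as returnRoot=false in the query
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(traverse(thesaurus(), "A", false));
    }
}
```

Starting from document A with the root excluded, the walk reaches only the documents whose in_edge values match A's out_edge values, which is the narrower-term lookup a thesaurus needs.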
