Distributed Graph Database: DataStax Enterprise Graph

db tech showcase Tokyo 2017
Yuki Morishita

Speaker
Yuki Morishita
(yuki@datastax.com)
- Solutions Architect (and occasional Software Developer) @ DataStax
- Apache Cassandra™ committer

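What is a graph database?
- A database for storing, managing, and querying highly interconnected, complex relationship (graph-structured) data
- The graph database architecture is particularly well suited to deriving value from the relationships woven through large volumes of data, and to finding commonalities and anomalies in them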

Property graph
- Node (Vertex): an entity
- Edge: a relationship
- Property: an attribute of an entity or a relationship
[Figure: example property graph whose edges are labeled Resides, Purchased, Purchased, Has, Belongs To, Ships To]
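
The three building blocks above can be sketched as plain data structures. This is only an illustration of the property graph model, not how DSE Graph stores data; the class names `Vertex` and `Edge` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A node: an entity with a label and arbitrary properties."""
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relationship between two vertices, with its own properties."""
    label: str
    out_v: Vertex  # vertex the edge leaves
    in_v: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
customer = Vertex(1, "customer", {"name": "Lisa"})
product = Vertex(2, "product", {"name": "Socks"})
purchased = Edge("purchased", customer, product, {"qty": 1})

print(purchased.out_v.properties["name"], purchased.label,
      purchased.in_v.properties["name"])  # Lisa purchased Socks
```

Note that the edge carries its own property (`qty`), which is what distinguishes a property graph from a plain adjacency list.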

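Security and fraud detection
- How can you quickly identify entities, transactions, or interactions that are fraudulent, carry a security risk, or raise compliance concerns?
- Within the complex, highly interconnected web formed by users, entities, transactions, events, and interactions, a graph database can quickly find the bad needle in a haystack of relationships and events spanning countless financial transactions.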

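Recommendation / personalization
- How can you influence customers quickly and most effectively, so that they buy products or recommend products to other customers?
- Graphs are particularly well suited to recommending products or next actions, or to presenting ads, based on user data, relationships, past behavior, and interactions.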

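Master data management (MDM)
- How can you quickly understand and analyze business data that is integrated across business units, together with its interrelationships, to get a complete view of the customer?
- Examples of MDM on graphs include product catalogs with complex hierarchies and other interrelationships, and Customer 360 applications.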

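IoT / asset management / networks
- How can you easily analyze the many relationships formed among data elements, where the trends are more interesting when viewed as a whole than when examined individually?
- Graphs are also a good model for managing network assets (and their properties and configurations) and how their interrelationships change over time.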

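RDBMS vs. graph
- One of the main differences between a graph database and an RDBMS is how relationships between entities (vertices) are prioritized and managed.
- An RDBMS links entities secondarily through foreign keys, whereas edges (relationships) in a graph database are first-class.
- Relationships are built explicitly into the graph data model.
- A graph-shaped business problem is one where the concern is the relationships (edges) between entities (vertices) rather than the individual entities.

Concept | RDBMS | Graph
An identifiable "something" or an object to track | Entity | Node (Vertex)
A connection or reference between two objects | Relationship | Edge
A characteristic of an object | Attribute | Property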

DataStax Enterprise (DSE) Graph

DataStax Enterprise
A product with Apache Cassandra at its core
+ Support
+ Professional services
+ Training

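Integration with Apache Cassandra™
- DSE Graph inherits Cassandra's key advantages, including always-on availability, active-everywhere reads and writes, linear scalability, consistently low-latency response times, and mature operational practices
- On that foundation, DSE Graph adds performance features, including an adaptive query optimizer, a locality-aware graph data partitioner, a distributed query execution engine, and several graph-specific index structures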

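What is DSE Graph?
- A scale-out graph database for cloud applications that need to manage complex, highly interconnected data in real time
- Natively supports, within DSE, a property graph model engineered for Apache Cassandra™
- Stores and searches relationships between data in large graphs quickly and easily
- Built-in support for real-time search and analytical graph queries through tight integration with DSE
- One component of DSE's multi-model platform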

DSE Graph architecture
[Figure: architecture diagram]
- Clients: application (DataStax Driver), DataStax Studio, Apache TinkerPop™ Gremlin Console, third-party graph visualization tools
- DSE Server: DataStax Enterprise Graph with Apache TinkerPop™ GraphComputer and Apache TinkerPop™ Gremlin Server (labeled OLAP and OLTP), Apache Spark™, Apache Cassandra™ (storage / indexing), Apache Solr™ (indexing)

Apache TinkerPop™
- An open-source graph computing framework for graph databases (OLTP) and graph analytics systems (OLAP)
- Provides the graph data structure itself (the property graph) and a framework for processing it
- Gremlin: a standard language for graph databases
- Apache TinkerPop™-enabled graph databases
  - DSE Graph
  - Microsoft Azure CosmosDB
  - Neo4j
  - JanusGraph (a fork of Titan DB)
  - OrientDB
  - …

Gremlin
- A graph traversal language
// Find all orders placed by Lisa
g.V().has('customer', 'name', 'Lisa')
  .out('ordered')
  .values('number')
// Find all products purchased by Lisa's friends
g.V().has('customer', 'name', 'Lisa')
  .outE('related').has('Type', 'friend')
  .inV().out('ordered').out('purchased')
  .values('name')
[Figure: sample property graph]
Vertices:
  1 Customer  Name:[Lisa], Age:[32]
  2 Order     Number:[1234]
  3 Tag       Type:[Color], Value:[White]
  4 Customer  Name:[Frank], Age:[28]
  5 Product   Name:[Socks], Size:[XL]
  6 Product   Name:[Shirt], Size:[XL]
  7 Address   Street:[123 West Street], Zip Code:[44534]
Edges (numbered 11 to 19 in the figure), by label and property:
  orders     Date:[1/1/2016]
  related    Type:Friend
  resides    Since:1/1/2000
  ships      Shipment Date:1/2/2016
  purchased  Qty:42
  purchased  Qty:1
  has        Valid:1/1/2012
  has        Valid:1/1/2012
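
To make the step-by-step nature of a traversal concrete, here is a minimal pure-Python sketch that mimics the two queries above on a toy graph. It does not use Gremlin or any DataStax API. The edge endpoints, the second order (number 5678), and the helper names `has`, `out_e`, `out`, and `values` are assumptions made for illustration only.

```python
# Hypothetical toy graph; vertex ids loosely follow the figure,
# but which edge connects which vertices is assumed.
vertices = {
    1: {"label": "customer", "name": "Lisa"},
    2: {"label": "order", "number": 1234},
    4: {"label": "customer", "name": "Frank"},
    5: {"label": "product", "name": "Socks"},
    6: {"label": "product", "name": "Shirt"},
    8: {"label": "order", "number": 5678},  # invented so that Frank has an order
}
# Each edge: (out_vertex_id, label, in_vertex_id, properties)
edges = [
    (1, "ordered", 2, {}),
    (1, "related", 4, {"Type": "friend"}),
    (2, "purchased", 5, {"Qty": 1}),
    (4, "ordered", 8, {}),
    (8, "purchased", 6, {"Qty": 42}),
]


def has(label, key, value):
    """Vertex ids with the given label and property value."""
    return [vid for vid, v in vertices.items()
            if v["label"] == label and v.get(key) == value]


def out_e(vids, label):
    """Outgoing edges with the given label."""
    return [e for vid in vids for e in edges if e[0] == vid and e[1] == label]


def out(vids, label):
    """Vertices reached by following outgoing edges with the given label."""
    return [e[2] for e in out_e(vids, label)]


def values(vids, key):
    """Property values of the given vertices."""
    return [vertices[vid][key] for vid in vids]


lisa = has("customer", "name", "Lisa")

# All orders placed by Lisa
lisa_orders = values(out(lisa, "ordered"), "number")

# All products purchased by Lisa's friends
friends = [e[2] for e in out_e(lisa, "related") if e[3].get("Type") == "friend"]
friend_products = values(out(out(friends, "ordered"), "purchased"), "name")

print(lisa_orders)      # [1234]
print(friend_products)  # ['Shirt']
```

Each helper takes the current set of positions in the graph and returns the next one, which is the same pipeline shape a Gremlin traversal has.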

DataStax Studio
A visual development tool for DSE (Gremlin/CQL)

Connecting from applications
DataStax drivers

GLV (Gremlin Language Variant)
// Java (dseSession is an existing DseSession)
import com.datastax.dse.graph.api.DseGraph;
GraphTraversalSource g = DseGraph.traversal();
GraphTraversal traversal = g.V().values("name").range(0, 4).groupCount();
GraphStatement statement = DseGraph.statementFromTraversal(traversal);
GraphResultSet results = dseSession.executeGraph(statement);
# Python (dse_session is an existing session)
from dse_graph import DseGraph
g = DseGraph.traversal_source()
traversal = g.V().name[0:4].groupCount()
statement = DseGraph.query_from_traversal(traversal)
results = dse_session.execute_graph(statement)

Movie recommendation
- Dataset: MovieLens 1M Dataset
  - https://grouplens.org/datasets/movielens/1m/
  - Ratings: about 1,000,000
  - Users: about 6,000
  - Movies: about 4,000
  - (as of February 2003)
- Recommendation
  - "Users who rated this movie highly will probably like these movies too."

Demo environment
- Docker containers
  - DataStax Enterprise 5.1.2
    - Search and Graph enabled
  - DataStax Studio 2.0.0
- Official DataStax images are in preparation
  - Images published by one of our evangelists can also be used
    - https://github.com/LukeTillman/dse-docker
    - https://github.com/LukeTillman/ds-studio-docker

MovieLens schema
schema.propertyKey('id').Int().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('zipcode').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('year').Int().single().create()
schema.propertyKey('stars').Int().single().create()
schema.propertyKey('age').Int().single().create()
schema.propertyKey('timestamp').Timestamp().single().create()
schema.vertexLabel('movie').partitionKey('id').properties('name', 'year').create()
schema.vertexLabel('movie').index('search').search().by('name').add()
schema.vertexLabel('user').partitionKey('id').properties('age', 'gender', 'zipcode').create()
schema.vertexLabel('occupation').partitionKey('id').properties('name').create()
schema.vertexLabel('genre').properties('name').create()
schema.vertexLabel('genre').index('byname').materialized().by('name').add()
schema.edgeLabel('occupation').connection('user', 'occupation').create()
schema.edgeLabel('genre').connection('movie', 'genre').create()
schema.edgeLabel('rated').properties('timestamp', 'stars').connection('user', 'movie').create()

Loading graph data
- Gremlin API
- Gremlin I/O
  - GraphML
  - GraphSON
  - Gryo
- DSE Graph Loader

Adding nodes and edges
// Add a movie vertex
g.addV('movie').property('id', 9999)
  .property('name', 'Death Note')
  .property('year', 2017)
// Add a user vertex
g.addV('user').property('id', 9999)
  .property('gender', 'M')
// Add a 'rated' edge from the user to the movie
g.V().has('movie', 'id', 9999).as('m')
  .V().has('user', 'id', 9999).as('u')
  .addE('rated').from('u').to('m').property('stars', 3)

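DSE Graph Loader
- Bulk loads data into DSE Graph from CSV, JSON (GraphSON), RDBMS, binary (Gryo), and other sources
- Mapping scripts are written in Groovy
- Can also generate the schema automatically (create the schema beforehand if you need fine-grained control)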

Mapping script
(excerpt; the full script is at https://gist.github.com/yukim/6cab6f3270d60da3b1604c434e5e092f)
movies = File.text(inputDir + 'movies.dat')
    .delimiter("::")
    .header('id', 'name', 'genre')
// Split a title such as "Toy Story (1995)" into a name and a release year
movieInfo = movies.map {
    def name = it['name']
    def matcher = name =~ /(?<name>.*) \((?<year>\d{4})\)$/
    if (matcher.matches()) {
        it['name'] = matcher.group('name')
        it['year'] = matcher.group('year').toInteger()
    } else {
        it['name'] = name
        it['year'] = 0
    }
    it
}
load(movieInfo).asVertices {
    label 'movie'
    key id: 'id'
    ignore 'genre'
}
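
The title-splitting step in the mapping script can be checked outside the loader. The following is a rough Python equivalent of that one transformation, assuming the `MovieID::Title::Genres` line format of the MovieLens 1M `movies.dat` file; the function name `split_title` is hypothetical and is not part of DSE Graph Loader.

```python
import re

# "<name> (<4-digit year>)", with the year in literal parentheses
TITLE_WITH_YEAR = re.compile(r"(?P<name>.*) \((?P<year>\d{4})\)")


def split_title(raw):
    """Return the movie name and year; year is 0 when no year is present."""
    m = TITLE_WITH_YEAR.fullmatch(raw)
    if m:
        return {"name": m.group("name"), "year": int(m.group("year"))}
    return {"name": raw, "year": 0}


line = "1::Toy Story (1995)::Animation|Children's|Comedy"
movie_id, title, genre = line.split("::")

print(split_title(title))  # {'name': 'Toy Story', 'year': 1995}
```

Titles without a trailing year fall through to the `year = 0` branch, mirroring the `else` branch of the Groovy script.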

Running Graph Loader
$ ./dse-graph-loader/graphloader movielense_loader.groovy -graph movielense -address localhost

Using DSE Search
Using the index created by DSE Search (Apache Solr™ integration)
g.V().has('movie', 'name', Search.token('Christmas'))

Recommendation
g.V().has('movie', 'name', 'Fight Club')    // For the movie Fight Club...
  .inE('rated').has('stars', 5)             // ...follow its 5-star ratings...
  .outV()                                   // ...to the users...
  .has('gender', 'M').has('age', 35)        // ...who are male and aged 35...
  .outE('rated').has('stars', 5)            // ...then follow their 5-star ratings...
  .inV()                                    // ...to the movies...
  .has('name', neq('Fight Club'))           // ...other than Fight Club...
  .groupCount().by('name')                  // ...group and count them by name...
  .unfold()                                 // (unfold the result map)
  .order().by(values, decr).limit(10)       // ...sort in descending order and take the first 10
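
The logic of this traversal is a simple co-rating count, which can be sketched in plain Python without a graph database. The data below is invented for illustration, and the function name `recommend` is hypothetical; only the filtering and counting steps correspond to the query above.

```python
from collections import Counter

# Hypothetical users and ratings: (user_id, movie_name, stars)
users = {
    1: {"gender": "M", "age": 35},
    2: {"gender": "M", "age": 35},
    3: {"gender": "F", "age": 35},
    4: {"gender": "M", "age": 25},
}
ratings = [
    (1, "Fight Club", 5), (1, "The Matrix", 5), (1, "Seven", 5),
    (2, "Fight Club", 5), (2, "The Matrix", 5), (2, "Seven", 3),
    (3, "Fight Club", 5), (3, "Titanic", 5),
    (4, "Fight Club", 4), (4, "Alien", 5),
]


def recommend(seed, gender, age, limit=10):
    """Movies most often given 5 stars by matching users who gave `seed` 5 stars."""
    # Users with the given gender and age who rated the seed movie 5 stars
    fans = {u for u, movie, stars in ratings
            if movie == seed and stars == 5
            and users[u]["gender"] == gender and users[u]["age"] == age}
    # Count their other 5-star movies, excluding the seed itself
    counts = Counter(movie for u, movie, stars in ratings
                     if u in fans and stars == 5 and movie != seed)
    # Sort by count, descending, and keep the first `limit` entries
    return counts.most_common(limit)


print(recommend("Fight Club", "M", 35))  # [('The Matrix', 2), ('Seven', 1)]
```

In the graph version the two filters are edge hops (`inE`/`outV`, then `outE`/`inV`) rather than scans over a ratings list, so the work is proportional to the neighborhood of the seed movie.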

Query analysis / profiling
.explain()
.profile()

Interested?
- DSE download and free online training
  - Free to use for development
  - DS330: DataStax Enterprise Graph (12 hours)
  - https://academy.datastax.com/
- Gremlin/DSE Graph samples
  - https://github.com/datastax/graph-examples
