Stream Processing and SensorBee

4th Big Data Processing Infrastructure Study Group
Daisuke Tanaka
2016/03/22

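Self-Introduction
• Daisuke Tanaka (@disktnk)
– Until 2008: graduated from the Department of Mechanical Engineering, Faculty of Science and Engineering
– Until 2015: worked at a financial systems integrator
– Ended up as the financial engineering library person
– 2015 onward: at a certain company starting with "P"
– Hired in the anime-otaku slot
– SensorBee development
– Recently also involved in projects for the manufacturing industry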

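Today's Agenda
• Why we needed SensorBee (about 3 minutes)
• Stream processing basics and how SensorBee implements them (15 to 20 minutes)
• Questions
– If time remains, I will show a few demos (videos) that use SensorBee.
• Today's banned words: artificial intelligence, real-time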


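Challenges Facing IoT Applications
• Centralized data collection is becoming difficult
– Data volume: surveillance cameras and smartphones in Japan generate an estimated 1,000 PB of data per year
– Privacy: users do not want to, or cannot, upload their data to the cloud
– Information content: despite the volume, the value density of the generated data is low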

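Edge-Heavy Computing
• Achieve deep analysis on the premise that data is not gathered in one place
– Analyze data locally on devices at the edge of the network
– Only extracted information, such as learned models, is uploaded to the cloud, where global analysis is performed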

Seamless Data Utilization
• Deep Learning + Edge-Heavy Networking
– Chainer
– SensorBee

Reference: Deep Intelligence in-Motion (DIMo)
[Figure: DIMo architecture diagram. Labels recoverable from the slide:
– Industries (PFN-involved) / Industries (Partners): Auto (Self-driving/ADAS, Connected), Manufacturing (Optimization, Predictive maintenance), Healthcare (Drug discovery, iPS cell), Retail (CRM, Ad optimization), Surveillance (Security, Tracking)
– Deep Intelligence in-Motion (DIMo) libraries: Computer vision (Detect/Track/Recognize), Reinforcement learning (Distributed/Curriculum), Time-series (RNN / Representation), Sensor fusion (Multi-modal), Annotation (Hawk), Feedback/Action
– User applications and tools: Camera UI (Kanohi), …; Management
– Foundation: SensorBee™: Streaming Processing Engine for IoT; Machine learning, Deep Learning, Statistics]


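Data Stream Management System (DSMS)
1. Register a (continuous) query
2. Data starts flowing, or is already flowing
3. Process the incoming data "on the fly"
4. Continuously output the results (to clients)
[Figure: a data stream flows from Streaming Inputs through Continuous Queries to Streaming Outputs; the numbers in the figure mark steps 1 to 4.]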
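The four steps above can be sketched in a few lines of Python. This is an illustration of the push-style, one-pass processing model only, not SensorBee's implementation; `continuous_query`, `sensor_readings`, and the `temp` field are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator


def continuous_query(
    stream: Iterable[dict],
    predicate: Callable[[dict], bool],
    project: Callable[[dict], object],
) -> Iterator[object]:
    """Step 1: 'registering' the query fixes the predicate and projection up front."""
    for item in stream:            # Step 2: data keeps arriving
        if predicate(item):        # Step 3: each item is processed once, on the fly
            yield project(item)    # Step 4: results are emitted continuously


def sensor_readings() -> Iterator[dict]:
    """Hypothetical input stream; a real source would never terminate."""
    for i, temp in enumerate([21.5, 30.2, 19.8, 31.0]):
        yield {"id": i, "temp": temp}


hot_ids = continuous_query(
    sensor_readings(),
    predicate=lambda t: t["temp"] > 30.0,
    project=lambda t: t["id"],
)
results = list(hot_ids)  # only possible because the sample stream is finite
```

Unlike a DBMS query, nothing is stored and re-scanned here: each reading is seen exactly once, and the query outlives any individual data item.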

Comparison of DBMS and DSMS

                    DBMS                       DSMS
Data                persistent relations       streams, time windows
Data access         random                     sequential, one-pass
Updates             arbitrary                  append-only
Update rates        relatively low             high, bursty
Processing model    query-driven (pull-based)  data-driven (push-based)
Queries             one-time                   continuous
Query plans         fixed                      adaptive
Query optimization  one query                  multi-query
Query answers       exact                      exact or approximate
Latency             relatively high            low

Source: [Golab et al., 2010], p. 3, "Summary of differences between a DBMS and a DSMS"

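Continuous Query Semantics
• Operators
– The unit that processes one or more input data items.
– stream-to-relation, relation-to-relation, relation-to-stream
• Queues
– The data held between operators. SensorBee calls a single data item a Tuple.
• Synopses
– A (simplified) representation of each stream operation or state. Useful when reasoning about query plan optimization independently of the operators. SensorBee does not define them explicitly.
[Arasu et al., 2006]
[Jain et al., 2008]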

Continuous Query Operators: Simple Examples
• selection
• join
• count
[Figure: three operator diagrams. Selection (σa) on S1: each tuple is passed or dropped. Join (⋈) of S1 and S2: insert, probe, generate result. Count on S1: update (to "10").]
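The three operators can be sketched as Python generators. This is a minimal illustration and not SensorBee code: the function names are hypothetical, and the join is a symmetric hash join that keeps unbounded state, whereas a real engine would bound the state with windows.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, Tuple


def selection(stream: Iterable, predicate: Callable[[object], bool]) -> Iterator:
    """Pass a tuple downstream if it satisfies the predicate, otherwise drop it."""
    for t in stream:
        if predicate(t):
            yield t


def running_count(stream: Iterable) -> Iterator[int]:
    """Update the count on every arrival and emit the new value."""
    n = 0
    for _ in stream:
        n += 1
        yield n


def symmetric_hash_join(
    events: Iterable[Tuple[str, object]],
    key: Callable[[object], Hashable],
) -> Iterator[Tuple[object, object]]:
    """events yields (side, tuple) pairs where side is "S1" or "S2"."""
    tables = {"S1": defaultdict(list), "S2": defaultdict(list)}
    for side, t in events:
        other = "S2" if side == "S1" else "S1"
        k = key(t)
        tables[side][k].append(t)                 # insert into own state
        for match in tables[other].get(k, []):    # probe the other side's state
            # generate result, always ordered as (S1 tuple, S2 tuple)
            yield (t, match) if side == "S1" else (match, t)


selected = list(selection("abfa", lambda x: x == "a"))
counts = list(running_count([7, 8, 9, 10]))
arrivals = [("S1", "a"), ("S2", "b"), ("S1", "b"), ("S2", "a"), ("S2", "c")]
joined = list(symmetric_hash_join(arrivals, key=lambda t: t))
```

Selection is stateless, count needs a single running value, and join needs state for both inputs, which is why join is the operator that makes windows necessary on unbounded streams.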

Continuous Query Operator Types and BQL
• A sample of BQL (the CQL dialect implemented in SensorBee)
SELECT RSTREAM S1:id, S1:hoge1, S2:hoge2
FROM S1 [RANGE 1 TUPLES], S2 [RANGE 1 TUPLES]
WHERE S1:id = S2:id;

BQL: Stream-to-Relation
• BQL samples
– Tuple based: [RANGE 1 TUPLES]
– Time based: [RANGE 1 SECONDS]
– Buffering: [RANGE 3 SECONDS, DROP NEWEST IF FULL]
– Sliding Window, Tumbling Window (not supported in SensorBee)
– The semantics are difficult.
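A stream-to-relation operator turns an unbounded stream into a finite relation that is re-evaluated on each arrival. The sketch below mimics the tuple-based and time-based windows in Python. It illustrates the idea only and is not BQL's exact semantics: the function names are hypothetical, and treating the time boundary as exclusive on the old side is an assumption.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def tuple_window(stream: Iterable, n: int) -> Iterator[List]:
    """Analogue of [RANGE n TUPLES]: the relation holds the latest n tuples."""
    window = deque(maxlen=n)  # the oldest tuple is evicted automatically
    for t in stream:
        window.append(t)
        yield list(window)


def time_window(stream: Iterable[Tuple[float, object]], seconds: float) -> Iterator[List]:
    """Analogue of [RANGE n SECONDS]: stream yields (timestamp, value) pairs.

    Assumption: a tuple is kept while its timestamp is strictly newer than
    (latest timestamp - seconds).
    """
    window = deque()
    for ts, value in stream:
        window.append((ts, value))
        while window and window[0][0] <= ts - seconds:
            window.popleft()
        yield [v for _, v in window]


by_count = list(tuple_window([1, 2, 3, 4], n=2))
by_time = list(time_window([(0.0, "a"), (0.5, "b"), (1.2, "c"), (3.0, "d")], seconds=1))
```

Each yielded list is the relation that the relation-to-relation operators (selection, join, aggregation) would see at that moment.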

BQL: Relation-to-Relation
• BQL samples
– Selection
– Join
– Aggregation
– Filter, etc.

BQL: Relation-to-Stream
• BQL samples
– RSTREAM / ISTREAM / DSTREAM
– Example: http://sensorbee.readthedocs.org/en/latest/bql.html#id4
– Detail: http://sensorbee.readthedocs.org/en/latest/bql.html#relation-to-stream-operators
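The difference between the three emitters is easiest to see on a sequence of relation snapshots. The sketch below follows the CQL definitions (RSTREAM emits the whole current relation, ISTREAM emits rows that were inserted since the previous evaluation, DSTREAM emits rows that were deleted) using multiset differences. `relation_to_stream` is a hypothetical helper, not a SensorBee API, and the output is sorted only to make the result deterministic.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def relation_to_stream(snapshots: Iterable[List[str]], mode: str) -> Iterator[List[str]]:
    """Emit, for each successive relation snapshot, the rows selected by mode."""
    prev = Counter()
    for rows in snapshots:
        cur = Counter(rows)
        if mode == "RSTREAM":
            out = cur            # everything currently in the relation
        elif mode == "ISTREAM":
            out = cur - prev     # rows inserted since the previous snapshot
        elif mode == "DSTREAM":
            out = prev - cur     # rows deleted since the previous snapshot
        else:
            raise ValueError(f"unknown emitter: {mode}")
        yield sorted(out.elements())
        prev = cur


snapshots = [["a"], ["a", "b"], ["b", "c"]]
rstream = list(relation_to_stream(snapshots, "RSTREAM"))
istream = list(relation_to_stream(snapshots, "ISTREAM"))
dstream = list(relation_to_stream(snapshots, "DSTREAM"))
```

With a window of several tuples, RSTREAM repeats rows that stay in the window, while ISTREAM emits each row only when it first appears.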

SensorBee's Processing Topology
• Topology
– In SensorBee, one coherent unit from input to output is expressed as a Topology
– It is represented as a DAG
[Figure: a data stream flowing from Streaming Inputs into Continuous Queries]

SensorBee: Source
• Source
– A Component that represents an input to the Topology
[Figure: Sources S1, S2, S3 feeding a data stream into Continuous Queries]
CREATE SOURCE S1 TYPE fluentd WITH ...;
CREATE SOURCE S2 TYPE mqtt WITH ...;
CREATE SOURCE S3 TYPE user_source WITH ...;

SensorBee: Stream
• Stream
– An operation on Tuples (called a "Box" in the internal implementation)
[Figure: Sources S1, S2, S3 connected to Streams B1, B2, B3]
CREATE STREAM B1 AS SELECT ISTREAM udf1(*)
FROM S1 [RANGE 1 SECONDS], S2 [RANGE 1 SECONDS];

SensorBee: Sink
• Sink
– Defines an output from the Topology
[Figure: Sources S1, S2, S3; Streams B1, B2, B3; Sinks D1, D2, D3]
CREATE SINK D1 TYPE fluentd WITH ...;
INSERT INTO D1 FROM B1;
CREATE SINK D2 TYPE mqtt WITH ...;
CREATE SINK D3 TYPE user_sink WITH ...;

SensorBee: User Defined Stream Function (UDSF)
• UDSF
– A user-defined function that can behave as a new Source
[Figure: Sources S1, S2, S3; Streams B1, B2, B3; Sinks D1, D2, D3]
CREATE STREAM B2 AS SELECT RSTREAM *
FROM udsf1("S2") [RANGE 1 SECONDS];

SensorBee: User Defined State (UDS)
• UDS
– A shared state that every Component on the stream can access
[Figure: Sources S1, S2, S3; Streams B1, B2, B3; Sinks D1, D2, D3; shared State G1]
CREATE STATE G1 TYPE user_state WITH ...;
CREATE STREAM B3 AS SELECT ISTREAM
udf3("G1", B2:*), S3:*
FROM B2 [RANGE 1 SECONDS], S3 [RANGE 1 SECONDS]
WHERE B2:foo = S3:foo;
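Read together, the BQL statements on the Source, Stream, Sink, UDSF, and UDS slides describe a DAG. The sketch below writes down only the edges that those statements state explicitly (the inputs of D2 and D3 are not shown on the slides, so they are left out) and derives a valid processing order with Python's standard `graphlib` module (Python 3.9+). It illustrates the DAG idea and is not how SensorBee schedules its Components.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of upstream nodes it reads from.
upstream = {
    "B1": {"S1", "S2"},  # CREATE STREAM B1 ... FROM S1 [...], S2 [...]
    "B2": {"S2"},        # CREATE STREAM B2 ... FROM udsf1("S2") [...]
    "B3": {"B2", "S3"},  # CREATE STREAM B3 ... FROM B2 [...], S3 [...]
    "D1": {"B1"},        # INSERT INTO D1 FROM B1
}

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(upstream).static_order())
position = {node: i for i, node in enumerate(order)}
```

The shared state G1 does not appear as a node: it is referenced by `udf3` from inside B3 rather than being connected by a data-flow edge.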

Example: Classifying Tweets
• A demo combined with machine learning
– Included in the tutorial: http://sensorbee.readthedocs.org/en/latest/tutorial.html#using-machine-learning
– Connecting Elasticsearch and machine learning in practice: http://www.slideshare.net/nobu_k/elasticsearch-59627321
[Figure: topology with the components Twitter, Formatting (three instances), Labeling, Gender, Age, and fluentd]

For More Details
• http://sensorbee.io
• http://docs.sensorbee.io/en/latest/
• https://github.com/sensorbee/sensorbee

References
• A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution, 2006.
• N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Cetintemel, M. Cherniack, R. Tibbetts, and S. Zdonik. Towards a streaming SQL standard, 2008.
• L. Golab and M. T. Özsu. Data Stream Management, 2010.
