Stream Data Processing from an Academic Perspective
June 28, 2013
Lecturer, University of Tsukuba
Hideyuki Kawashima
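Disclaimer
• This talk gives my personal view of stream processing from an academic perspective.
• Of functionality, performance, reliability, security, and trustworthiness, it covers only a subset (functionality and performance).
• The content may contain errors.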
Overview
• Keyword classification
• Key concepts
– Continual query
[Figure: keyword cloud: STORM, Norikra, Jubatus, CEP, DSMS, SPE, Relational-stream, XML-stream, S4, STREAM, System S, Algorithm trading, Borealis (MIT/Brandeis), Stream computing, Complex event processing, Online learning, Incremental computation, Continual query, Spring (DTW), CPD (Change Point Detection), Window-aggregate, Window-join, FPGA, GPU, SASE, Fraud detection, Malware detection, AQP (Adaptive Query Proc.), Esper, BRIMOS, Handshake-join, Incr. LOCI, Online LDA, Window, Real-time, Tuple-stream, Materialized view, Tapestry]
[Figure: keyword map grouping the same terms by area, with Stream computing labeled as IBM's and Cayuga, MIC, High-performance H/W, and MV (materialized view) added]
Continual query, window
• Continual query
– DSMS: Queries are persistent, data are volatile
– DBMS: Data are persistent, queries are volatile
– CQ: a concept introduced in Tapestry
• “Continuous queries over append only databases”, Terry et al., SIGMOD’92.
• Window
– Converts unbounded (infinite-length) data into finite-length data
• Type: ROWS or TIME
• Operators
– Aggregate -> window aggregate
– Join -> window join
• S. Babu and J. Widom. Continuous Queries over
Data Streams, SIGMOD Record, Sep. 2001
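To make the ROWS/TIME distinction concrete, here is a minimal Python sketch of a window aggregate (SUM) under each window type. The function names are hypothetical; events are assumed to arrive in timestamp order, and the TIME window is assumed to cover the half-open interval (t - span, t] with span > 0. Real DSMSs define these semantics in their own query languages.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rows_window_sum(values: Iterable[float], n: int) -> Iterator[float]:
    """ROWS window: after each arrival, emit SUM over the most recent n tuples."""
    window: Deque[float] = deque(maxlen=n)  # appending beyond n drops the oldest tuple
    for v in values:
        window.append(v)
        yield sum(window)  # naive: rescans the whole window on every arrival


def time_window_sum(events: Iterable[Tuple[float, float]], span: float) -> Iterator[float]:
    """TIME window: emit SUM over tuples whose timestamp lies in (t - span, t]."""
    window: Deque[Tuple[float, float]] = deque()
    for ts, v in events:  # assumes non-decreasing timestamps and span > 0
        window.append((ts, v))
        while window[0][0] <= ts - span:  # evict tuples that fell out of the window
            window.popleft()
        yield sum(val for _, val in window)


if __name__ == "__main__":
    print(list(rows_window_sum([1, 2, 3, 4], n=2)))  # [1, 3, 5, 7]
    print(list(time_window_sum([(0, 1), (1, 2), (2, 3), (5, 4)], span=2)))  # [1, 3, 5, 4]
```

Both versions re-aggregate the whole window on every arrival; that rescan is exactly what incremental computation tries to avoid.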
Relational Stream, DSMS
• Turning relational data processing into CQs
– Narrow sense
• Selection, projection, …
• Join, aggregate, and set operations require windows
– Broad sense
• Various mining tasks
• Anything goes as long as it satisfies relational completeness
• DSMS (data stream management system)
– A system that manages continual queries
• Relational
– STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP
– Telegraph (UCB) -> Cisco
– Aurora -> Borealis -> StreamBase
• Non-relational
– STORM
– S4 (Mr. Oyamada at NEC knows it well)
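Why do joins need windows? A join between two unbounded streams would otherwise have to remember every past tuple. The sketch below is a hypothetical symmetric equi-join over two ROWS windows of n tuples each: on every arrival it probes the opposite window, then inserts the tuple into its own window. It is a plain nested-loop version for illustration, not the handshake join or any particular DSMS's operator.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, Iterator, Tuple

# Each arrival is (stream, key, payload), where stream is "L" or "R".
Arrival = Tuple[str, Any, Any]


def window_join(arrivals: Iterable[Arrival], n: int) -> Iterator[Tuple[Any, Any, Any]]:
    """Symmetric equi-join of streams L and R, each limited to a ROWS window of n tuples.

    Emits (key, left_payload, right_payload) when the later tuple of a pair arrives
    and the earlier one is still among the last n tuples of its own stream.
    """
    windows: Dict[str, Deque[Tuple[Any, Any]]] = {"L": deque(maxlen=n), "R": deque(maxlen=n)}
    for side, key, payload in arrivals:
        other = "R" if side == "L" else "L"
        # 1. Probe: only tuples still inside the opposite window can join.
        for other_key, other_payload in windows[other]:
            if other_key == key:
                if side == "L":
                    yield (key, payload, other_payload)
                else:
                    yield (key, other_payload, payload)
        # 2. Insert: deque(maxlen=n) evicts the oldest tuple once the window is full.
        windows[side].append((key, payload))


if __name__ == "__main__":
    arrivals = [("L", 1, "a"), ("R", 1, "x"), ("L", 2, "b"),
                ("L", 3, "c"), ("R", 1, "y"), ("R", 3, "z")]
    print(list(window_join(arrivals, n=2)))  # [(1, 'a', 'x'), (3, 'c', 'z')]
```

With n = 2, ("L", 1, "a") has already been evicted by the time ("R", 1, "y") arrives, so that pair is never produced.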
Incremental computation
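• Differential computation
– A computation scheme that processes only the parts that changed since the previous step
• A great many methods have been proposed
– Aggregate (MIN, MAX, AVG, SUM)
– Similarity search (dynamic time warping (SPRING), Hausdorff distance)
– Handshake join
– Incremental LOCI (local correlation integral)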
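A minimal Python sketch of incremental computation for the aggregate case (SUM, AVG, MAX) over a ROWS window of n tuples: each arrival adds its contribution, each expired tuple subtracts its own, and MAX is maintained with a monotonic deque, so no step rescans the window. The function name and the monotonic-deque choice for MAX are my assumptions for illustration, not a technique taken from any of the works cited on this topic.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def incremental_window_stats(values: Iterable[float], n: int) -> Iterator[Tuple[float, float, float]]:
    """Emit (SUM, AVG, MAX) over the last n values, updating state per arrival."""
    window: Deque[float] = deque()
    maxq: Deque[float] = deque()  # non-increasing; the front is the current window maximum
    total = 0.0
    for v in values:
        window.append(v)
        total += v  # add the new tuple's contribution
        while maxq and maxq[-1] < v:  # smaller earlier values can never be the max again
            maxq.pop()
        maxq.append(v)
        if len(window) > n:
            old = window.popleft()
            total -= old  # subtract the expired tuple's contribution
            if maxq[0] == old:  # the expired tuple was the current maximum
                maxq.popleft()
        yield total, total / len(window), maxq[0]


if __name__ == "__main__":
    print([mx for _, _, mx in incremental_window_stats([3, 1, 4, 1, 5], n=3)])  # [3, 3, 4, 4, 5]
```

Each arrival costs amortized O(1) instead of O(n). MIN is symmetric (flip the comparison), and over very long float streams the running sum can drift, so a periodic exact recomputation may be worthwhile.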
• “Incremental outlier detection in data streams using local correlation integral”, Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang, SAC 2009
• “Incoop: MapReduce for Incremental Computations”, Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini (MPI-SWS), SoCC 2011
• “Fast Incremental and Personalized PageRank”, Bahman Bahmani (Stanford University), Abdur Chowdhury (Twitter Inc.), and Ashish Goel (Stanford University, Twitter Inc.), VLDB 2011
• “Incremental Graph Pattern Matching”, Wenfei Fan (University of Edinburgh), Jianzhong Li (Harbin Institute of Technology), Jizhou Luo (Harbin Institute of Technology), Zijing Tan, Xin Wang (University of Edinburgh), and Yinghui Wu (University of Edinburgh), SIGMOD 2011
• “iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs”, Yun Chi, Hyun Moon, and Hakan Hacigumus (NEC Laboratories America), VLDB 2011
• “An Incremental Hausdorff Distance Calculation Algorithm”, Sarana Nutanong, Edwin Jacox, and Hanan Samet (University of Maryland), VLDB 2011
• “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, Daniel Peng and Frank Dabek (Google, Inc.), OSDI 2010
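For the SPRING entry under similarity search, the core idea is that subsequence DTW against a query of length m can be maintained with one O(m) column update per arriving value, by letting a match start at any time (d(t, 0) = 0) and carrying each path's start position along. The Python sketch below is a simplified reading of that recurrence under stated assumptions: absolute difference as the local distance, ties broken in the order listed in the code, and no threshold-based reporting of disjoint matches, which the full algorithm includes. The function name is hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple


def spring_stream(stream: Iterable[float], query: Sequence[float]) -> Iterator[Tuple[float, int, int]]:
    """After each arrival x_t, yield (distance, start, end) of the best DTW match
    of `query` against a stream subsequence ending at t (1-based)."""
    m = len(query)
    d: List[float] = [0.0] + [math.inf] * m  # column t = 0: d(0, i) = inf for i >= 1
    s: List[int] = [0] * (m + 1)             # start position of the path reaching each cell
    for t, x in enumerate(stream, start=1):
        nd: List[float] = [0.0] * (m + 1)    # d(t, 0) = 0: a match may start at any t
        ns: List[int] = [t] * (m + 1)        # s(t, 0) = t
        for i in range(1, m + 1):
            # Predecessors in preference order: (t, i-1), (t-1, i), (t-1, i-1).
            candidates = [(nd[i - 1], ns[i - 1]), (d[i], s[i]), (d[i - 1], s[i - 1])]
            best, start = min(candidates, key=lambda c: c[0])  # first minimum wins ties
            nd[i] = abs(x - query[i - 1]) + best
            ns[i] = start
        d, s = nd, ns  # only the previous column is kept: O(m) memory
        yield d[m], s[m], t


if __name__ == "__main__":
    matches = list(spring_stream([5, 5, 1, 2, 3, 5, 5], [1, 2, 3]))
    print(min(matches, key=lambda r: r[0]))  # (0.0, 3, 5)
```

The exact occurrence of the query at positions 3 to 5 is found without storing the stream or restarting DTW from every possible start point.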
Online-LDA
• Limin Yao, Efficient Methods for Topic Model Inference
on Streaming Document Collections, KDD’09
– Speeds up Gibbs sampling by roughly 20x over the standard sampler
– About 2x faster than the existing fast sampler (below)
• I. Porteous, Fast collapsed Gibbs sampling for latent Dirichlet
allocation. In SIGKDD, 2008.
• Matthew D. Hoffman, Online Learning for Latent
Dirichlet Allocation, NIPS’10
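– Online variational Bayes
– No comparison with Gibbs sampling
– Claims higher performance than variational Bayes
– The paper itself states that it adds only a few lines to Blei’03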
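For reference, the baseline that the faster samplers above improve on is plain collapsed Gibbs sampling for LDA. The Python sketch below is a minimal batch version under my own assumptions (integer word ids, symmetric priors alpha and beta, a fixed iteration count); it is neither the accelerated sampler of Yao et al. or Porteous et al. nor Hoffman's online variational Bayes. The per-token loop over all K topics in `weights` is the cost that faster samplers aim to reduce.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 50,
              seed: int = 0) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Batch collapsed Gibbs sampling for LDA; words are integer ids in [0, vocab_size).

    Returns (z, ndk, nkw): the topic of each token, per-document topic counts,
    and per-topic word counts.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in document d
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                                # tokens assigned to topic k
    z: List[List[int]] = []
    for d, doc in enumerate(docs):                     # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts ...
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ... then resample its topic from p(z = k | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw


if __name__ == "__main__":
    docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 3, 2]]
    z, ndk, nkw = lda_gibbs(docs, n_topics=2, vocab_size=4, iters=20)
    print(ndk)
```

This version needs the whole corpus up front; the streaming methods above are about avoiding exactly that.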
BRIMOS, Malware
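• Xrosscloud® bridge monitoring solution (BRIMOS®)
– A bridge monitoring system that uses various sensors installed on a bridge to monitor the bridge's condition continuously and in real time
• Malware
– NICTER
• Darknet traffic covering roughly 210,000 hosts
• Available to MWS’13 participants (NON STOP)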
XML Stream
• Originates with XFilter
– Mehmet Altinel and Michael J. Franklin. “Efficient
Filtering of XML Documents for Selective
Dissemination of Information”. VLDB '00.
• Academically important?
– High-Performance Complex Event Processing over
XML Streams
• Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA)
• SIGMOD’12, Best paper award
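The filtering problem XFilter addresses is deciding, while a document streams past, which registered path queries it matches. The Python sketch below is much simpler than XFilter, which compiles queries into finite-state machines with a query index: it only tracks the stack of open elements and compares it against every query, supporting absolute child-axis paths with a `*` wildcard. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def _step_matches(step: str, tag: str) -> bool:
    return step == "*" or step == tag


def filter_document(xml_bytes: bytes, queries: Dict[str, str]) -> Set[str]:
    """Return the names of queries whose absolute path (e.g. "/a/b", "/a/*/c")
    matches at least one element of the document, using only the child axis."""
    parsed = {name: path.strip("/").split("/") for name, path in queries.items()}
    matched: Set[str] = set()
    stack: List[str] = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            for name, steps in parsed.items():
                if name not in matched and len(steps) == len(stack) and all(
                        _step_matches(s, t) for s, t in zip(steps, stack)):
                    matched.add(name)
        else:
            stack.pop()
            elem.clear()  # free the element's content; only the path matters here
    return matched


if __name__ == "__main__":
    doc = b"<news><item><title>x</title></item><ad/></news>"
    queries = {"q1": "/news/item/title", "q2": "/news/ad",
               "q3": "/news/title", "q4": "/news/*/title"}
    print(sorted(filter_document(doc, queries)))  # ['q1', 'q2', 'q4']
```

Cost here grows with the number of queries per element, which is the scaling problem that XFilter's query index is designed to avoid.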
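High-performance H/W
• FPGA
– Application circuits can be built
– Protocol stacks can also be implemented
– E-trees (http://e-trees.jp/)
• GPU
– Parallelism on the order of 1,500
• MIC
– Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)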
Adaptive Query Processing
• Dynamically changes the structure of the operator tree
– “Eddies: continuously adaptive query processing”,
Ron Avnur and Joseph M. Hellerstein, SIGMOD’00.
[Figure: alternative join trees over inputs A, B, C, and D, including a left-deep tree]
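Adaptive query processing in the spirit of Eddies, in its smallest form: instead of fixing the order of operators once in a plan, the router decides per tuple, based on statistics observed so far. The Python sketch below is my own simplification and is not the paper's mechanism. Eddies routes tuples through join and selection operators using lottery scheduling, while this sketch only reorders commutative filters by observed pass rate. The function and variable names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def adaptive_filter(rows: Iterable[Row], preds: Dict[str, Predicate]) -> Iterator[Row]:
    """Yield rows that satisfy every predicate, re-choosing the predicate order per row."""
    seen = {name: 1 for name in preds}    # rows routed to each predicate (starts at 1 to avoid /0)
    passed = {name: 1 for name in preds}  # rows that satisfied it
    for row in rows:
        # Route to the predicate with the lowest observed pass rate first, so rows
        # that will be dropped anyway are dropped as early as possible.
        order = sorted(preds, key=lambda name: passed[name] / seen[name])
        for name in order:
            seen[name] += 1
            if not preds[name](row):
                break  # row rejected; remaining predicates are skipped
            passed[name] += 1
        else:
            yield row  # no predicate rejected the row


if __name__ == "__main__":
    data = [{"x": i} for i in range(100)]
    preds = {"even": lambda r: r["x"] % 2 == 0, "small": lambda r: r["x"] < 10}
    print([r["x"] for r in adaptive_filter(data, preds)])  # [0, 2, 4, 6, 8]
```

Because filters commute, the result is the same for any order. Only the work done changes: once "small" starts rejecting most rows, it moves to the front.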
Data processing paradigms
• Batch processing
– Data are persistent
– RDB: Oracle, Vertica, Greenplum
– NoSQL: Hadoop, MongoDB
• Real-time processing
– Queries are persistent; window functions
– DSMS: Hitachi, Sybase, IBM
– NoSQL: Storm, S4
[Figure: one diagram per paradigm, each showing a stream, processing, analysis, and a database with memory (low latency) and disk (high latency)]
DSMS: Data Stream Management System
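The contrast between "data are persistent" and "queries are persistent" can be shown with two toy engines running the same predicate. The Python sketch below is illustrative only: the class names are hypothetical, and a real DSMS would also manage windows, scheduling, and result delivery.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class BatchStore:
    """Batch: data persist; each query runs once over everything stored so far."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self.rows.append(row)

    def query(self, pred: Predicate) -> List[Row]:
        return [r for r in self.rows if pred(r)]


class ContinuousEngine:
    """Real-time: queries persist; each arriving row is pushed through them, not stored."""

    def __init__(self) -> None:
        self.queries: Dict[str, Predicate] = {}
        self.results: Dict[str, List[Row]] = {}  # output sink per registered query

    def register(self, name: str, pred: Predicate) -> None:
        self.queries[name] = pred
        self.results[name] = []

    def push(self, row: Row) -> None:
        for name, pred in self.queries.items():
            if pred(row):
                self.results[name].append(row)


if __name__ == "__main__":
    engine, store = ContinuousEngine(), BatchStore()
    engine.register("big", lambda r: r["x"] > 5)
    for x in [3, 7, 9]:
        engine.push({"x": x})
        store.insert({"x": x})
    print(engine.results["big"])                 # [{'x': 7}, {'x': 9}]
    print(store.query(lambda r: r["x"] > 5))     # [{'x': 7}, {'x': 9}]
```

Both produce the same answer here, but a continuous query registered after the rows have passed never sees them, whereas a batch query issued later still can.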
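Open issues in stream processing: applications
• Operations
– Window-join
– Window-aggregate
– Window-something
• What kinds of events do we need to know about "right now, at a glance"?
– Algorithmic trading
– Malware detection
– Fraud in RTB, credit cards, etc.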
Summary
• Keyword classification
• Key concepts
– Continual query
• Window
• Incremental computation
• Open areas
– Adaptive query processing
– Join of stream and relation (Mr. Oyamada, NEC)

More Related Content

Viewers also liked

kibayos-ID/Locator-081031
kibayos-ID/Locator-081031kibayos-ID/Locator-081031
kibayos-ID/Locator-081031Mikio Yoshida
 
kibayos beaker-070829
kibayos beaker-070829kibayos beaker-070829
kibayos beaker-070829Mikio Yoshida
 
トランザクションの設計と進化
トランザクションの設計と進化トランザクションの設計と進化
トランザクションの設計と進化Kumazaki Hiroki
 
Baloncesto
BaloncestoBaloncesto
Baloncestoaha100
 
Etiquetas de cd
Etiquetas de cdEtiquetas de cd
Etiquetas de cdcastelbi
 
Ramazan hilal-2011
Ramazan hilal-2011Ramazan hilal-2011
Ramazan hilal-2011HamidAslan
 
Carta descriptiva wiki
Carta descriptiva wikiCarta descriptiva wiki
Carta descriptiva wikiairamm87
 
Tumanyan prezentation10000
Tumanyan prezentation10000 Tumanyan prezentation10000
Tumanyan prezentation10000 Armine
 
Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...futureagricultures
 
Un bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasUn bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasJackson Dj
 
Wfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneWfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneweathervision
 
Englekirk News
Englekirk NewsEnglekirk News
Englekirk Newskimtanouye
 
Mysql+handlersocket=nosql
Mysql+handlersocket=nosqlMysql+handlersocket=nosql
Mysql+handlersocket=nosqlSergey Xek
 
National day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieiNational day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieibalada65
 

Viewers also liked (20)

kibayos-ID/Locator-081031
kibayos-ID/Locator-081031kibayos-ID/Locator-081031
kibayos-ID/Locator-081031
 
kibayos beaker-070829
kibayos beaker-070829kibayos beaker-070829
kibayos beaker-070829
 
トランザクションの設計と進化
トランザクションの設計と進化トランザクションの設計と進化
トランザクションの設計と進化
 
Prueba power
Prueba powerPrueba power
Prueba power
 
Agra tour iii article
Agra tour iii articleAgra tour iii article
Agra tour iii article
 
Cruzcerna
CruzcernaCruzcerna
Cruzcerna
 
Issue 7 March 2011
Issue 7 March 2011Issue 7 March 2011
Issue 7 March 2011
 
Baloncesto
BaloncestoBaloncesto
Baloncesto
 
Etiquetas de cd
Etiquetas de cdEtiquetas de cd
Etiquetas de cd
 
Ramazan hilal-2011
Ramazan hilal-2011Ramazan hilal-2011
Ramazan hilal-2011
 
Carta descriptiva wiki
Carta descriptiva wikiCarta descriptiva wiki
Carta descriptiva wiki
 
Tumanyan prezentation10000
Tumanyan prezentation10000 Tumanyan prezentation10000
Tumanyan prezentation10000
 
Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...Christine Haigh: Financial markets and food price volatility - proposals to r...
Christine Haigh: Financial markets and food price volatility - proposals to r...
 
Formato planeacion
Formato planeacionFormato planeacion
Formato planeacion
 
Un bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivasUn bello-ejemplo-diapositivas
Un bello-ejemplo-diapositivas
 
Wfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayneWfwa tv, pbs ft. wayne
Wfwa tv, pbs ft. wayne
 
Magento
MagentoMagento
Magento
 
Englekirk News
Englekirk NewsEnglekirk News
Englekirk News
 
Mysql+handlersocket=nosql
Mysql+handlersocket=nosqlMysql+handlersocket=nosql
Mysql+handlersocket=nosql
 
National day of romania ziua nationala a romaniei
National day of romania ziua nationala a romanieiNational day of romania ziua nationala a romaniei
National day of romania ziua nationala a romaniei
 

Similar to 学術的に見たストリームデータ処理(私見)

A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...Wenjing Chu
 
Smart camera monitoring system
Smart camera monitoring systemSmart camera monitoring system
Smart camera monitoring systemArvind Krishnaa
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Jeffrey Sica
 
Computing Outside The Box
Computing Outside The BoxComputing Outside The Box
Computing Outside The BoxIan Foster
 
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Wright State University, Dayton, OH, USA
 
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationIssues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationTieu Luu
 
Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Mario Cho
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Implementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureImplementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureHarry Yao
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Seattle DAML meetup
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT OperationsNeo4j
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...Alan Quayle
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 
Poster jsoe research expo 2008
Poster   jsoe research expo 2008Poster   jsoe research expo 2008
Poster jsoe research expo 2008bdemchak
 

Similar to 学術的に見たストリームデータ処理(私見) (20)

A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...A sentient network - How High-velocity Data and Machine Learning will Shape t...
A sentient network - How High-velocity Data and Machine Learning will Shape t...
 
USENIX OSDI2010 Report
USENIX OSDI2010 ReportUSENIX OSDI2010 Report
USENIX OSDI2010 Report
 
Smart camera monitoring system
Smart camera monitoring systemSmart camera monitoring system
Smart camera monitoring system
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
Computing Outside The Box
Computing Outside The BoxComputing Outside The Box
Computing Outside The Box
 
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
Mahdieh zabihi imc45-Fuzzy Inference for Intrusion Detection of Web Robots in...
 
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon ImplementationIssues with Ingesting/Staging/Analyzing Data in ConMon Implementation
Issues with Ingesting/Staging/Analyzing Data in ConMon Implementation
 
Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10Koss 1605 machine_learning_mariocho_t10
Koss 1605 machine_learning_mariocho_t10
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Implementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architectureImplementation domain driven design - ch04 architecture
Implementation domain driven design - ch04 architecture
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT Operations
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
TADSummit, DataArt Keynote: Security in Virtualized Telecom Networks Michael ...
 
Venkata brundavanam 2020
Venkata brundavanam 2020Venkata brundavanam 2020
Venkata brundavanam 2020
 
Venkata brundavanam 2020
Venkata brundavanam 2020Venkata brundavanam 2020
Venkata brundavanam 2020
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Poster jsoe research expo 2008
Poster   jsoe research expo 2008Poster   jsoe research expo 2008
Poster jsoe research expo 2008
 

Recently uploaded

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Recently uploaded (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

学術的に見たストリームデータ処理(私見)

  • 4. STORM Norikra Jubatus CEP DSMS SPE Relational-stream XML-stream S4 STREAM System S Algorithm trading Borealis(MIT/Brandeis) Stream computing Complex event processing Online learning Incremental computation Continual query Spring (DTW) CPD (Change Point Detection) Window-aggregate Window-join FPGA GPU SASE Fraud detection Malware detection AQP (Adaptive Query Proc.) Esper BRIMOS Handshake-join Incr. LOCI Online LDA Window Real-time Tuple-stream Materialized view Tapestry
  • 5. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 6. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 7. Continual query, window • Continual query – DSMS: Queries are persistent, data are volatile – DBMS: Data are persistent, queries are volatile – CQ: Tapestryで導入された概念 • “Continuous queries over append only databases”, Terry, et.al, SIGMOD’92. • Window – 無限長のデータを有限長に変換 • Type: ROWS or TIME • Operators – Aggregate -> window aggregate – Join -> window join • S. Babu and J. Widom. Continuous Queries over Data Streams, SIGMOD Record, Sep. 2001
  • 8. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time DSMS Tuple- stream Tapestry AQP MV
  • 9. Relational Stream, DSMS • リレーショナルデータ処理をCQ化 – 狭義 • Selection, projection, … • Join, aggregate, set operationsは窓が必須 – 広義 • 各種のマイニング処理 • Relational completenessを満たせば何でもOK • DSMS (data stream management system) – 連続的問合せを管理するシステム • Relational – STREAM (Stanford) -> Coral8 -> Aleri -> Sybase -> SAP – Telegraph (UCB) -> CISCO – Aurora -> Borealis -> StreamBase • Non relational – STORM – S4(小山田さん@NECが詳しい)
  • 10. Stream computing(IBM)CEP Continual query Relational stream XML stream Algorithm trading DSMS Norikra Online learning FPGA GPU Esper STREAM Malware detection Complex event processing Jubatus Incremental computation BorealisSystem S Spring (DTW) BRIMOS MIC 高性能H/W Handshake join Window aggregate SASE Cayuga Fraud detection Window STORM S4 Incr. LOCI Window join CPD Online LDA Real-time Tuple- stream Tapestry AQP MV
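  • 11. Incremental computation • Differential computation – A computation scheme that processes only the parts that have changed since the previous step • A very large number of techniques have been proposed – Aggregate (MIN, MAX, AVG, SUM) – Similarity search (dynamic time warping (SPRING), Hausdorff distance) – Handshake join – Incremental LOCI (local correlation integral)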
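Similarity search under dynamic time warping shows the incremental idea well: instead of recomputing DTW for every candidate subsequence, only one new column of the warping matrix is computed per arriving value. The Python sketch below is modelled on the recurrence SPRING uses (d(t,0) = 0 so a match may start at any time; d(t,i) = |x_t - y_i| + min(d(t,i-1), d(t-1,i-1), d(t-1,i))) with start positions propagated alongside. It is a simplified illustration under those assumptions: the class name is hypothetical, absolute difference is assumed as the local distance, and SPRING's reporting of non-overlapping optimal matches is omitted.

```python
import math
from typing import List, Tuple


class SpringMatcher:
    """Streaming subsequence DTW (simplified, SPRING-style sketch).

    Keeps a single column of the warping matrix, so each new value costs
    O(m) time and memory for a query of length m.
    """

    def __init__(self, query: List[float]):
        self.y = list(query)
        m = len(self.y)
        # d[i]: best distance of a match ending now that covers y[1..i];
        # d[0] = 0 lets a match start at any time step.
        self.d = [0.0] + [math.inf] * m
        # s[i]: 1-based start time of the subsequence achieving d[i].
        self.s = [0] * (m + 1)
        self.t = 0

    def update(self, x: float) -> Tuple[float, int]:
        """Consume x_t; return (distance, start time) of the best match ending at t."""
        self.t += 1
        m = len(self.y)
        d_new = [0.0] + [math.inf] * m
        s_new = [self.t] + [0] * m
        for i in range(1, m + 1):
            # Cheapest predecessor among d(t,i-1), d(t-1,i-1), d(t-1,i).
            best, start = d_new[i - 1], s_new[i - 1]
            if self.d[i - 1] < best:
                best, start = self.d[i - 1], self.s[i - 1]
            if self.d[i] < best:
                best, start = self.d[i], self.s[i]
            d_new[i] = abs(x - self.y[i - 1]) + best
            s_new[i] = start
        self.d, self.s = d_new, s_new
        return d_new[m], s_new[m]
```

With query [1, 2, 3] and stream 5, 1, 2, 3, 5, the fourth call returns (0.0, 2): an exact match that started at time 2 and ends at time 4.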
  • 12. • “Incremental outlier detection in data streams using local correlation integral”, Xinjie Lu, Tian Yang, Zaifei Liao, Manzoor Elahi, Wei Liu, and Hongan Wang, SAC 2009 • “Incoop: MapReduce for Incremental Computations”, Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquini, SoCC 2011 • “Fast Incremental and Personalized PageRank”, Bahman Bahmani, Abdur Chowdhury, and Ashish Goel, VLDB 2011 • “Incremental Graph Pattern Matching”, Wenfei Fan, Jianzhong Li, Jizhou Luo, Zijing Tan, Xin Wang, and Yinghui Wu, SIGMOD 2011 • “iCBS: Incremental Cost-based Scheduling under Piecewise Linear SLAs”, Yun Chi, Hyun Moon, and Hakan Hacigumus, VLDB 2011 • “An Incremental Hausdorff Distance Calculation Algorithm”, Sarana Nutanong, Edwin Jacox, and Hanan Samet, VLDB 2011 • “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, Daniel Peng and Frank Dabek, OSDI 2010
  • 14. Online-LDA • Limin Yao, Efficient Methods for Topic Model Inference on Streaming Document Collections, KDD’09 – Speeds up Gibbs sampling roughly 20x over the standard sampler – About 2x faster than an existing fast sampler (below) • I. Porteous, Fast collapsed Gibbs sampling for latent Dirichlet allocation. In SIGKDD, 2008. • Matthew D. Hoffman, Online Learning for Latent Dirichlet Allocation, NIPS’10 – Online variational Bayes – No comparison with Gibbs sampling – Claims better performance than batch variational Bayes – The paper states that it adds only a few lines to Blei’03
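The "only a few lines added to Blei’03" remark refers to the global update in online variational Bayes. As I understand Hoffman et al.’s algorithm, after the usual per-document E-step on document t, the topic-word parameters λ are blended toward a noisy estimate with a decaying step size. Notation (following that paper, stated here as an assumption): D is the corpus size, n_{tw} the count of word w in document t, φ_{twk} the per-word topic responsibility from the E-step, η the Dirichlet prior on topics, and τ_0 ≥ 0, κ the learning-rate parameters.

```latex
\rho_t = (\tau_0 + t)^{-\kappa}, \qquad \kappa \in (0.5, 1]
\\[4pt]
\tilde{\lambda}_{kw} = \eta + D\, n_{tw}\, \phi_{twk}
\\[4pt]
\lambda \leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda}
```

Setting ρ_t = 1 with the whole corpus as a single "document" recovers the batch variational Bayes update, which is why the online version is a small change to the batch algorithm.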
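  • 16. BRIMOS, Malware • Xrosscloud® bridge monitoring solution (BRIMOS®) – A bridge monitoring system that continuously watches the condition of a bridge in real time using various sensors installed on it • Malware – NICTER • Darknet traffic covering roughly 210,000 hosts • Available to participants of MWS’13 (NON STOP)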
  • 18. XML Stream • Originated with XFilter – Mehmet Altinel and Michael J. Franklin. “Efficient Filtering of XML Documents for Selective Dissemination of Information”. VLDB '00. • Academically important? – High-Performance Complex Event Processing over XML Streams • Barzan Mozafari, Kai Zeng, Carlo Zaniolo (UCLA) • SIGMOD’12, Best paper award
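The filtering problem XFilter addresses (matching many subscriber path profiles against each document as it streams past) can be illustrated with a much simpler Python sketch. This is not XFilter, which compiles XPath profiles into finite state machines: here profiles are assumed to be absolute, child-axis-only paths with no wildcards, "//", or predicates; the function name is hypothetical. It keeps only two ideas: a stack of open tags maintained from start/end events, and an index of profiles keyed by element name so each tag checks only relevant profiles.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple


def matching_profiles(xml_bytes: bytes, profiles: Dict[str, str]) -> Set[str]:
    """Return names of profiles whose path (e.g. "/news/item/title") occurs."""
    # Index profiles by their final element name.
    index: Dict[str, List[Tuple[str, Tuple[str, ...]]]] = {}
    for name, path in profiles.items():
        steps = tuple(path.strip("/").split("/"))
        index.setdefault(steps[-1], []).append((name, steps))

    stack: List[str] = []      # tags of currently open elements, root first
    matched: Set[str] = set()
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            # Only profiles ending in this tag can match here.
            for name, steps in index.get(elem.tag, []):
                if tuple(stack) == steps:
                    matched.add(name)
        else:
            stack.pop()
    return matched
```

For the document `<news><item><title>x</title></item><ad/></news>`, the profiles "/news/item/title" and "/news/ad" match, while "/news/title" does not.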
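  • 20. High-performance H/W • FPGA – Application logic can be built directly as circuits – Protocol stacks can be implemented as well – E-trees (http://e-trees.jp/) • GPU – Parallelism on the order of 1,500 • MIC – Used in Tianhe-2 (No. 1 on the TOP500 supercomputer list)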
  • 22. Adaptive Query Processing • Dynamically changes the structure of the operator tree – “Eddies: continuously adaptive query processing”, Ron Avnur and Joseph M. Hellerstein, SIGMOD’00. (Figure: four alternative join trees over inputs A, B, C, D, including a left-deep tree)
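The core of an eddy is that operator order is decided per tuple from statistics gathered at run time, rather than fixed once by the optimizer. The Python sketch below illustrates that for a set of conjunctive filters. It is a simplification under stated assumptions: the class is hypothetical, and it always routes a tuple to the filter with the lowest observed pass rate first, whereas the Eddies paper uses lottery scheduling and also handles joins.

```python
from typing import Callable, Dict, List


class Eddy:
    """Route tuples through conjunctive filters in an adaptively chosen order."""

    def __init__(self, predicates: Dict[str, Callable[[dict], bool]]):
        self.predicates = predicates
        # Smoothed counters: every filter starts with an assumed pass rate of 0.5.
        self.seen = {name: 2 for name in predicates}
        self.passed = {name: 1 for name in predicates}

    def pass_rate(self, name: str) -> float:
        return self.passed[name] / self.seen[name]

    def current_order(self) -> List[str]:
        # Most selective filter (lowest observed pass rate) first.
        return sorted(self.predicates, key=self.pass_rate)

    def process(self, tup: dict) -> bool:
        """Return True if the tuple satisfies every predicate."""
        for name in self.current_order():
            self.seen[name] += 1
            if not self.predicates[name](tup):
                return False          # dropped early; later filters never run
            self.passed[name] += 1
        return True
```

Because the filters are conjunctive, the result is the same in any order; adaptivity only changes how much work is spent. If the data distribution shifts, the pass-rate statistics shift with it and the routing order follows.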
  • 24. Data processing paradigms • Batch processing – Data are persistent – RDB: Oracle, Vertica, Greenplum – NoSQL: Hadoop, MongoDB • Real-time processing – Queries are persistent; window functions – DSMS: Hitachi, Sybase, IBM – NoSQL: Storm, S4 (Figure, one diagram per paradigm: database, memory (low latency), disk (high latency), stream analysis processing) DSMS: Data Stream Management System
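  • 26. Problems in stream processing: applications • Operations – Window-join – Window-aggregate – Window-something • What kinds of events do we want to know about instantly, right now? – Algorithmic trading – Malware detection – Fraud in RTB, credit cards, etc.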
  • 27. Summary • Keyword classification • Key concepts – Continual query • Window • Incremental computation • Open areas – Adaptive query processing – Join of stream and relation (Mr. Oyamada @ NEC)