SlideShare a Scribd company logo
1 of 42
Download to read offline
Best Better practice of Cassandra
Cassandraに不向きなCassandraデータモデリング基礎
Hayato Tsutsumi
Works Applications
Hayato Tsutsumi
堤 勇人
Cassandra experience :
7 years (from ver.0.6)
Certification for Apache Cassandra Administarator
Data size : about 40TB
Nodes:about 40 (would increase soon...)
Twitter : 2t3
Site Reliability Engineering Div.
Works Applications Co., Ltd
自己紹介 Speaker
Target
● Mid-range System
Data size
1TB ~ 1PB
Data amount
10 Mil ~ 100 Bil
+ high speed processing
What is better?
Best practice = Right people, right place
適材適所は確かにベスト
Suitable
Data
But not all data is suitable
じゃあベストじゃない部分は?
Our system
Suitable
Data
Un-suitable data for Cassandra
Use both Cassandra and RDB?
O*
or
M*
Suitable
Data
You may use only RDB...
O*
or
M*
Suitable
Data
Another way : Manage data only with Cassandra
Suitable
Data
3 models unlike NoSQL
Historical data
履歴管理データ
Tree structure
ツリー構造
Summarized data
計上データ
How Cassandra
read data in 3mins
前
提
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB text,
ckA text,
ckB text,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
pkA1 pkB1 ckA1 ckB2 v2 w2
pkA1 pkB2 ckA3 ckB3 v3 w3
on Table
on Cassandra Cassandra can search Column name
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
pkA1 pkB1 ckA1 ckB2 v2 w2
pkA1 pkB2 ckA3 ckB3 v3 w3
on Table
on Cassandra Cassandra can search Column name
Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
where pkA = "pkA1"; //NG
where pkA = "pkA1" and pkB = "pkB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB = "ckB1"; //OK
where pkA = "pkA1" and ckA = "ckA1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB >= "pkB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB >= "ckB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckA < "ckA2"; //OK
Historical data
履歴管理データ
photo by Bryan Wright
(https://secure.flickr.com/photos/spidermandragon5/2922128673/)
社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21
社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21
at 3/25
emp001 emp002 emp003
B Div. A Div. E Div.
at 4/25
emp001 emp002 emp003
C Div. D Div. E Div.
emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, e, no)
);
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG
?
emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON
emp_history (e) USING
'org.apache.cassandra.index.sasi.
SASIIndex';
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK
use custom index
Tree structure
ツリー構造
組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
判断のポイント
Criteria
● No join, recursive query
● Anyway need consistency
● Jaywalk or
denormalization is Natural
● JOIN、再帰問い合わせ
不可
● 整合性はどの道別の方
法で取る必要がある
● ジェイウォーク、非正規化
も当たり前
ツリー構造への要求
Requirement to tree model
● show ancestors
● show children
● show descendants
● show sibilings of a
● あるノードからルートまで
の全ての親を取得
● 子供を1段展開
● 子供を全て展開
● 兄弟を取得
組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
Worth considering!
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
id fqdn child code
test A [a,b] A
test A:a [1,2] a
test A:b [3,4,5] b
test A:a:1 1
test A:a:2 2
test A:b:3 3
test A:b:4 4
test A:b:5 5
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
select * from pathenum
where id = 'test' and fqdn like 'A:'; //NG
It needs 'like' search
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
select * from pathenum
where id = 'test' and fqdn like 'A:'; //NG
select * from pathenum
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK
: U+003A
; U+003B
It needs 'like' search
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
//show ancestors
fqdn.split(":");
//show children of a
select child from pathenum where id = 'test' and
fqdn = 'A:a';
//show descendants of A
select * from fqdntest where id = 'test' and
fqdn >= 'A:' and fqdn < 'A;';
//show sibilings of a
select p from fqdntest where
id = 'test' and fqdn = 'A';
A a
b
1
2
3
4
5
経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
pros
- one access
cons
- hot spot
- range slice
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
id v
A A Div.
a a Dept.
b b Dept.
1 1 Sec.
2 2 Sec.
3 3 Sec.
4 4 Sec.
5 5 Sec.
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0
//show ancestors
select p from closure_path where c = '1';
select * from closure_main where id in [?];
//show children of a
select c from closure_path where p = 'a' and d
= 1;
select * from closure_main where id in [?];
//show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in [?];
//show sibilings of a
//load a's parent = A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d
= 1;
select * from closure_main where id in [?];
pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
- complex process when update
pros & cons
閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
How increase data?
When assume n-children per
node and d-depth tree,
number of data will be
proportional to d.
Summarized
data
計上データ
伝票集計処理
Aggregation of slips
Dr. Cr.
A 200 B 50
C 150
伝票集計処理
Aggregation of slips
parallel batch processing
aggregation
online streaming
要求水準
Requirements
● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる
● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる
= Consistency!
要求水準
Requirements
計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
Use Counter...? No.
計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v int
);
UPDATE countup set v = 101 where id = 'test' if v =
100;
Use update with LWT
What is the best?
Thanks!

More Related Content

Similar to Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
Jerome Eteve
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
Jan Aerts
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdf
ssuser785ce21
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plans
paulguerin
 

Similar to Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point (20)

Building and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CBuilding and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning C
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!
 
Csql Cache Presentation
Csql Cache PresentationCsql Cache Presentation
Csql Cache Presentation
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdf
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plans
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data Records
 
An OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserAn OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parser
 
Drupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary EditionDrupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary Edition
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
 
Advanced Perl Techniques
Advanced Perl TechniquesAdvanced Perl Techniques
Advanced Perl Techniques
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model
 
2016年のPerl (Long version)
2016年のPerl (Long version)2016年のPerl (Long version)
2016年のPerl (Long version)
 
Deeply Declarative Data Pipelines
Deeply Declarative Data PipelinesDeeply Declarative Data Pipelines
Deeply Declarative Data Pipelines
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018
 
flowr streamlining computing workflows
flowr streamlining computing workflowsflowr streamlining computing workflows
flowr streamlining computing workflows
 

More from Works Applications

RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
Works Applications
 

More from Works Applications (11)

Gitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるGitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れる
 
Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器
 
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
 
形態素解析
形態素解析形態素解析
形態素解析
 
Erpと自然言語処理
Erpと自然言語処理Erpと自然言語処理
Erpと自然言語処理
 
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
 
Enterprise UI/UX - design as code
Enterprise UI/UX - design as codeEnterprise UI/UX - design as code
Enterprise UI/UX - design as code
 
Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)
 
Demystifying kubernetes
Demystifying kubernetesDemystifying kubernetes
Demystifying kubernetes
 
Global Innovation Nights - Spark
Global Innovation Nights - SparkGlobal Innovation Nights - Spark
Global Innovation Nights - Spark
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

  • 1. Best Better practice of Cassandra Cassandraに不向きなCassandraデータモデリング基礎 Hayato Tsutsumi Works Applications
  • 2. Hayato Tsutsumi 堤 勇人 Cassandra experience : 7 years (from ver.0.6) Certification for Apache Cassandra Administarator Data size : about 40TB Nodes:about 40 (would increase soon...) Twitter : 2t3 Site Reliability Engineering Div. Works Applications Co., Ltd 自己紹介 Speaker
  • 3. Target ● Mid-range System Data size 1TB ~ 1PB Data amount 10 Mil ~ 100 Bil + high speed processing
  • 5. Best practice = Right people, right place 適材適所は確かにベスト Suitable Data
  • 6. But not all data is suitable じゃあベストじゃない部分は? Our system Suitable Data Un-suitable data for Cassandra
  • 7. Use both Cassandra and RDB? O* or M* Suitable Data
  • 8. You may use only RDB... O* or M* Suitable Data
  • 9. Another way : Manage data only with Cassandra Suitable Data
  • 10. 3 models unlike NoSQL Historical data 履歴管理データ Tree structure ツリー構造 Summarized data 計上データ
  • 11. How Cassandra read data in 3mins 前 提
  • 12. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB text, ckA text, ckB text, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 13. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 14. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key where pkA = "pkA1"; //NG where pkA = "pkA1" and pkB = "pkB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB = "ckB1"; //OK where pkA = "pkA1" and ckA = "ckA1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB >= "pkB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB >= "ckB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckA < "ckA2"; //OK
  • 15. Historical data 履歴管理データ photo by Bryan Wright (https://secure.flickr.com/photos/spidermandragon5/2922128673/)
  • 16. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21
  • 17. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21 at 3/25 emp001 emp002 emp003 B Div. A Div. E Div. at 4/25 emp001 emp002 emp003 C Div. D Div. E Div.
  • 18. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, e, no) ); select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG ?
  • 19. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, no) ); CREATE CUSTOM INDEX fn_e ON emp_history (e) USING 'org.apache.cassandra.index.sasi. SASIIndex'; select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK use custom index
  • 21. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table
  • 22. 判断のポイント Criteria ● No join, recursive query ● Anyway need consistency ● Jaywalk or denormalization is Natural ● JOIN、再帰問い合わせ 不可 ● 整合性はどの道別の方 法で取る必要がある ● ジェイウォーク、非正規化 も当たり前
  • 23. ツリー構造への要求 Requirement to tree model ● show ancestors ● show children ● show descendants ● show sibilings of a ● あるノードからルートまで の全ての親を取得 ● 子供を1段展開 ● 子供を全て展開 ● 兄弟を取得
  • 24. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table Worth considering!
  • 25. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); id fqdn child code test A [a,b] A test A:a [1,2] a test A:b [3,4,5] b test A:a:1 1 test A:a:2 2 test A:b:3 3 test A:b:4 4 test A:b:5 5 A a b 1 2 3 4 5
  • 26. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); select * from pathenum where id = 'test' and fqdn like 'A:'; //NG It needs 'like' search A a b 1 2 3 4 5
  • 27. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); select * from pathenum where id = 'test' and fqdn like 'A:'; //NG select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK : U+003A ; U+003B It needs 'like' search A a b 1 2 3 4 5
  • 28. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); //show ancestors fqdn.split(":"); //show children of a select child from pathenum where id = 'test' and fqdn = 'A:a'; //show descendants of A select * from fqdntest where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //show sibilings of a select p from fqdntest where id = 'test' and fqdn = 'A'; A a b 1 2 3 4 5
  • 29. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); pros - one access cons - hot spot - range slice - complex process when update pros & cons
  • 30. 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); id v A A Div. a a Dept. b b Dept. 1 1 Sec. 2 2 Sec. 3 3 Sec. 4 4 Sec. 5 5 Sec. p c d A A 0 A a 1 A b 1 A 1 2 A 2 2 A 3 2 A 4 2 A 5 2 a a 0 a 1 1 p c d a 2 1 1 1 0 2 2 0 b b 0 b 3 1 b 4 1 b 5 1 3 3 0 4 4 0 5 5 0
  • 31. 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex'; p c d A A 0 A a 1 A b 1 A 1 2 A 2 2 A 3 2 A 4 2 A 5 2 a a 0 a 1 1 p c d a 2 1 1 1 0 2 2 0 b b 0 b 3 1 b 4 1 b 5 1 3 3 0 4 4 0 5 5 0 //show ancestors select p from closure_path where c = '1'; select * from closure_main where id in [?]; //show children of a select c from closure_path where p = 'a' and d = 1; select * from closure_main where id in [?]; //show descendants of A select c from closure_path where p = 'A'; select * from closure_main where id in [?]; //show sibilings of a //load a's parent = A select * from closure_path where c = 'a'; select c from closure_path where p = 'A' and d = 1; select * from closure_main where id in [?];
  • 32. pros - Distributed - get access cons - need an index - 2 ~ 3 times access - increase data - complex process when update pros & cons 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex';
  • 33. pros - Distributed - get access cons - need an index - 2 ~ 3 times access - increase data - complex process when update pros & cons 閉包テーブル Closure table CREATE TABLE closure_main ( id text, v text, PRIMARY KEY (id) ); CREATE TABLE closure_path ( p text, c text, d int, PRIMARY KEY (p, d, c) ); CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache. cassandra.index.sasi.SASIIndex'; How increase data? When assume n-children per node and d-depth tree, number of data will be proportional to d.
  • 36. 伝票集計処理 Aggregation of slips parallel batch processing aggregation online streaming
  • 37. 要求水準 Requirements ● miscalculation = critical ● need parallel / streaming processing ● need high speed processing ● 誤計算は死 ● バッチの並列処理、オン ラインによるストリーミン グ処理が必要 ● 高速処理が求められる
  • 38. ● miscalculation = critical ● need parallel / streaming processing ● need high speed processing ● 誤計算は死 ● バッチの並列処理、オン ラインによるストリーミン グ処理が必要 ● 高速処理が求められる = Consistency! 要求水準 Requirements
  • 39. 計上データ Summarized data CREATE TABLE countup ( id text PRIMARY KEY, v counter ); UPDATE countup SET v = v + 1 WHERE id = 'test'; Use Counter...? No.
  • 40. 計上データ Summarized data CREATE TABLE countup ( id text PRIMARY KEY, v int ); UPDATE countup set v = 101 where id = 'test' if v = 100; Use update with LWT
  • 41. What is the best?