SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan 2016

@doanduyhai
SASI, Cassandra on the full text search ride
DuyHai DOAN
Apache Cassandra™ Evangelist
Apache Zeppelin™ Committer
MILAN 25-26 NOVEMBER 2016

DataStax
•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 400+ employees
•  Headquarters in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = better Apache Cassandra + extra features

1. 5-minute introduction to Apache Cassandra™
2. SASI introduction
3. SASI cluster-wide
4. SASI local read/write path
5. Query planner
6. Takeaways

Trademark Policy
From now on …
Cassandra ⩵ Apache Cassandra™

5-minute introduction to Apache Cassandra™

The tokens
Random hash of #partition → token = hash(#p)
Hash: ] −x, x ]
Hash range: 2^64 values
x = 2^64 / 2
[Diagram: ring of eight Cassandra (C*) nodes]
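As a rough illustration of token = hash(#p), the sketch below maps a partition key to a signed 64-bit token. It is only a sketch under stated assumptions: Cassandra's default partitioner is based on Murmur3, and MD5 from the Python standard library is swapped in here purely as a stand-in. The names `X` and `token` are hypothetical.

```python
import hashlib

X = 2**64 // 2  # x = 2^64 / 2, so tokens fit in a signed 64-bit integer


def token(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token (MD5 used as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Keep the first 8 bytes and read them as a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)
```

The same key always yields the same token, which is what lets any node compute where a partition lives.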

Token ranges
A: ] −x, −3x/4 ]
B: ] −3x/4, −2x/4 ]
C: ] −2x/4, −x/4 ]
D: ] −x/4, 0 ]
E: ] 0, x/4 ]
F: ] x/4, 2x/4 ]
G: ] 2x/4, 3x/4 ]
H: ] 3x/4, x ]
[Diagram: ring of eight nodes, A to H, each owning one token range]
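To make the range ownership concrete, here is a minimal sketch that finds which of the eight nodes A to H owns a given token, assuming equal-sized ranges of the form ] lower, upper ] and x = 2^63. The names `NODES`, `UPPER_BOUNDS` and `owner` are hypothetical, and real clusters rarely have such evenly spaced tokens.

```python
import bisect

X = 2**63  # x = 2^64 / 2
NODES = "ABCDEFGH"
# Upper bounds of the eight ranges: -3x/4, -2x/4, -x/4, 0, x/4, 2x/4, 3x/4, x
UPPER_BOUNDS = [-X + (i + 1) * (X // 4) for i in range(8)]


def owner(tok: int) -> str:
    """Return the node whose range ] lower, upper ] contains the token."""
    # bisect_left finds the first upper bound >= tok, i.e. lower < tok <= upper.
    return NODES[bisect.bisect_left(UPPER_BOUNDS, tok)]
```

With these bounds, token 0 belongs to D (whose range ends at 0, inclusive) and token 1 belongs to E.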

Distributed tables
CREATE TABLE users(
user_id int,
…,
PRIMARY KEY(user_id)
);
[Diagram: rows user_id1 to user_id5 spread across the ring of nodes A to H]

Coordinator node
Responsible for handling requests (read/write)
Every node can be a coordinator
• masterless
• no SPOF
• proxy role
[Diagram: ring of nodes A to H; a request reaches the coordinator, with steps labelled 1, 2, 3]

What is SASI?
•  SSTable-Attached Secondary Index → new secondary index implementation that follows the SSTable life-cycle
•  Objective: provide a more performant and capable secondary index

Who created it?
Open-source contribution by a team of engineers

Why is it better than the native secondary index?
•  follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
•  new data structures
•  range queries (<, ≤, >, ≥) possible
•  full-text search options

Distributed index
At the cluster level, SASI works exactly like the native secondary index
[Diagram: ring of nodes A to H, with sample index entries:]
UK → user1, user102, …, user493
US → user54, user483, …, user938

Distributed search algorithm
•  1st round: concurrency factor = 1
•  Not enough results? → 2nd round
•  Still not enough results? → 3rd round
[Diagrams: the coordinator querying nodes of the ring A to H, round after round]
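The rounds above can be sketched as a loop that keeps contacting more nodes until enough rows are found. This is a simplified model, not Cassandra's actual implementation: the function name `distributed_search`, the in-memory `nodes` list and the doubling of the concurrency factor between rounds are all assumptions made for illustration.

```python
def distributed_search(nodes, predicate, limit):
    """Query nodes round by round until `limit` matching rows are collected.

    `nodes` is a list of per-node row lists, in ring order. Doubling the
    concurrency factor after each round is an illustrative assumption.
    Returns (rows, number of rounds, number of nodes contacted).
    """
    results, rounds, position, concurrency = [], 0, 0, 1
    while position < len(nodes) and len(results) < limit:
        rounds += 1
        batch = nodes[position:position + concurrency]
        for rows in batch:
            results.extend(row for row in rows if predicate(row))
        position += len(batch)
        concurrency *= 2
    return results[:limit], rounds, position
```

In this model, a predicate that matches many rows stops after one or two rounds once the limit is reached, while a predicate that matches nothing ends up contacting every node.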

Caveat 1: non-restrictive filters
Hit all nodes eventually ☹
[Diagram: the coordinator querying the nodes of the ring A to H]

Caveat 1 solution: always use LIMIT
SELECT *
FROM …
WHERE ...
LIMIT 1000

Caveat 2: 1-to-1 index (user_email)
WHERE user_email = 'xxx'
•  Not found
•  Still no result
•  At best 1 user found, at worst 0 users found
[Diagrams: the coordinator querying nodes of the ring A to H, round after round]

Caveat 2 solution: materialized views
For a 1-to-1 index/relationship, use materialized views instead
CREATE MATERIALIZED VIEW user_by_email AS
SELECT * FROM users
WHERE user_id IS NOT NULL AND user_email IS NOT NULL
PRIMARY KEY (user_email, user_id);
But range queries (<, >, ≤, ≥) are not possible …

Caveat 3: fetching all rows for analytics use cases
[Diagram: ring of nodes A to H, the coordinator and the client]

Caveat 3 solution: use co-located Spark
•  Local index filtering in Cassandra (local index query)
•  Aggregation in Spark
[Diagram: ring of nodes A to H]

SASI local read/write path

SASI life-cycle: in-memory
1. Append the write to the commit log (Commit log 1 … Commit log n)
2. Write to the MemTable of the table (MemTable Table 1 … MemTable Table N)
3. Update the index MemTable (Index MemTable 1 … Index MemTable N)
Then ACK the client

Local write path data structures
Index mode, data type | Data structure | Usage
PREFIX, text | ConcurrentRadixTree (concurrent-trees library) | name LIKE 'John%'
CONTAINS, text | ConcurrentSuffixTree (concurrent-trees library) | name LIKE '%John%', name LIKE '%ny'
PREFIX, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30
SPARSE, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30; suitable for 1-to-N index with N ≤ 5
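To give a feel for why these structures fit their index modes, here is a minimal single-threaded sketch in Python. It does not reproduce the Java classes named in the table: a sorted list stands in for the skip list, and a sorted list of all suffixes stands in for the suffix tree. The class names `SortedIndex` and `SuffixIndex` are hypothetical.

```python
import bisect


class SortedIndex:
    """Sorted (value, key) pairs, standing in for a skip-list based index."""

    def __init__(self):
        self._entries = []

    def add(self, value, key):
        bisect.insort(self._entries, (value, key))

    def range(self, low, high):
        """Return the keys whose indexed value lies in [low, high]."""
        start = bisect.bisect_left(self._entries, (low,))
        keys = []
        for value, key in self._entries[start:]:
            if value > high:
                break
            keys.append(key)
        return keys


class SuffixIndex:
    """Every suffix of every term, sorted, so a substring search is a prefix scan."""

    def __init__(self):
        self._entries = []

    def add(self, term, key):
        for start in range(len(term)):
            bisect.insort(self._entries, (term[start:], key))

    def contains(self, fragment):
        """Return the keys of the terms that contain `fragment`."""
        start = bisect.bisect_left(self._entries, (fragment,))
        keys = set()
        for suffix, key in self._entries[start:]:
            if not suffix.startswith(fragment):
                break
            keys.add(key)
        return keys
```

In this sketch, `range(20, 30)` plays the role of age >= 20 AND age <= 30, and `contains("ohn")` plays the role of name LIKE '%ohn%'. Each term adds one entry per suffix to the `SuffixIndex`, which hints at why substring indexing costs more space than prefix indexing.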

SASI life-cycle: flush to SSTable
4. On flush, each MemTable is written to disk as an SSTable together with its index file
   Table 1 → SSTable 1 + OnDiskIndex 1
   Table 2 → SSTable 2 + OnDiskIndex 2
   Table 3 → SSTable 3 + OnDiskIndex 3

OnDiskIndex files
SSTable 1: user_id4 FR, user_id1 US, user_id5 FR → OnDiskIndex 1: FR, US
SSTable 2: user_id3 UK, user_id2 DE → OnDiskIndex 2: UK, DE
OnDiskIndex files use B+Tree-like data structures

Local read path
•  first, optimize the query using the Query Planner (see later)
•  then load chunks (4k) of index files from disk into memory
•  perform a binary search to find the indexed value(s)
•  retrieve the offset of the partition from the original SSTable
•  read the original data
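The read steps above can be sketched with one sorted index per SSTable. This is a toy in-memory model, not the real on-disk format: SSTables are plain lists of (key, value) rows, list positions play the role of offsets, and the function names `build_index` and `search` are hypothetical.

```python
import bisect


def build_index(sstable):
    """Return sorted (term, [offsets]) pairs for one SSTable."""
    postings = {}
    for offset, (_key, value) in enumerate(sstable):
        postings.setdefault(value, []).append(offset)
    return sorted(postings.items())


def search(sstables, indexes, value):
    """Binary-search each index for `value`, then read rows at the offsets found."""
    rows = []
    for sstable, index in zip(sstables, indexes):
        terms = [term for term, _offsets in index]
        pos = bisect.bisect_left(terms, value)
        if pos < len(terms) and terms[pos] == value:
            rows.extend(sstable[offset] for offset in index[pos][1])
    return rows


# Sample data taken from the OnDiskIndex slide.
sstable1 = [("user_id4", "FR"), ("user_id1", "US"), ("user_id5", "FR")]
sstable2 = [("user_id3", "UK"), ("user_id2", "DE")]
sstables = [sstable1, sstable2]
indexes = [build_index(s) for s in sstables]
```

Searching for FR touches both index files but only reads rows from the first SSTable, which is the two-pass pattern (index read, then data read) of the read path.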

Query planner
•  build the predicate tree
•  predicate push-down & re-ordering
•  predicate fusion for the != operator

Query optimization example
WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21
•  AND is associative and commutative, so predicates can be re-ordered and grouped
•  != is transformed into an exclusion on the range scan
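A minimal sketch of this kind of optimization, applied to the example WHERE clause, is shown below. It is not the actual SASI query planner: the functions `plan` and `matches` and the tuple encoding of predicates are hypothetical, and reading `fname != 'pa%'` as "exclude values starting with pa" is an assumption.

```python
def plan(predicates):
    """Group AND-ed (column, operator, value) predicates into one entry per column."""
    plans = {}
    for column, op, value in predicates:
        entry = plans.setdefault(
            column, {"gt": None, "lt": None, "prefix": None, "exclude": []}
        )
        if op == ">":
            entry["gt"] = value if entry["gt"] is None else max(entry["gt"], value)
        elif op == "<":
            entry["lt"] = value if entry["lt"] is None else min(entry["lt"], value)
        elif op == "LIKE":
            entry["prefix"] = value.rstrip("%")
        elif op == "!=":
            # Assumption: 'pa%' is read as "exclude values starting with pa".
            entry["exclude"].append(value.rstrip("%"))
        else:
            raise ValueError(f"unsupported operator: {op}")
    return plans


def matches(row, plans):
    """Check a row against the merged per-column plan."""
    for column, entry in plans.items():
        value = row[column]
        if entry["gt"] is not None and not value > entry["gt"]:
            return False
        if entry["lt"] is not None and not value < entry["lt"]:
            return False
        if entry["prefix"] is not None and not value.startswith(entry["prefix"]):
            return False
        if any(value.startswith(excluded) for excluded in entry["exclude"]):
            return False
    return True
```

Because AND is associative and commutative, the order of the input predicates does not change the resulting plan: the two age predicates fuse into a single range ] 21, 100 [, and the two fname predicates become one prefix scan with an exclusion.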

Index mode caveat
Beware of disk space usage for full-text search!!!
Table albums with ≈ 110 000 records, 6.8 MB data size

SASI vs search engines
SASI vs Solr/ElasticSearch?
•  Cassandra is not a search engine!!! (database = durability)
•  always slower because of 2 passes (SASI index read + original Cassandra data read)
•  no scoring
•  no ordering (ORDER BY)
•  no grouping (GROUP BY) → use Apache Spark for analytics
If you don't need the above features, SASI is for you!

SASI sweet spots
SASI is a relevant choice if
•  you need multi-criteria search and you don't need ordering/grouping/scoring
•  you mostly need 100 to 10,000 rows for your search queries
•  you always know the partition keys of the rows to be searched for (this one applies to the native secondary index too)

SASI blind spots
SASI is a poor choice if
•  you have a strict SLA on search latency, for example a requirement of a few milliseconds
•  ordering of the search results is important for you

Thank You
@doanduyhai
duy_hai.doan@datastax.com
