Elassandra :
Elasticsearch as a Cassandra Secondary Index
September the 8th, 2016 - LLD20
© DataStax, All Rights Reserved.
Vincent Royer
Elassandra Author
Rémi Trouville
Strapdata Co-Founder
We have been working together for 8 years in the banking/insurance industry
Today’s objectives:
• Share our vision and excitement about our project
• Get feedback from all of you about Elassandra
• Meet NoSQL gurus to exchange ideas, solutions, questions, … and beers
1 Introduction
2 How Elassandra works
3 Cool Features
4 Elassandra’s ecosystem
5 Roadmap
6 Q&A
Elassandra’s status (2016/09/08)
Current Usage
• Used for non-critical data including
• Application logs
• Server monitoring (CPU, memory…)
• Consolidation and reporting from various SQL databases.
Current status of Elassandra
• Still in beta
• Needs testing on larger deployments
Production-ready release targeted for the end of 2016
Elasticsearch design
• The master node manages and broadcasts the cluster state
• Only primary nodes support write operations
• On failure, a new master or primary node is elected
• By default, 5 shards per index
[Figure: one master node, three primary nodes and six replica nodes, with the labels "Write operations" and "Read operations".]
Elassandra design
• Elasticsearch code is embedded in Cassandra nodes
• Documents are stored as rows in Cassandra tables (no more _source in Elasticsearch)
• A custom secondary index synchronously updates Elasticsearch indices
Terminology
Elasticsearch           | Cassandra          | Description
------------------------+--------------------+---------------------------------------------------------------------
Mapping                 | Schema             | Defines data structures
Cluster                 | Virtual datacenter | An Elassandra datacenter is an Elasticsearch cluster
Index                   | Keyspace           | An index relies on a keyspace
Type                    | Table              | Each document type is stored in a Cassandra table
Document                | Row                | A document is stored as a Cassandra row where _id is the primary key
Field                   | Column             | Each indexed field is backed by a Cassandra column
Object or nested fields | User Defined Type  | Automatically created User Defined Type to store Elasticsearch objects
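Elassandra Write Path
[Figure: write path across node 1 (coordinator), node 2 and node 3, each with an Elasticsearch layer (shard) on top of a Cassandra layer (secondary index). A REST "index a document" request arrives at the coordinator's Elasticsearch layer, with a dynamic mapping update; the Cassandra layer executes INSERT INTO (…) VALUES (…); the secondary index then indexes the document, including _token, into the shard.]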
Elassandra Search Path
[Figure: search path across node 1 (coordinator), node 2 and node 3, each with an Elasticsearch layer (shard) on top of a Cassandra layer (secondary index). A REST search request arrives at the coordinator. Search phase: shards are queried with a _token filter to avoid duplicates. Fetch phase: SELECT fields FROM index.type WHERE PK=_id. The response is then returned.]
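The _token filter of the search phase can be pictured with a minimal Python sketch. It assumes each hit carries its Cassandra token in a `_token` field and that a node's ranges are written as "(start,end]" strings, as in the routing table of the cluster state. The helper names (`parse_token_range`, `token_in_range`, `filter_hits`) and the wrap-around handling are illustrative assumptions, not Elassandra's code.

```python
from typing import Iterable, List, Tuple

TokenRange = Tuple[int, int]


def parse_token_range(text: str) -> TokenRange:
    """Parse a range written as "(start,end]" into a (start, end) pair."""
    start, end = text.strip().lstrip("(").rstrip("]").split(",")
    return int(start), int(end)


def token_in_range(token: int, token_range: TokenRange) -> bool:
    """Return True if token lies in the half-open interval (start, end]."""
    start, end = token_range
    if start < end:
        return start < token <= end
    # Assumed wrap-around case: the range crosses the end of the token ring.
    return token > start or token <= end


def filter_hits(hits: Iterable[dict], ranges: List[TokenRange]) -> List[dict]:
    """Keep only the hits whose _token falls in one of the given ranges."""
    return [
        hit for hit in hits
        if any(token_in_range(hit["_token"], r) for r in ranges)
    ]
```

If every node filters on a disjoint set of ranges, a document replicated on several nodes is returned by only one of them, which is the duplicate avoidance the slide refers to.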
Elasticsearch cluster state
The cluster state has 3 main sections:
1. Cluster information (cluster name, node ids)
2. Metadata (mapping definition, indices and data structures, stored in a Cassandra table)
3. Routing table to route search operations (built locally from the Cassandra topology)
{
  "cluster_name" : "Test Cluster",
  "version" : 7,
  "state_uuid" : "SkMDaaB-RA6n0DhmHZaTow",
  "master_node" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
  "blocks" : { },
  "nodes" : {
    "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6" : {
      "name" : "localhost",
      "status" : "ALIVE",
      "transport_address" : "127.0.0.1:9300",
      "attributes" : {
        "rack" : "rack1",
        "data" : "true",
        "data_center" : "dc1",
        "master" : "true"
      }
    }
  },
  "metadata" : {
    "version" : 2,
    "cluster_uuid" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
    "templates" : { },
    "indices" : {
      "twitter" : {
        "state" : "open",
        "settings" : {
          "index" : {
            "creation_date" : "1471681453347",
            "number_of_shards" : "1",
            "number_of_replicas" : "0",
            "uuid" : "j4zZS2eOTHaDcW3r1e_1DA",
            "version" : {
              "created" : "2010199"
            }
          }
        },
        "mappings" : {
          "tweet" : {
            "properties" : {
              "size" : { "type" : "long" },
              "post_date" : {
                "format" : "strict_date_optional_time||epoch_millis",
                "type" : "date"
              },
              "message" : { "type" : "string" },
              "user" : { "type" : "string" }
            }
          }
        }
      }
    }
  },
  "routing_table" : {
    "indices" : {
      "twitter" : {
        "shards" : {
          "0" : [ {
            "state" : "STARTED",
            "primary" : true,
            "node" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "twitter",
            "version" : 4,
            "token_ranges" : [ "(-9223372036854775808,9223372036854775807]" ],
            "allocation_id" : {
              "id" : "SdDlnqLXTuacrlHpaJkAwA"
            }
          } ]
        }
      }
    }
  }
}
Elasticsearch mapping storage in Cassandra
The Elasticsearch mapping is stored in:
• A Cassandra table, elastic_admin.metadata
• The internal Cassandra system keyspace
On node bootstrap (first start of a node):
Data is pulled from other nodes and indexed in Elasticsearch.
=> Bootstrapping provides Elasticsearch resharding.
On node startup:
Data recovered from the commit logs is indexed in Elasticsearch.
=> This ensures consistency after a failure.
Masterless mapping management
When a node updates the Elasticsearch mapping:
• A Paxos transaction on the elastic_admin.metadata table ensures that no concurrent modification can occur.
• The gossip protocol is used
  • to notify all nodes to reload the new mapping
  • to check that all nodes have applied the new mapping
  • to broadcast shard status.
=> No more Elasticsearch master node
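The effect of the Paxos transaction can be illustrated with a small in-memory simulation in Python. It is only a sketch of a versioned compare-and-set, assuming the mapping is guarded by a version number. The class `MetadataStore` and the function `update_mapping` are hypothetical names, the gossip notification step is not modelled, and none of this is Elassandra's actual code.

```python
class MetadataStore:
    """In-memory stand-in for a table holding one versioned mapping document."""

    def __init__(self):
        self.version = 0
        self.mapping = {}

    def compare_and_set(self, expected_version, new_mapping):
        """Apply new_mapping only if the stored version is still the expected one."""
        if self.version != expected_version:
            return False
        self.mapping = dict(new_mapping)
        self.version += 1
        return True


def update_mapping(store, change, max_retries=5):
    """Read, modify and conditionally write the mapping, retrying on conflict."""
    for _ in range(max_retries):
        seen_version = store.version
        candidate = dict(store.mapping)
        candidate.update(change)
        if store.compare_and_set(seen_version, candidate):
            return store.version
    raise RuntimeError("too many concurrent mapping updates")
```

A writer that read a stale version gets `False` back and has to re-read before retrying, so two nodes cannot silently overwrite each other's mapping changes.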
Cross Datacenter Replication
[Figure: two datacenters, dc1 and dc2, each with an Elasticsearch cluster and Kibana; the Elasticsearch mapping and data are replicated between them.]
Cassandra hinted handoff and repair ensure data consistency.
Elassandra : Backup & Restore
Back up Elasticsearch Lucene files like Cassandra SSTables
• Cassandra flushes memtables and secondary indices when snapshotting
• Lucene files are immutable, like Cassandra SSTables
• Snapshot = hard links to immutable SSTables + Lucene files.
Benefits :
• Consistent backup of Cassandra and Elasticsearch indices
• Cassandra as a primary storage (No shared FS needed)
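The hard-link idea behind snapshots can be sketched in a few lines of Python. This is only an illustration of why hard links to immutable files give a cheap, consistent copy. The function name `snapshot` and the directory layout are assumptions, not the layout Cassandra or Elassandra use.

```python
import os


def snapshot(data_dir: str, snapshot_dir: str) -> list:
    """Hard-link every regular file of data_dir into snapshot_dir.

    No data is copied: each link is a second name for the same file, so the
    snapshot survives even if the original name is later deleted.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked
```

Because the linked files are never modified in place, the snapshot stays consistent without blocking writes; new data simply goes to new files.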
Elassandra provides bidirectional mapping
Inserting a document via the Elasticsearch APIs creates or updates the underlying CQL schema:
PUT /twitter/tweet/1 {
  "user" : "vince",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "look at Elassandra",
  "size": 50
}
CREATE KEYSPACE twitter WITH …
CREATE TABLE twitter.tweet (
  "_id" text PRIMARY KEY,
  message list<text>,
  post_date list<timestamp>,
  size list<bigint>,
  user list<text>
)
The Elasticsearch mapping can also be discovered from an existing CQL schema:
PUT /twitter/_mapping/tweet {
  "discover" : ".*"
}
PUT /twitter/_mapping/tweet {
  "twitter" : {
    "properties" : {
      "message" : {
        "type" : "string"
      },
      "post_date" : {
        "type" : "date",
        "format" : "strict_date_optional_time||epoch_millis"
      },
      "size" : {
        "type" : "long"
      },
      "user" : {
        "type" : "string"
      }
    }
  }
}
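The correspondence between the mapping and the CQL schema in the twitter.tweet example can be sketched in Python. The type table below is an assumption inferred only from that example (string, date and long fields, each stored as a CQL list), and `create_table_cql` is a hypothetical helper, not an Elassandra API.

```python
# Assumed type correspondence, taken from the twitter.tweet example only.
ES_TO_CQL = {"string": "text", "date": "timestamp", "long": "bigint"}


def create_table_cql(keyspace: str, table: str, properties: dict) -> str:
    """Build a CREATE TABLE statement from Elasticsearch mapping properties."""
    columns = ['"_id" text PRIMARY KEY']
    for field in sorted(properties):
        cql_type = ES_TO_CQL[properties[field]["type"]]
        # Each field is stored as a list of the corresponding CQL type.
        columns.append(f"{field} list<{cql_type}>")
    body = ",\n    ".join(columns)
    return f"CREATE TABLE {keyspace}.{table} (\n    {body}\n)"
```

Called with the four properties of the tweet mapping, this yields a statement with the same columns as the table shown on the slide.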
Elassandra supports nested documents with UDT
Nested documents are stored in a Cassandra User Defined Type dynamically
generated from the Elasticsearch mapping.
curl -XPUT "http://$NODE:9200/directory/users/1" -d '{
"group" : "fans",
"name" : {
"first" : "John",
"last" : "Smith"
}
}'
CREATE KEYSPACE directory WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1'} AND durable_writes = true;
CREATE TYPE directory.users_name (
last frozen<list<text>>,
first frozen<list<text>>
);
CREATE TABLE directory.users (
"_id" text PRIMARY KEY,
group list<text>,
name list<frozen<users_name>>
);
CREATE CUSTOM INDEX elastic_users_name_idx ON directory.users (name) USING 'org.elasticsearch.cassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_users_group_idx ON directory.users (group) USING 'org.elasticsearch.cassandra.index.ExtendedElasticSecondaryIndex';
Many elasticsearch indices for a keyspace
The content of a keyspace can be indexed in several Elasticsearch indices with different mappings.
Standard Cassandra index rebuild (uses C* compaction manager threads):
nodetool rebuild_index <keyspace> <tablename> elastic_<tablename>
Benefit: change index mappings with zero downtime
Partitioned indices for logs analysis with Kibana
At index time, a partition function builds the target elasticsearch index name.
• Time-frame indices are removed when too old.
• A default TTL on the underlying C* tables removes old logs.
• Comes with a cost : duplicate lucene term dictionaries.
curl -XPUT "http://localhost:9200/logs_${YEAR}" -d '{
  "settings":{
    "keyspace":"logs",
    "index.partition_function":"year logs_{0,date,yyyy} date_field"
  }
}'
curl -s -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open kibana 2 1 14 1 54.3kb 54.3kb
green open logs_2005 2 0 22770654 874988 2.8gb 2.8gb
green open logs_2006 2 0 93003294 5466480 12gb 12gb
green open logs_2007 2 0 118455836 4856867 15.1gb 15.1gb
green open logs_2008 2 0 131107405 5969785 16.8gb 16.8gb
green open logs_2009 2 0 58150615 1296827 7.4gb 7.4gb
green open logs_2010 2 0 23 2 86.8kb 86.8kb
green open logs_2011 2 0 0 0 142b 142b
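How a partition function turns a document into a target index name can be sketched in Python. The sketch assumes the setting value reads as "<name> <pattern> <field>" and handles only the `{0,date,yyyy}` placeholder used in the example. `target_index` is a hypothetical helper, not Elassandra's implementation.

```python
from datetime import datetime


def target_index(partition_function: str, document: dict) -> str:
    """Return the index name for a document, e.g. "logs_2008".

    Assumption: the setting has three space-separated parts, a function
    name, a name pattern and the document field the pattern is applied to.
    """
    _name, pattern, field = partition_function.split()
    value = document[field]
    # Only the yearly placeholder from the slide is supported in this sketch.
    return pattern.replace("{0,date,yyyy}", f"{value.year:04d}")
```

With the setting above, a document dated in 2008 is routed to logs_2008, which matches the per-year indices listed in the _cat/indices output.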
Elassandra + Kibana
Kibana for search and data visualization :
cqlsh> SELECT "_id",title,"kibanaSavedObjectMeta" FROM kibana.visualization;
_id | title | kibanaSavedObjectMeta
-------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------
lastfm-by-age | ['lastfm by age'] | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
lastfm-histogram | ['lastfm histogram'] | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
lastfm-by-country | ['lastfm by country'] | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"analyze_wildcard":true,"query":"*"}},"filter":[]}']}]
lastfm-count | ['lastfm count'] | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
Even Kibana’s configuration is in Cassandra
Elassandra tools & plugins
Elassandra supports Elasticsearch tools & plugins
• Logstash & Beats
• ElasticHQ (royrusso)
• Elasticsearch-sql (NLPchina)
• JDBC sql4es (Anchormen)
Elassandra + Kibana + PrestoDB + CQLInject
Denormalizing our dataset for visualization with Kibana
cqlinject jdbc "SELECT A.a, B.b FROM A INNER JOIN B on A.c=B.c"
1. Execute a JDBC request on prestoDB.
2. From the response metadata, add new columns to the target C* table.
3. Refresh the Elasticsearch mapping from C* table.
4. Write back the result to Elassandra.
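Step 2 of the list above can be sketched in Python: compare the columns described by the query result with those already in the target table, and emit one ALTER TABLE statement per missing column. The function `missing_column_statements`, the table name and the CQL types passed in are illustrative assumptions; how cqlinject maps JDBC types to CQL types is not stated in the slides.

```python
def missing_column_statements(table, existing_columns, result_columns):
    """Return ALTER TABLE statements for result columns absent from the table.

    result_columns is a list of (name, cql_type) pairs, assumed to have been
    derived from the response metadata beforehand.
    """
    statements = []
    for name, cql_type in result_columns:
        if name not in existing_columns:
            statements.append(f"ALTER TABLE {table} ADD {name} {cql_type}")
    return statements
```

For example, if the target table already has column a and the result set has columns a and b, only the statement adding b is produced.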
[Figure: cqlinject, E* + PrestoDB with six workers, and Kibana.]
Elassandra + Spark
Principle: the Elasticsearch-Hadoop connector creates 1 partition per shard, and Elassandra has only 1 shard on each node, so each node gets one partition.
Benefits:
• Workers/executors read and write locally on Elassandra nodes.
• Elassandra resharding makes it possible to scale out Cassandra + Elasticsearch + Spark together
• The Elasticsearch-Spark connector supports pushdown
How:
A slight modification of the elasticsearch-hadoop connector adds a token_ranges filter from the coordinator routing table, to avoid duplicates if nodes have overlapping routing tables.
[Figure: E* + Spark, with six executors.]
Time Series with Elassandra
Storing large time series in Cassandra
• N tables with different levels of precision and retention
• Daily rollup batches on each node to aggregate local data and compute metadata (min/max/avg/stdev….)
• Automatic purge with default TTL + DateTieredCompactionStrategy
Only metric names and metadata are indexed for search
• Metadata enrichment by joining other sources of data (ex: datacenters, applications, hardware info….)
• Search with regex on any metadata to display relevant time series
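The daily rollup metadata mentioned above (min/max/avg/stdev) can be computed as in this minimal Python sketch. The function name `daily_rollup` is hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which variant is computed.

```python
import statistics


def daily_rollup(values):
    """Aggregate one day of samples into the metadata kept for search."""
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        # Population standard deviation; the sample variant would be stdev().
        "stdev": statistics.pstdev(values),
    }
```

Only these few aggregates per series and per day would need to be indexed, while the raw points stay in the Cassandra tables.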
[Figure: data injection into E*, local daily batches, and a grapher that a web browser uses to search and display time series.]
Write throughput
Cassandra = 30k writes/s, Elassandra = 30k writes/s
• Write throughput is the same as long as the node is not overloaded
• CPU usage x2 for Elassandra
• (#threads + #classes) x2 for Elassandra
2-node cluster, RF=1, Google Cloud VM n1-highcpu-16 (16 vCPU, 14.4 GB memory)
Elassandra Roadmap
Make it a deployed enterprise grade solution:
• Improve the documentation and packaging
• Implement Elasticsearch missing features
• Upgrade to Cassandra 3.0.<latest> and Elasticsearch 2.<latest>
• Make it ready for Windows OS
• Provide security features (SSL, LDAP, document and field level security)
• Deliver professional services
More about us …
http://www.elassandra.io
https://github.com/vroyer/elassandra
Vincent Royer
Elassandra Author
vroyer@strapdata.com
Rémi Trouville
Strapdata Co-Founder
rtrouville@strapdata.com
Thank you

Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

Plus de DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Dernier

Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 

Dernier (20)

Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 

Elasticsearch as Cassandra Secondary Index

  • 1. Elassandra : Elasticsearch as a Cassandra Secondary Index September the 8th, 2016 - LLD20
  • 2. © DataStax, All Rights Reserved. Elassandra: Elasticsearch as a Cassandra secondary index. Speakers: Vincent Royer (Elassandra Author) and Rémi Trouville (Strapdata Co-Founder). We have been working together for 8 years in the banking/insurance industry. Today's objectives: • Share our vision and excitement about our project • Get feedback from you all about Elassandra • Meet NoSQL gurus to exchange ideas, solutions, questions… and beers
  • 3. 1 Introduction 2 How Elassandra works? 3 Cool Features 4 Elassandra's ecosystem 5 Roadmap 6 Q&A
  • 4. Elassandra's status (2016/09/08). Current usage: used for non-critical data, including • application logs • server monitoring (CPU, memory…) • consolidation and reporting from various SQL databases. Current status of Elassandra: • still in beta • needs testing on larger deployments. Production-ready release targeted for the end of 2016.
  • 5. 1 Introduction 2 How Elassandra works? 3 Cool Features 4 Elassandra's ecosystem 5 Roadmap 6 Q&A
  • 6. Elasticsearch design. • The master node manages and broadcasts the cluster state • Only primary nodes support write operations • On failure, a new master or primary node is elected • By default, 5 shards per index. [Diagram: one master node, three primary nodes and six replica nodes, with "Write operations" and "Read operations" labels]
  • 7. Elassandra design. • Elasticsearch code is embedded in Cassandra nodes • Documents are stored as rows in Cassandra tables (no more _source in Elasticsearch) • A custom secondary index synchronously updates the Elasticsearch indices
  • 8. Terminology
         Elasticsearch           | Cassandra          | Description
        -------------------------+--------------------+------------------------------------------------------------------------------
         Mapping                 | Schema             | Defines data structures
         Cluster                 | Virtual Datacenter | An Elassandra datacenter is an Elasticsearch cluster
         Index                   | Keyspace           | An index relies on a keyspace
         Type                    | Table              | Each document type is stored in a Cassandra table
         Document                | Row                | A document is stored as a Cassandra row where _id is the primary key
         Field                   | Column             | Each indexed field is backed by a Cassandra column
         Object or nested fields | User Defined Type  | Automatically created User Defined Type to store Elasticsearch objects
  • 9. Elassandra Write Path. [Diagram: node 1 (coordinator), node 2 and node 3, split into an Elasticsearch layer (shards) and a Cassandra layer (secondary indices). Labels: REST "index a document"; Dynamic Mapping Update; INSERT INTO (…) VALUES (…); "index a document including _token" (shown twice)]
  • 10. Elassandra Search Path. [Diagram: node 1 (coordinator), node 2 and node 3, split into an Elasticsearch layer (shards) and a Cassandra layer (secondary indices). Labels: REST search; Search phase with _token filter to avoid duplicates; Fetch phase; SELECT fields FROM index.type WHERE PK=_id; Response]
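The "_token filter to avoid duplicates" label on the search path can be illustrated with a small simulation. This is only a sketch under assumptions: the `TokenRange` class, the `search_node` helper and the sample documents are hypothetical and are not Elassandra code. It shows why, when several nodes index the same replicated rows, restricting each node to a disjoint set of `(start, end]` token ranges returns each document once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    """Token range (start, end]: start is exclusive, end is inclusive."""
    start: int
    end: int

    def contains(self, token: int) -> bool:
        return self.start < token <= self.end


def search_node(local_docs, assigned_ranges, predicate):
    """Search phase on one node: match the query, then keep only the
    documents whose _token falls in a range assigned to this node."""
    return [
        doc for doc in local_docs
        if predicate(doc) and any(r.contains(doc["_token"]) for r in assigned_ranges)
    ]


# Hypothetical data: with replication, both nodes index the same three rows.
docs = [
    {"_id": "1", "_token": -50, "user": "vince"},
    {"_id": "2", "_token": 10, "user": "vince"},
    {"_id": "3", "_token": 70, "user": "vince"},
]
node_a_docs, node_b_docs = list(docs), list(docs)


def by_user(doc):
    return doc["user"] == "vince"


# Without a token filter, every replica answers for every row.
full = [TokenRange(-100, 100)]
unfiltered = search_node(node_a_docs, full, by_user) + search_node(node_b_docs, full, by_user)

# With disjoint token ranges per node, each row is returned exactly once.
filtered = (search_node(node_a_docs, [TokenRange(-100, 0)], by_user)
            + search_node(node_b_docs, [TokenRange(0, 100)], by_user))

print(len(unfiltered), len(filtered))
```

The fetch phase would then read the matching rows from Cassandra by primary key, as the SELECT statement on the slide indicates.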
  • 11. Elasticsearch cluster state. The cluster state has 3 main sections: 1. Cluster information (cluster name, node ids) 2. Metadata (mapping definition, indices and data structures, stored in a Cassandra table) 3. Routing table to route search operations (built locally from the Cassandra topology)
        Cluster information:
          {
            "cluster_name" : "Test Cluster",
            "version" : 7,
            "state_uuid" : "SkMDaaB-RA6n0DhmHZaTow",
            "master_node" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
            "blocks" : { },
            "nodes" : {
              "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6" : {
                "name" : "localhost",
                "status" : "ALIVE",
                "transport_address" : "127.0.0.1:9300",
                "attributes" : { "rack" : "rack1", "data" : "true", "data_center" : "dc1", "master" : "true" }
              }
            }
          }
        Metadata:
          "metadata" : {
            "version" : 2,
            "cluster_uuid" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
            "templates" : { },
            "indices" : {
              "twitter" : {
                "state" : "open",
                "settings" : {
                  "index" : {
                    "creation_date" : "1471681453347",
                    "number_of_shards" : "1",
                    "number_of_replicas" : "0",
                    "uuid" : "j4zZS2eOTHaDcW3r1e_1DA",
                    "version" : { "created" : "2010199" }
                  }
                },
                "mappings" : {
                  "tweet" : {
                    "properties" : {
                      "size" : { "type" : "long" },
                      "post_date" : { "format" : "strict_date_optional_time||epoch_millis", "type" : "date" },
                      "message" : { "type" : "string" },
                      "user" : { "type" : "string" }
                    }
                  }
                }
              }
            }
          }
        Routing table:
          "routing_table" : {
            "indices" : {
              "twitter" : {
                "shards" : {
                  "0" : [ {
                    "state" : "STARTED",
                    "primary" : true,
                    "node" : "e8b9c9f0-0c07-4845-9c02-211a4dbf7ea6",
                    "relocating_node" : null,
                    "shard" : 0,
                    "index" : "twitter",
                    "version" : 4,
                    "token_ranges" : [ "(-9223372036854775808,9223372036854775807]" ],
                    "allocation_id" : { "id" : "SdDlnqLXTuacrlHpaJkAwA" }
                  } ]
                }
              }
            }
          }
  • 12. Elasticsearch mapping storage in Cassandra. The Elasticsearch mapping is stored in: • a Cassandra table, elastic_admin.metadata • the internal Cassandra system keyspace. On node bootstrap (first start of a node): data is pulled from other nodes and indexed in Elasticsearch => bootstrapping provides Elasticsearch resharding. On node startup: data recovered from the commit logs is indexed in Elasticsearch => this ensures consistency after a failure.
  • 13. Masterless mapping management. When a node updates the Elasticsearch mapping: • A Paxos transaction on the elastic_admin.metadata table ensures that no concurrent modification can occur. • The gossip protocol is used to notify all nodes to reload the new mapping, to check that all nodes have applied the new mapping, and to broadcast shard status. => No more Elasticsearch master node.
  • 14. Cross Datacenter Replication. [Diagram: dc1 and dc2, each an Elasticsearch cluster with its own Kibana, linked by Elasticsearch mapping and data replication] Cassandra hinted handoff and repair ensure data consistency.
  • 15. Elassandra: Backup & Restore. Back up Elasticsearch Lucene files like Cassandra SSTables: • Cassandra flushes memtables and secondary indices when snapshotting • Lucene files are immutable, like Cassandra SSTables • Snapshot = hard links to immutable SSTables + Lucene files. Benefits: • Consistent backup of Cassandra and Elasticsearch indices • Cassandra as primary storage (no shared FS needed)
  • 16. 1 Introduction 2 How Elassandra works? 3 Cool Features 4 Elassandra's ecosystem 5 Roadmap 6 Q&A
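The concurrency guarantee on the masterless mapping slide can be pictured as a compare-and-set on a versioned record. The sketch below is an in-memory stand-in, not Elassandra code: `MetadataStore` and its fields are hypothetical, and a lock plays the role that a Paxos-backed conditional update plays in Cassandra. It only illustrates the outcome: of two nodes that read the same version, one update wins and the other must reload and retry.

```python
import threading


class MetadataStore:
    """Hypothetical stand-in for a versioned mapping record."""

    def __init__(self, mapping):
        self._lock = threading.Lock()
        self.version = 1
        self.mapping = dict(mapping)

    def compare_and_set(self, expected_version, new_mapping):
        """Apply new_mapping only if nobody changed the record since
        the caller read expected_version."""
        with self._lock:
            if self.version != expected_version:
                return False  # concurrent modification detected
            self.mapping = dict(new_mapping)
            self.version += 1
            return True


store = MetadataStore({"message": "string"})

# Two nodes read the same version, then both try to update the mapping.
seen_by_node1 = store.version
seen_by_node2 = store.version
ok1 = store.compare_and_set(seen_by_node1, {"message": "string", "size": "long"})
ok2 = store.compare_and_set(seen_by_node2, {"message": "string", "user": "string"})

print(ok1, ok2, store.version)  # True False 2
```

After a successful update, the gossip step described on the slide would tell the other nodes to reload the mapping; that part is not modelled here.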
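The "snapshot = hard link" idea works because the files are immutable: a hard link is a second name for the same data, so a snapshot costs almost no extra space and survives deletion of the original name. The sketch below demonstrates this with the Python standard library on a POSIX-style filesystem. The `snapshot` helper, the directory layout and the file names are hypothetical, chosen only to resemble an SSTable and a Lucene segment.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every file of data_dir into snapshot_dir."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")
    os.makedirs(data_dir)
    # Hypothetical file names standing in for an SSTable and a Lucene segment.
    for name in ("table-1-Data.db", "_0.cfs"):
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("immutable content of " + name)

    snap_dir = os.path.join(root, "snapshots", "snap1")
    files = snapshot(data_dir, snap_dir)

    # Removing the live file (as a compaction or segment merge would)
    # leaves the snapshot copy readable.
    os.remove(os.path.join(data_dir, "_0.cfs"))
    with open(os.path.join(snap_dir, "_0.cfs")) as f:
        survived = f.read()

print(files, survived)
```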
  • 17. Elassandra provides bi-directional mapping.
        Inserting a document via the Elasticsearch APIs creates/updates the underlying CQL schema:
          PUT /twitter/tweet/1
          { "user" : "vince", "post_date" : "2009-11-15T14:12:12", "message" : "look at Elassandra", "size": 50 }

          CREATE KEYSPACE twitter WITH …
          CREATE TABLE twitter.tweet (
            "_id" text PRIMARY KEY,
            message list<text>,
            post_date list<timestamp>,
            size list<bigint>,
            user list<text>
          )
        Discover the Elasticsearch mapping from an existing CQL schema:
          PUT /twitter/_mapping/tweet
          { "discover" : ".*" }

          PUT /twitter/_mapping/tweet
          {
            "twitter" : {
              "properties" : {
                "message" : { "type" : "string" },
                "post_date" : { "type" : "date", "format" : "strict_date_optional_time||epoch_millis" },
                "size" : { "type" : "long" },
                "user" : { "type" : "string" }
              }
            }
          }
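The mapping-to-schema direction shown on this slide can be sketched as a simple type translation. The three type pairs below (string to text, long to bigint, date to timestamp, each wrapped in a list) are read off the slide's example. The `ES_TO_CQL` table and the `create_table_cql` function are hypothetical names for illustration, and the real translation certainly covers more types and options.

```python
# Type pairs taken from the twitter.tweet example on the slide.
ES_TO_CQL = {"string": "text", "long": "bigint", "date": "timestamp"}


def create_table_cql(keyspace, doc_type, properties):
    """Build a CREATE TABLE statement from Elasticsearch mapping properties.
    Every field becomes a list<...> column, as in the slide's example."""
    columns = ['"_id" text PRIMARY KEY']
    for field in sorted(properties):
        es_type = properties[field]["type"]
        columns.append(f"{field} list<{ES_TO_CQL[es_type]}>")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {keyspace}.{doc_type} (\n  {body}\n)"


properties = {
    "message": {"type": "string"},
    "post_date": {"type": "date", "format": "strict_date_optional_time||epoch_millis"},
    "size": {"type": "long"},
    "user": {"type": "string"},
}

cql = create_table_cql("twitter", "tweet", properties)
print(cql)
```

Storing each field as a list matches the schema on the slide, where even single-valued fields such as size are declared as list<bigint>.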
  • 18. Elassandra supports nested documents with UDT. Nested documents are stored in a Cassandra User Defined Type dynamically generated from the Elasticsearch mapping.
          curl -XPUT "http://$NODE:9200/directory/users/1" -d '{
            "group" : "fans",
            "name" : { "first" : "John", "last" : "Smith" }
          }'

          CREATE KEYSPACE directory WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1'} AND durable_writes = true;
          CREATE TYPE directory.users_name (
            last frozen<list<text>>,
            first frozen<list<text>>
          );
          CREATE TABLE directory.users (
            "_id" text PRIMARY KEY,
            group list<text>,
            name list<frozen<users_name>>
          );
          CREATE CUSTOM INDEX elastic_users_name_idx ON directory.users (name)
            USING 'org.elasticsearch.cassandra.index.ExtendedElasticSecondaryIndex';
          CREATE CUSTOM INDEX elastic_users_group_idx ON directory.users (group)
            USING 'org.elasticsearch.cassandra.index.ExtendedElasticSecondaryIndex';
  • 19. Many Elasticsearch indices for a keyspace. The content of a keyspace can be indexed in many Elasticsearch indices with various mappings. Standard Cassandra index rebuild (uses C* compaction manager threads):
          nodetool rebuild_index <keyspace> <tablename> elastic_<tablename>
        Benefit: change index mappings with zero downtime.
  • 20. Partitioned indices for log analysis with Kibana. At index time, a partition function builds the target Elasticsearch index name. • Time-frame indices are removed when too old. • A default TTL on the underlying C* tables removes old logs. • Comes with a cost: duplicate Lucene term dictionaries.
          curl -XPUT "http://localhost:9200/logs_${YEAR}" -d '{
            "settings" : {
              "keyspace" : "logs",
              "index.partition_function" : "year logs_{0,date,yyyy} date_field"
            }
          }'

          curl -s -XGET "http://localhost:9200/_cat/indices?v"
          health status index     pri rep docs.count docs.deleted store.size pri.store.size
          green  open   kibana      2   1         14            1     54.3kb         54.3kb
          green  open   logs_2005   2   0   22770654       874988      2.8gb          2.8gb
          green  open   logs_2006   2   0   93003294      5466480       12gb           12gb
          green  open   logs_2007   2   0  118455836      4856867     15.1gb         15.1gb
          green  open   logs_2008   2   0  131107405      5969785     16.8gb         16.8gb
          green  open   logs_2009   2   0   58150615      1296827      7.4gb          7.4gb
          green  open   logs_2010   2   0         23            2     86.8kb         86.8kb
          green  open   logs_2011   2   0          0            0       142b           142b
  • 21. 1 Introduction 2 How Elassandra works? 3 Cool Features 4 Elassandra's ecosystem 5 Roadmap 6 Q&A
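To make the partition function concrete, the sketch below computes a target index name from a document. It assumes the setting reads as "name, index-name pattern, field" (here: year, logs_{0,date,yyyy}, date_field) and that the pattern follows Java MessageFormat date syntax. Both are readings of the slide's example rather than documented behaviour, and `target_index` is a hypothetical helper that supports only the yyyy, MM and dd pattern letters.

```python
import re
from datetime import datetime

# Only a few Java date pattern letters are translated in this sketch.
JAVA_TO_STRFTIME = (("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"))


def target_index(partition_function, document):
    """Return the index name a document would be routed to."""
    _name, pattern, field = partition_function.split()
    value = document[field]

    def render(match):
        fmt = match.group(1)
        for java, py in JAVA_TO_STRFTIME:
            fmt = fmt.replace(java, py)
        return value.strftime(fmt)

    # Replace each {0,date,<format>} placeholder with the formatted field value.
    return re.sub(r"\{0,date,([^}]+)\}", render, pattern)


doc = {"date_field": datetime(2007, 5, 1, 12, 0), "message": "GET /index.html"}
index_name = target_index("year logs_{0,date,yyyy} date_field", doc)
print(index_name)  # logs_2007
```

With yearly partitions like these, dropping old data amounts to deleting a whole index such as logs_2005, which is what the "removed when too old" bullet relies on.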
  • 22. Elassandra + Kibana. Kibana for search and data visualization. Even Kibana's configuration is in Cassandra:
          cqlsh> SELECT "_id",title,"kibanaSavedObjectMeta" FROM kibana.visualization;

           _id               | title                 | kibanaSavedObjectMeta
          -------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------
           lastfm-by-age     | ['lastfm by age']     | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
           lastfm-histogram  | ['lastfm histogram']  | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
           lastfm-by-country | ['lastfm by country'] | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"analyze_wildcard":true,"query":"*"}},"filter":[]}']}]
           lastfm-count      | ['lastfm count']      | [{searchSourceJSON: ['{"index":"lastfm_*","query":{"query_string":{"query":"*","analyze_wildcard":true}},"filter":[]}']}]
  • 23. Elassandra tools & plugins. Elassandra supports Elasticsearch tools & plugins: • Logstash & Beats • ElasticHQ (royrusso) • Elasticsearch-sql (NLPchina) • JDBC sql4es (Anchormen)
  • 24. Elassandra + Kibana + PrestoDB + CQLInject. Denormalizing our dataset for visualization with Kibana:
          cqlinject jdbc "SELECT A.a, B.b FROM A INNER JOIN B on A.c=B.c"
        1. Execute a JDBC request on PrestoDB. 2. From the response metadata, add new columns to the target C* table. 3. Refresh the Elasticsearch mapping from the C* table. 4. Write the result back to Elassandra. [Diagram labels: Kibana, cqlinject, six workers, E* + PrestoDB]
  • 25. Elassandra + Spark. Principle: the Elasticsearch-Hadoop connector creates 1 partition per shard, and Elassandra has only 1 shard on each node. Benefits: • Workers/executors read/write locally on Elassandra nodes. • Elassandra resharding makes it possible to scale out Cassandra + Elasticsearch + Spark. • The Elasticsearch-Spark connector supports pushdown. How: a slight modification of the elasticsearch-hadoop connector adds a token_ranges filter, taken from the coordinator routing table, to avoid duplicates if nodes have overlapping routing tables. [Diagram labels: six executors, E* + Spark]
  • 26. Time Series with Elassandra. Storing large time series in Cassandra: • N tables with different levels of precision and retention • Daily rollup batches on each node aggregate local data and compute metadata (min/max/avg/stdev…) • Automatic purge with default TTL + DateTieredCompactionStrategy. Searching while indexing only metric names and metadata: • Metadata enrichment by joining other data sources (e.g. datacenters, applications, hardware info…) • Search with regex on any metadata to display relevant time series. [Diagram labels: Data Injection, E*, Local Daily Batches, Grapher, Web browser, search, display]
  • 27. Write throughput. Cassandra = 30k writes/s; Elassandra = 30k writes/s. • Write throughput is the same as long as the node is not overloaded • CPU x2 for Elassandra • (#threads + #classes) x2 for Elassandra. Test setup: 2-node cluster, RF=1, Google Cloud VM n1-highcpu-16 (16 vCPU, 14.4 GB memory).
  • 28. 1 Introduction 2 How Elassandra works? 3 Cool Features 4 Elassandra's ecosystem 5 Roadmap 6 Q&A
  • 29. Elassandra Roadmap. Make it a deployed, enterprise-grade solution: • Improve the documentation and packaging • Implement missing Elasticsearch features • Upgrade to Cassandra 3.0.<latest> and Elasticsearch 2.<latest> • Make it ready for Windows • Provide security features (SSL, LDAP, document- and field-level security) • Deliver professional services
  • 30. More about us… http://www.elassandra.io https://github.com/vroyer/elassandra Vincent Royer, Elassandra Author, vroyer@strapdata.com. Rémi Trouville, Strapdata Co-Founder, rtrouville@strapdata.com
  • 31. 1 Introduction 2 How Elassandra works? 3 Features 4 Use case examples 5 Roadmap 6 Q&A
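The daily rollup mentioned on the time series slide can be sketched as a group-by on (metric, day) followed by a few aggregates. This is an illustration only: `daily_rollup`, the tuple layout of the input points and the choice of population standard deviation are assumptions, since the slide does not say how the batches are implemented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def daily_rollup(points):
    """Aggregate (metric, timestamp, value) points per metric and per day."""
    buckets = defaultdict(list)
    for metric, ts, value in points:
        buckets[(metric, ts.date())].append(value)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals), "stdev": pstdev(vals)}
        for key, vals in buckets.items()
    }


# Hypothetical raw samples held locally by one node.
points = [
    ("cpu", datetime(2016, 9, 8, 0, 0), 1.0),
    ("cpu", datetime(2016, 9, 8, 12, 0), 2.0),
    ("cpu", datetime(2016, 9, 8, 23, 59), 3.0),
    ("cpu", datetime(2016, 9, 9, 0, 1), 5.0),
]

rollup = daily_rollup(points)
for (metric, day), stats in sorted(rollup.items()):
    print(metric, day, stats)
```

Only the small per-day summaries (plus metric names and other metadata) would need to be indexed for search, while the raw points stay in Cassandra tables.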