Cassandra 2.0

Michael Shaler
Senior Director, Applications Business

©2013 DataStax Confidential. Do not distribute without consent.
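Open Source Database Pedigrees

[Diagram: family tree of open source and cloud databases. Only the labels survive.]
• Category labels: Key Value Stores, Document DB, JSON/XML DB, Graph Databases
• Ancestor labels: Amazon Dynamo, Google BigTable
• Systems shown: MemcacheDB, Azure Table Services, Redis, Tokyo Cabinet, SimpleDB, Riak, Voldemort, Cassandra, Hbase, Hypertable, CouchDB, UserGrid, MongoDB, Neo4J, TitanDB, FlockDB

* Courtesy of @GuyHarrison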
Common Use Cases

• Web product searches
• Internal document search (law firms, etc.)
• Real estate/property searches
• Social media match ups
• Web & application log management / analysis
• Big data OLTP and write intensive systems
• Time series data management
• High velocity device data consumption and analysis
• Healthcare systems input and analysis
• Media streaming (music, movies, etc.)
• Social media input and analysis
• Online Web retail (shopping carts, user transactions, etc.)
• Web click-stream analysis
• Online gaming (real-time messaging, etc.)
• Buyer event and behavior analytics
• Real time data analytics
• Fraud detection and analysis
• Risk analysis and management
• Supply chain analytics

Cassandra as Foundation

Benefit | Feature
Fully Distributed: no SPOF | Peer-to-peer architecture for continuous availability and operational simplicity
Multi-Datacenter | Node-, rack- and DC-aware with tunable consistency
Massively Scalable | Multiple customers > 10M writes/second
SSD and Cloud optimized | All writes are linear, and all files are immutable
Rich Application Data Model | CQL (no joins or 2PC), integration with ODBC/JDBC et al.
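
"Tunable consistency" in the table above is usually explained with quorum arithmetic. The following is a minimal sketch, not Cassandra code: the function names are hypothetical, and it only encodes the general rule that a read overlaps the latest acknowledged write when the number of write acknowledgements plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(write_acks, read_replicas, replication_factor):
    """True when every read set must share at least one replica with the write set."""
    return write_acks + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                             # majority for RF=3
print(read_overlaps_latest_write(quorum(rf), quorum(rf), rf)) # QUORUM write + QUORUM read
print(read_overlaps_latest_write(1, 1, rf))                   # ONE write + ONE read
```

Writing and reading at a majority trades latency for overlap; writing and reading at a single replica is faster but relies on eventual consistency.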
What’s New in Cassandra 2.0
• Lightweight Transactions
  - IF keyword in CQL INSERT and UPDATE statements
  - SERIAL consistency level
• Triggers
  - Phase 1 support
• CQL paging support
• Prepared statement support: Atomic BATCH guarantees
• Bind variable support
• Improved authentication via SASL
• Drop column support (ALTER TABLE DROP)
• SELECT column aliases
• Conditional DDL
• Index enhancements
• One-off prepare and execute statements
• Performance enhancements
  - Off-heap partition summary
  - Eager retries support
  - Compaction improvements

C* 2.0 Feature Highlights
• Lightweight transactions
• Triggers (experimental)
• Improved compaction
• CQL cursors

The Problem: Sometimes we need Serializability

Session 1:
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)

Session 2:
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)

Both sessions see an empty result, so both insert, and one row silently overwrites the other.

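
The interleaving above can be reproduced with a small in-memory stand-in. This is a sketch under stated assumptions: the `users` dict, the helper names and the example.com addresses are hypothetical, and a plain dict assignment stands in for Cassandra's unconditional, last-write-wins INSERT.

```python
# In-memory stand-in for a table keyed by username.
users = {}


def select_user(username):
    """Models SELECT ... WHERE username = ?; None means an empty result set."""
    return users.get(username)


def insert_user(username, email):
    """Models an unconditional INSERT: it overwrites any existing row."""
    users[username] = {"username": username, "email": email}


# Both sessions read before either one writes.
seen_by_session_1 = select_user("jbellis")
seen_by_session_2 = select_user("jbellis")

# Each session saw an empty result, so each believes the name is free.
if seen_by_session_1 is None:
    insert_user("jbellis", "session1@example.com")
if seen_by_session_2 is None:
    insert_user("jbellis", "session2@example.com")

# No error was raised, yet session 1's row is gone.
print(users["jbellis"]["email"])
```

Neither session can detect the conflict from its own reads and writes, which is the gap lightweight transactions are meant to close.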
LWT: Details

• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

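
To make the "4 round trips" figure concrete, here is a heavily simplified sketch of a Paxos-style conditional insert. It is not Cassandra's implementation: the `Replica` class, the ballot numbers and the phase names (prepare, read, propose, commit) are assumptions for illustration, and it omits durability, timeouts and the completion of in-progress proposals.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    promised: int = 0                                # highest ballot promised so far
    accepted: tuple = None                           # (ballot, key, row) awaiting commit
    committed: dict = field(default_factory=dict)    # partition key -> row

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def read(self, key):
        return self.committed.get(key)

    def propose(self, ballot, key, row):
        if ballot >= self.promised:
            self.accepted = (ballot, key, row)
            return True
        return False

    def commit(self, ballot):
        if self.accepted and self.accepted[0] == ballot:
            _, key, row = self.accepted
            self.committed[key] = row
            self.accepted = None


def insert_if_not_exists(replicas, ballot, key, row):
    """Returns (applied, round_trips) for one conditional insert."""
    need = len(replicas) // 2 + 1
    trips = 0

    trips += 1  # 1. prepare: ask replicas to promise this ballot
    if sum(r.prepare(ballot) for r in replicas) < need:
        return False, trips

    trips += 1  # 2. read: check the condition against current data
    if any(r.read(key) is not None for r in replicas):
        return False, trips

    trips += 1  # 3. propose: ask replicas to accept the new row
    if sum(r.propose(ballot, key, row) for r in replicas) < need:
        return False, trips

    trips += 1  # 4. commit: make the accepted row visible
    for r in replicas:
        r.commit(ballot)
    return True, trips


replicas = [Replica() for _ in range(3)]
first = insert_if_not_exists(replicas, 1, "jbellis", {"email": "a@example.com"})
second = insert_if_not_exists(replicas, 2, "jbellis", {"email": "b@example.com"})
print(first, second)
```

The second call stops after the read phase because the row already exists, so the condition fails without a proposal being made.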
Paxos for LWT

LWT: Use sparingly, with caution

• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013eventual-consistency-hopefulconsistency-by-christos-kalantzis

LWT example
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', 'jbellis@datastax.com', ... )
IF NOT EXISTS;

UPDATE USERS
SET email = 'jonathan@datastax.com', ...
WHERE username = 'jbellis'
IF email = 'jbellis@datastax.com';

Some fine print…:)
• The columns updated do NOT have to be the same as the columns in the IF
clause.
• Lightweight transactions are restricted to a single partition; this is the
granularity at which we keep the internal Paxos state. As a corollary,
transactions in different partitions will never interrupt each other.
• If your transaction fails because the existing values did not match the ones you
expected, Cassandra will include the current ones so you can decide whether
to retry or abort without needing to make an extra request.
• ConsistencyLevel.SERIAL has been added to allow reading the current
(possibly un-committed) Paxos state without having to propose a new update.
If a SERIAL read finds an uncommitted update in progress, it will commit it as
part of the read.
• For details of how we deal with failures, see the comments and code.
• Tickets for DC-local transactions, updating multiple rows in a transaction, and
cqlsh support for retrieving the current value from an interrupted transaction
are open and will be fixed for 2.0.1.

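
The fine print can be illustrated with a toy model of a single partition. This is a sketch, not driver code: the `Partition` class and its methods are hypothetical, and the `[applied]` key is chosen to mirror the result column that cqlsh prints for conditional statements.

```python
class Partition:
    """Toy model of one partition holding a single row (hypothetical helper)."""

    def __init__(self):
        self.row = None

    def insert_if_not_exists(self, **columns):
        if self.row is not None:
            # Condition failed: hand back the existing values with the result.
            return {"[applied]": False, **self.row}
        self.row = dict(columns)
        return {"[applied]": True}

    def update_if(self, expected, **assignments):
        current = self.row or {}
        if any(current.get(col) != val for col, val in expected.items()):
            # Condition failed: return the current values of the IF columns.
            return {"[applied]": False, **{c: current.get(c) for c in expected}}
        self.row = {**current, **assignments}
        return {"[applied]": True}


p = Partition()
r1 = p.insert_if_not_exists(username="jbellis", email="jbellis@datastax.com")
r2 = p.insert_if_not_exists(username="jbellis", email="other@example.com")
r3 = p.update_if({"email": "wrong@example.com"}, email="jonathan@datastax.com")
r4 = p.update_if({"email": "jbellis@datastax.com"}, email="jonathan@datastax.com")
# The updated column does not have to be the one tested in the IF clause.
r5 = p.update_if({"email": "jonathan@datastax.com"}, name="Jonathan")
print(r1, r2, r3, r4, r5, sep="\n")
```

A caller can inspect the returned current values in `r2` or `r3` and decide whether to retry or abort without issuing a separate read.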
Triggers
CREATE TRIGGER <name> ON <table>
USING <classname>;

class MyTrigger implements ITrigger
{
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
    {
        ...
    }
}

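
The contract of `augment` is that it receives the partition key and the pending update and returns extra mutations to apply alongside it. The sketch below models only that contract in Python; it does not use Cassandra's classes, and `audit_trigger`, `tables`, `triggers` and `write` are hypothetical names.

```python
def audit_trigger(key, update):
    """Analogue of augment(): return additional mutations for this write."""
    return [("audit_log", key, {"changed_columns": sorted(update)})]


# Table name -> partition key -> columns.
tables = {"users": {}, "audit_log": {}}
# Table name -> triggers registered on that table.
triggers = {"users": [audit_trigger]}


def write(table, key, update):
    """Apply the original mutation plus whatever its triggers return."""
    mutations = [(table, key, update)]
    for trigger in triggers.get(table, []):
        mutations.extend(trigger(key, update))
    for target, k, columns in mutations:
        tables[target].setdefault(k, {}).update(columns)


# The key is raw bytes, mirroring the ByteBuffer partition key in the slide.
write("users", b"jbellis", {"email": "jbellis@datastax.com"})
print(tables["audit_log"])
```

The trigger never writes by itself; it only describes additional mutations, and the write path applies them together with the original one.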
Triggers are EXPERIMENTAL!

• Relies on internal RowMutation, ColumnFamily classes
• [partition] key is a ByteBuffer
• Expect changes in 2.1

Compaction

• Single-pass, always

• LCS performs STCS in L0

Healthy Leveled Compaction

[Figure: healthy leveled compaction]

Sad Leveled Compaction

[Figure: sad leveled compaction]

STCS in L0

[Figure: STCS in L0]

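
The idea of falling back to size-tiered compaction (STCS) when level 0 backs up can be sketched as follows. This is an illustration, not Cassandra's code: the function names are hypothetical, and the defaults used here (buckets spanning 0.5x to 1.5x of the bucket average, at least 4 sstables per compaction, a backlog limit of 32 sstables in L0) are assumptions based on commonly cited settings.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size."""
    buckets = []  # each entry is [running_average, [sizes...]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def pick_l0_compaction(l0_sizes, l0_backlog_limit=32, min_threshold=4):
    """Use normal leveled compaction unless L0 has fallen behind."""
    if len(l0_sizes) <= l0_backlog_limit:
        return ("leveled", [])
    candidates = [b for b in size_tiered_buckets(l0_sizes) if len(b) >= min_threshold]
    if not candidates:
        return ("leveled", [])
    return ("size_tiered", max(candidates, key=len))


print(size_tiered_buckets([10, 11, 12, 100, 105, 400]))
print(pick_l0_compaction([10, 11, 12, 13, 100], l0_backlog_limit=3))
```

Merging many similar-sized L0 sstables in one cheap pass reduces the number of files a read has to consult while leveled compaction catches up.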
Cursors (before)

CREATE TABLE timeline (
    user_id uuid,
    tweet_id timeuuid,
    tweet_author uuid,
    tweet_body text,
    PRIMARY KEY (user_id, tweet_id)
);

SELECT *
FROM timeline
WHERE (user_id = :last_key
       AND tweet_id > :last_tweet)
   OR token(user_id) > token(:last_key)
LIMIT 100

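
The query above describes the bookkeeping a client had to do by hand: remember the last partition key and clustering value, then ask for the rest of that partition or any partition with a larger token. The sketch below simulates that loop in memory. It is an illustration under assumptions: `token` uses CRC32 as a stand-in for the partitioner's hash, the user ids and integer tweet ids are made up, and `next_page` and `scan_all` are hypothetical helpers.

```python
import zlib


def token(user_id):
    """Stand-in for the partitioner's hash (assumption: CRC32, for the demo only)."""
    return zlib.crc32(user_id.encode())


# Rows as (user_id, tweet_id), stored in the order a full scan would return them:
# by token of the partition key, then by clustering column.
rows = [(u, t) for u in ("u1", "u2", "u3") for t in range(1, 6)]
rows.sort(key=lambda r: (token(r[0]), r[1]))


def next_page(last_key, last_tweet, limit):
    """Mirror of the WHERE clause in the slide, evaluated in memory."""
    if last_key is None:
        matches = rows
    else:
        matches = [
            (u, t) for (u, t) in rows
            if (u == last_key and t > last_tweet) or token(u) > token(last_key)
        ]
    return matches[:limit]


def scan_all(limit):
    """Page through every row, carrying the cursor state by hand."""
    seen, last_key, last_tweet = [], None, None
    while True:
        page = next_page(last_key, last_tweet, limit)
        if not page:
            return seen
        seen.extend(page)
        last_key, last_tweet = page[-1]


print(len(scan_all(4)))
```

With the CQL paging support listed under What's New, this cursor state no longer has to be managed in application code.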
Other Performance Improvements
• Tracking statistics on clustered columns allows eliminating unnecessary sstables from the read path
• New half-synchronous, half-asynchronous Thrift server based on LMAX Disruptor
• Faster partition index lookups and cache reads by improving performance of off-heap memory
• Faster reads of compressed data by switching from CRC32 to Adler checksums
• JEMalloc support for off-heap allocation

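
The first bullet above, skipping sstables based on per-file statistics, can be sketched as a range-overlap check. This is an illustration only: `SSTableStats` and `sstables_to_read` are hypothetical names, and integer clustering values stand in for whatever column type a real table would use.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    name: str
    min_clustering: int   # smallest clustering value stored in the file
    max_clustering: int   # largest clustering value stored in the file


def sstables_to_read(sstables, query_low, query_high):
    """Keep only files whose [min, max] range overlaps the requested slice."""
    return [
        s.name for s in sstables
        if s.max_clustering >= query_low and s.min_clustering <= query_high
    ]


sstables = [
    SSTableStats("a", 0, 99),
    SSTableStats("b", 100, 199),
    SSTableStats("c", 150, 300),
]
print(sstables_to_read(sstables, 160, 180))
```

File "a" is never opened for this slice because its recorded range cannot contain a matching row.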
Clean-up
• Removed compatibility with pre-1.2.5 sstables and pre-1.2.9 schema
• The potentially dangerous countPendingHints JMX call has been replaced by a Hints Created metric
• The on-heap partition cache (“row cache”) has been removed
• Vnodes are on by default
  - The old token range bisection code for non-vnode clusters is gone
• Removed emergency memory pressure valve logic

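
For readers unfamiliar with vnodes, the sketch below shows the general idea of each node owning many small token ranges instead of one large range. It is a simplification under assumptions: `build_ring` and `owner` are hypothetical helpers, tokens are random integers in a 64-bit space rather than real partitioner output, and 256 tokens per node is used as an example value.

```python
import bisect
import random


def build_ring(nodes, num_tokens, seed=0):
    """Give every node num_tokens random positions on the ring."""
    rng = random.Random(seed)
    return sorted(
        (rng.randrange(2**64), node) for node in nodes for _ in range(num_tokens)
    )


def owner(ring, token):
    """A token belongs to the first ring position at or after it, wrapping around."""
    positions = [t for t, _ in ring]
    index = bisect.bisect_left(positions, token) % len(ring)
    return ring[index][1]


ring = build_ring(["n1", "n2", "n3"], num_tokens=256)
print(len(ring), owner(ring, 12345))
```

Because each node holds many scattered ranges, adding or removing a node moves small slices of data to and from many peers rather than splitting one neighbour's range in half.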
Operational Considerations
• Java7 is now required
• Leveled compaction level information has been moved into sstable metadata
• Kernel page cache skipping has been removed in favor of optional row preheating (preheat_kernel_page_cache)
• Streaming has been rewritten to be more transparent and robust
• Streaming support for old-version sstables

Questions?

Answers?

October 2013 Cassandra Boulder MeetUp.key
