NOSQL INTRO & CASSANDRA

REQUISITE SLIDE – WHO AM I?
Brian Enochson
- Home is the Jersey Shore
- SW engineer who has worked as a designer / developer on NoSQL (Mongo, Cassandra)
- Consultant – HBO, ACS, CIBER
- Specializes in SW development, architecture and training

brian.enochson@gmail.com
Available for training, consulting, architecture & development.

REQUISITE SLIDE # 2 – WHAT ARE WE TALKING ABOUT?
NoSQL Introduction
• What brought us here
• Types of NoSQL Products
• What about Hadoop?
• What about Real-Time?
• Quick look at MongoDB

Cassandra Intro & Architecture
• Why Cassandra
• Architecture
• Internals
• Development

Data Modeling Concepts
• Old vs. New Way
• Basics
• Composite Types
• Collections
• Time Series Data
• Counters

HISTORY OF THE DATABASE
• 1960s – Hierarchical and network databases (IMS and CODASYL)
• 1970s – Beginnings of the theory behind the relational model (Codd)
• 1980s – Rise of the relational model. SQL. E/R model (Chen)
• 1990s – Access/Excel and MySQL. Object databases (ODBMS) began to appear
• 2000s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
• 2010s – Emergence of NoSQL as an industry player and viable alternative

WHY WERE ALTERNATIVES NEEDED
• Developers today are faced with Internet scale
  • Hundreds of thousands of users
  • Low cost of storage
  • Increased processing power
  • Ability (and need) to capture millions of events. Caching solves it to an extent but brings other complexities
  • Real-time
  • Need to scale out, not up (add any number of low-cost machines vs. one more powerful machine)
• Cost
  • Let’s not forget that for enterprise DBs, Internet scale can become expensive
  • Open source DBs may solve license cost, but don’t ignore operational costs

A LOT OF DATA
Some facts from http://www.storagenewsletter.com/rubriques/marketreportsresearch/ibm-cmo-study/
Approximately 90 percent of all the real-time information being created today is unstructured data
Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, that is 18 zeroes!!)
90 percent of the world's data today has been created in the last two years alone

RELATIONAL VS. NOSQL
• Relational
  • Divide data into tables, relate them through foreign keys, enforce DB constraints, normalize the data. The interface is SQL.

• NoSQL
  • Store in a schemaless format. Redundancy is encouraged. Application access (your queries) determines the storage format. The interface varies and is optimized for the implementation. No forced DB constraints. The tradeoff is often that you get eventual consistency.

TRADEOFFS?

Luckily, due to the large number of compromises made when
attempting to scale their existing relational databases,
these tradeoffs were not so foreign or
distasteful as they might have been.

Greg Burd - https://www.usenix.org/legacy/publications/login/201110/openpdfs/Burd.pdf

3 V’S – DESCRIBING THE BIG DATA PROBLEM
The driving force behind the need for new technology is often referred to as the “3 V Model”.
• High Volume – amount of data
• High Variety – range of data types and sources
• High Velocity – speed of data in and out

OK, maybe 4 V’s
• Veracity – is all the data applicable to the problem being analyzed?

NOSQL IS NOT BIG DATA

NoSQL != Big Data
NoSQL products were created to help solve the big data problem.
Big data is a much larger problem than just storage. Analysis tools like
Hadoop, messaging systems like Kafka, real time processing engines like
Storm and machine learning (Mahout) all help solve the big data problem.

NOSQL TYPES
Wide Column – Column Family
• Cassandra, HBase, Amazon SimpleDB
Key Value
• Riak, Redis, DynamoDB, Voldemort, MemcacheDB
Document DB
• MongoDB, CouchDB
Graph
• Neo4j, OrientDB
Search (also alternatives, normally used with *)
• Lucene, Solr, ElasticSearch
Many, many, many more! (http://nosql-database.org/)

CHOOSING THE RIGHT ONE…
Choosing the right NoSQL type and eventual product depends on…
Type of Data
• One key and a lot of data?
• High volume of data?
• Storing media, blobs?
• Document oriented?
• Tracking relationships?
• Combination?
• Multi-datacenter?
Type of Access
Volumes of Data (there is big data and there is BIG DATA)
Need for Support/Services/Training

VISUAL GUIDE – USING THE CAP THEOREM
http://blog.nahurst.com/visual-guide-to-nosql-systems

QUICK LOOK AT MONGO
Just so we can compare to Cassandra
• Document oriented
• Storage format is JSON (actually BSON)
• Replication built in
• Master / slave architecture
• Strong querying support

MONGO DOCUMENTS

DETOUR OVER…
LET’S TALK ABOUT CASSANDRA….
OR
C*

CASSANDRA HISTORY
• Developed at Facebook, based on Google Bigtable and Amazon Dynamo **
• Open sourced in mid 2008
• Apache project March 2009
• Commercial support through DataStax (originally known as Riptano, founded 2010)
• Used at Netflix, eBay and many more. The largest installation is reportedly 300 TB on 400 machines
• Current version is 2.0.1

WHY EVEN CONSIDER C*
• Large data sets
• Require high availability
• Multi data center
• Require large scaling
• Write-heavy applications
• Can design for queries
• Understand tunable consistency and implications (more to come)
• Willing to make the effort upfront for the reward

SOME BASICS
• ACID
• CAP Theorem
• BASE

ACID
YOU PROBABLY ALL HAVE HEARD OF ACID
• Atomic – All or none
• Consistency – What is written is valid
• Isolation – One operation at a time
• Durability – Once committed to the DB, it stays

This is the world we have lived in for a long time…

CAP THEOREM (BREWER’S)
Many may have heard this one
CAP stands for Consistency, Availability and Partition Tolerance
• Consistency – every read sees the most recent write, so all nodes return the same data. (Related to, but not the same as, the C in ACID, which is about written data being valid.)
• Availability – the service is available: every request gets a response.
• Partition Tolerance – no failure other than complete network failure causes the system not to respond
(Remember the visual guide to selecting a NoSQL database.)
So… what does this mean?
** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

YOU CAN ONLY HAVE 2 OF THEM

Or, better said in C* terms: you can have Availability and Partition Tolerance AND Eventual Consistency.
Eventual consistency means that, once updates stop, eventually all accesses will return the last updated value.

BASE
But maybe you have not heard this one…
Somewhat contrived but gives CAP Theorem an acronym to use against ACID… Also
created by Eric Brewer.

Basically Available – system does guarantee availability, as much as possible.
Soft State – state may change even without input. Required because of eventual
consistency

Eventually Consistent – it will become consistent over time.
** Also, as engineers we cannot believe in anything that isn’t an acronym!

C* - WHAT PROBLEM IS BEING SOLVED?
• Database for modern application requirements.
  • Web Scale – massive amounts of data
  • Scale Out – commodity hardware
  • Flexible Schema (we will see how this concept is evolving)
  • Online admin (add to cluster, load balancing). Simpler operations
  • CAP Theorem aware
• Built based on
  • Amazon Dynamo – took partitioning and replication from here **
  • Google Bigtable – took the log-structured column family from here ***
** http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
*** http://research.google.com/archive/bigtable.html

C* BASICS
• No Single Point of Failure – highly available.
  • Peer to Peer – no master
• Data Center Aware – distributed architecture
• Linear Scaling – just add hardware
• Eventual Consistency, with a tunable tradeoff between latency and consistency
• Architecture is optimized for writes.
• Can have 2 billion columns per row!
• Data modeling for reads. Design starts with looking at your queries.
• With CQL it became more SQL-like, but no joins, no subqueries and limited ordering (but very useful)
• Column names can be part of the data, e.g. time series

Don’t be afraid of denormalized and redundant data for read performance. In fact, embrace it! Remember, writes are fast.

NOTE ABOUT EVENTUAL CONSISTENCY
** Important Term **
Quorum: Q = floor(N / 2) + 1 (integer division).
We get consistency in a BASE world by satisfying W + R > N

3 obvious ways:
1. W = 1, R = N
2. W = N, R = 1
3. W = Q, R = Q
(N is replication factor, R = read replica count, W = write replica count)
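
The W + R > N rule can be checked mechanically. Below is a minimal Python sketch, assuming integer division for the quorum. The function names `quorum` and `is_strongly_consistent` are illustrative only and are not part of any Cassandra API.

```python
def quorum(n: int) -> int:
    """Quorum size for replication factor n: floor(n / 2) + 1."""
    return n // 2 + 1


def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    return w + r > n


n = 3
q = quorum(n)
# The three "obvious ways" from the slide, plus one combination that fails.
for w, r in [(1, n), (n, 1), (q, q), (1, 1)]:
    print(f"N={n} W={w} R={r} consistent={is_strongly_consistent(w, r, n)}")
```

With N = 3, the first three combinations satisfy the rule. W = 1, R = 1 does not, so a read may miss the latest write until the replicas converge.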

THE C* DATA MODEL
The C* data model is made of these:
Column – a name, a value and a timestamp. Applications can use the name as the data and not use the value. (Like a column in an RDBMS.)
Row – a collection of columns identified by a unique key. The key is called a partition key. (Like a row in an RDBMS.)
Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns. Each column has a key and maybe a value. (Like a table in an RDBMS.) This is also known as a table now in C* terms.
Keyspace – administrative container for CFs. It is a namespace. Also has a replication strategy – more later. (Like a DB or schema in an RDBMS.)
Super Column Family – say what?

SUPER COLUMN FAMILY
Not recommended, but they exist. Rarely discussed.
It is a key that contains one or more nested row keys, each of which contains a collection of columns.
You can think of it as a hash table of hash tables that contain columns.
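
The "hash table of hash tables" view can be made concrete with nested Python dictionaries. This is only a mental model, not how Cassandra stores data on disk. All keyspace contents, row keys and values below are made-up sample data, and the helper names `column` and `sorted_column_names` are hypothetical.

```python
import time


def column(value):
    """A column value paired with a write timestamp in microseconds."""
    return {"value": value, "timestamp": int(time.time() * 1_000_000)}


# keyspace -> column family -> row key -> column name -> column
keyspace = {
    "users": {                      # column family (a table in C* terms)
        "user1": {                  # row, identified by its partition key
            "email": column("alice@example.com"),
            "city": column("Springfield"),
        },
    },
    "page_views": {                 # column names used as the data (time series)
        "home_page": {
            "2013-10-01T10:00:05": column(""),
            "2013-10-01T10:00:00": column(""),
        },
    },
}

# super column family: row key -> super column name -> column name -> column
super_column_family = {
    "user1": {
        "home_address": {"city": column("Springfield")},
        "work_address": {"city": column("Shelbyville")},
    },
}


def sorted_column_names(cf, row_key):
    """Columns in a row are ordered by name; a plain dict is not, so sort on read."""
    return sorted(keyspace[cf][row_key])


print(keyspace["users"]["user1"]["email"]["value"])
print(sorted_column_names("page_views", "home_page"))
```

In the `page_views` row the column names carry the information (the timestamps) and the values are empty, which is the time series pattern mentioned earlier.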

ARCHITECTURE (CONT.)

http://www.slideshare.net/gdusbabek/data-modeling-with-cassandra-column-families

OR CAN ALSO BE VIEWED AS…

http://www.slideshare.net/gdusbabek/data-modeling-with-cassandra-column-families

TOKENS
Tokens – partitioner-dependent element on the ring.
Each node has a single unique token assigned.
Each node claims the range of tokens from the token of the previous node on the ring (exclusive) up to its own token (inclusive).

Use this formula (for the RandomPartitioner token space):
Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)
In cassandra.yaml:
initial_token: 42535295865117307932921825928971026432
** http://blog.milford.io/cassandra-token-calculator/
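
The formula is easy to evaluate for a whole cluster at once. A minimal Python sketch, assuming the RandomPartitioner token space of 0 to 2^127; the function name `initial_tokens` is illustrative:

```python
def initial_tokens(number_of_nodes: int) -> list:
    """Evenly spaced initial tokens: node_index * (2^127 / number_of_nodes)."""
    step = 2**127 // number_of_nodes
    return [node_index * step for node_index in range(number_of_nodes)]


# Print one cassandra.yaml line per node of a 4-node cluster.
for node_index, token in enumerate(initial_tokens(4)):
    print(f"node {node_index}: initial_token: {token}")
```

For a 4-node cluster, node 1 (zero-indexed) gets the token shown in the cassandra.yaml line above.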

C* PARTITIONER
RandomPartitioner – the MD5 hash of the key is the token (a 128-bit number), which gives you even distribution in the cluster. Default up to and including version 1.1.

OrderPreservingPartitioner – tokens are UTF-8 strings. Uneven distribution.
Murmur3Partitioner – functionally the same as RandomPartitioner, but 3 to 5 times faster. Uses the Murmur3 algorithm. Default from version 1.2 on.

Set in cassandra.yaml:
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
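
To illustrate how a hash-based partitioner turns a row key into a token, here is a small Python sketch in the style of RandomPartitioner. It assumes the MD5 digest is read as a signed big-endian integer and made non-negative. Treat that exact conversion as an assumption, not a guaranteed match for Cassandra's own token values. The function name and the keys are hypothetical.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Hash a row key with MD5 and map it into the 0..2^127 token space."""
    digest = hashlib.md5(key).digest()
    # Assumption: interpret the 16 bytes as a signed integer, then take abs().
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


for key in (b"user1", b"user2", b"user3"):
    print(key.decode(), md5_token(key))
```

Similar keys land on unrelated tokens, which is why hash-based partitioners spread rows evenly while OrderPreservingPartitioner does not.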

REPLICATION
• Replication is how many copies of each piece of data should be stored. In C* terms it is the Replication Factor or “RF”.
• In C*, RF is set at the keyspace level:
CREATE KEYSPACE drg_compare WITH replication =
{'class':'SimpleStrategy', 'replication_factor':3};
• How the data is replicated is called the Replication Strategy
  • SimpleStrategy – places replicas on nodes “next” to each other on the ring. Assumes a single DC
  • NetworkTopologyStrategy – for configuring per data center. Rack and DC aware.
update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];

SNITCH
• Snitch maps IPs to racks and data centers.
• Several kinds that are configured in cassandra.yaml. Must be the same across the cluster.
SimpleSnitch – does not recognize data center or rack information. Use it for single-data center deployments (or single-zone in public clouds).
PropertyFileSnitch – uses a user-defined description of the network details located in the property file cassandra-topology.properties. Use this snitch when your node IPs are not uniform or if you have complex replication grouping requirements.
RackInferringSnitch – infers (assumes) the topology of the network from the octets of the node's IP address.
EC2* – Ec2Snitch, Ec2MultiRegionSnitch
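
As an illustration of what "inferring from the octets" means, the sketch below takes the second octet of an IPv4 address as the data center and the third octet as the rack, which is the convention RackInferringSnitch follows. The function name `infer_topology` is hypothetical and the code is a model of the idea, not Cassandra's implementation.

```python
def infer_topology(ip: str) -> tuple:
    """Return (data_center, rack) taken from the 2nd and 3rd IPv4 octets."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip}")
    return octets[1], octets[2]


print(infer_topology("10.20.30.40"))  # ('20', '30')
print(infer_topology("10.20.31.41"))  # same DC '20', different rack '31'
```

This only works when the IP addressing plan actually mirrors the physical layout. Otherwise PropertyFileSnitch is the safer choice.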

RING TOPOLOGY
When thinking of Cassandra, it is best to think of the nodes as part of a ring topology, even with multiple DCs.

SimpleStrategy
Using token generation values from before. 4 node cluster. Write value with
token 32535295865117307932921825928971026432

SimpleStrategy #2

SimpleStrategy #3
With RF of 3 replication works like this:
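
The placement rule can be sketched in a few lines of Python: find the node whose token range contains the key's token, then continue clockwise around the ring until RF nodes are collected. This is a simplified model of SimpleStrategy for a single data center. The function name `replicas` is illustrative, and the ring reuses the 4-node tokens and the example token from the earlier slides.

```python
from bisect import bisect_left


def replicas(tokens, key_token, rf):
    """Indexes of the nodes that store key_token.

    tokens must be sorted. Node i owns the range (tokens[i-1], tokens[i]],
    and tokens past the last node wrap around to node 0.
    """
    first = bisect_left(tokens, key_token) % len(tokens)
    return [(first + i) % len(tokens) for i in range(min(rf, len(tokens)))]


tokens = [i * (2**127 // 4) for i in range(4)]   # 4-node ring
key_token = 32535295865117307932921825928971026432

print(replicas(tokens, key_token, rf=3))  # [1, 2, 3]
```

The token falls between node 0's token and node 1's token, so node 1 owns it and the next two nodes on the ring hold the other copies.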

NetworkTopologyStrategy
Using LOCAL_QUORUM allows the write to DC #2 to be asynchronous. The write is marked as a success once 2 of the 3 replicas in the local DC have acknowledged it (http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers)

COORDINATOR & CL
• When writing, a Coordinator Node will be selected. Selected at write (or read) time. Not a SPOF!
• Using the Gossip Protocol, nodes share information with each other: who is up, who is down, who is taking which token ranges, etc. Every second, each node shares with 1 to 3 nodes.
• Consistency Level (CL) – says how many replicas must respond before an operation is a success. Set per read or write operation.
  • ONE – coordinator will wait for one node to ack the write (also TWO, THREE). ONE is the default if none is provided.
  • QUORUM – we saw that before: N / 2 + 1. Also LOCAL_QUORUM, EACH_QUORUM
  • ANY – waits for some replica. If all are down, the write still succeeds. Only for writes. Doesn’t guarantee it can be read.
  • ALL – blocks waiting for all replicas
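
The consistency levels above translate into a number of acknowledgements the coordinator waits for. The Python sketch below models only the single-data-center levels (LOCAL_QUORUM and EACH_QUORUM are left out because they need per-DC replica counts). The function names are illustrative and not a driver API.

```python
def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements needed at consistency level cl for RF = rf."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if cl in fixed:
        return fixed[cl]
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def write_succeeds(cl: str, rf: int, live_replicas: int) -> bool:
    """Whether a write can succeed given how many replicas are up."""
    if cl == "ANY":
        # ANY is write-only and succeeds even when every replica is down.
        return True
    return live_replicas >= required_acks(cl, rf)


for cl in ("ANY", "ONE", "QUORUM", "ALL"):
    print(cl, write_succeeds(cl, rf=3, live_replicas=2))
```

With RF = 3 and one replica down, everything except ALL still succeeds, which shows the availability cost of the strictest level.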

ENSURING CONSISTENCY
3 important concepts:
Read Repair – At the time of a read, inconsistencies between replicas are noticed and the replicas are updated. Direct and background. Direct is determined by CL.
Anti-Entropy Node Repair – For data that is not read frequently, or to update data on a node that has been down for a while, run the nodetool repair process (also called anti-entropy repair). It builds Merkle trees, compares nodes and does the repair.
Hinted Handoff – Writes are always sent to all replicas for the specified row regardless of the consistency level specified by the client. If a node happens to be down at the time of the write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online. The notification that the node is back happens via Gossip. Default 1 hour.
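
The Merkle tree comparison used by anti-entropy repair can be illustrated with a toy example: each replica hashes its rows, combines the hashes pairwise up to a single root, and only the roots need to be exchanged to detect a difference. This is a simplified sketch with made-up rows and SHA-256 as the hash. Cassandra builds its trees over token ranges, which this example does not model.

```python
import hashlib


def merkle_root(rows):
    """Combine row hashes pairwise until a single root hash remains."""
    level = [hashlib.sha256(row).digest() for row in rows]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


replica_a = [b"k1=v1", b"k2=v2", b"k3=v3"]
replica_b = [b"k1=v1", b"k2=stale", b"k3=v3"]   # missed an update to k2

needs_repair = merkle_root(replica_a) != merkle_root(replica_b)
print("repair needed:", needs_repair)
```

Matching roots mean the replicas agree without comparing every row. A mismatch tells the repair process to walk down the tree and stream only the ranges that differ.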

SUMMARY
C* provides a highly available, distributed, DC-aware DB with tunable consistency out of the box.
A lot of tools at your disposal.
Work closely with ops or devops.
Test, test and test again.

Don’t be afraid to use the C* community.
Brian Enochson
brian.enochson@gmail.com
Available for training, consulting, architecture & development.

Thank you!

Apache Cassandra training. Overview and Basics
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
How & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit DublinHow & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit Dublin
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 

Dernier

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Dernier (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

NoSQL Intro with cassandra

  • 1. NOSQL INTRO & CASSANDRA 1
  • 2. REQUISITE SLIDE – WHO AM I? Brian Enochson. Home is the Jersey Shore. SW engineer who has worked as designer / developer on NoSQL (Mongo, Cassandra). Consultant – HBO, ACS, CIBER. Specializes in SW development, architecture and training. brian.enochson@gmail.com – available for training, consulting, architecture & development.
  • 3. REQUISITE SLIDE # 2 – WHAT ARE WE TALKING ABOUT? • NoSQL Introduction: What brought us here; Types of NoSQL Products; What about Hadoop?; What about Real-Time?; Quick look at MongoDB • Cassandra Intro & Architecture: Why Cassandra; Architecture; Internals; Development • Data Modeling Concepts: Old vs. New Way; Basics; Composite Types; Collections; Time Series Data; Counters
  • 4. HISTORY OF THE DATABASE • 1960’s – Hierarchical and network types (IMS and CODASYL) • 1970’s – Beginnings of the theory behind the relational model (Codd) • 1980’s – Rise of the relational model. SQL. E/R model (Chen) • 1990’s – Access/Excel and MySQL. ODMS began to appear • 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…) • 2010’s – Emergence of NoSQL as an industry player and viable alternative
  • 5. WHY WERE ALTERNATIVES NEEDED • Developers today are faced with Internet scale • 100,000’s of users • Low cost of storage • Increased processing power • Ability (and need) to capture millions of events. Caching solves it to an extent but brings other complexities • Real-time • Need to scale out and not up (add any number of low-cost machines vs. adding one more powerful machine) • Cost • Let’s not forget that for enterprise DBs Internet scale can become expensive • Open source DBs may solve license cost, but don’t ignore operational costs
  • 6. A LOT OF DATA Some facts from http://www.storagenewsletter.com/rubriques/marketreportsresearch/ibm-cmo-study/ • Approximately 90 percent of all the real-time information being created today is unstructured data • Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, i.e. 18 zeroes!) • 90 percent of the world's data today has been created in the last two years alone
  • 7. RELATIONAL VS. NOSQL • Relational • Divide into tables, relate via foreign keys, DB constraints, normalized data, the interface is SQL • NoSQL • Store in a schemaless format, redundancy encouraged, application access (your queries) determines the storage format. Interface varies and is optimized for the implementation, no forced DB constraints. The tradeoff is that you often get eventual consistency.
  • 8. TRADEOFFS? “Luckily, due to the large number of compromises made when attempting to scale their existing relational databases, these tradeoffs were not so foreign or distasteful as they might have been.” Greg Burd - https://www.usenix.org/legacy/publications/login/201110/openpdfs/Burd.pdf
  • 9. 3 V’S – DESCRIBING THE BIG DATA PROBLEM The driving force in requiring new technology is often referred to as the “3 V Model”. • High Volume – amount of data • High Variety – range of data types and sources • High Velocity – speed of data in and out OK, maybe 4 V’s • Veracity – is all the data applicable to the problem being analyzed?
  • 10. NOSQL IS NOT BIG DATA NoSQL != Big Data NoSQL products were created to help solve the big data problem. Big data is a much larger problem than just storage. Analysis tools like Hadoop, messaging systems like Kafka, real time processing engines like Storm and machine learning (Mahout) all help solve the big data problem.
  • 11. NOSQL TYPES Wide Column – Column Family • Cassandra, HBase, Amazon SimpleDB Key Value • Riak, Redis, DynamoDB, Voldemort, MemcacheDB Document DB • MongoDB, CouchDB Graph • Neo4J, OrientDB Search (also alternatives, normally used with *) • Lucene, Solr, ElasticSearch Many, many, many more! (http://nosql-database.org/)
  • 12. CHOOSING THE RIGHT ONE… Choosing the right NoSQL type and eventual product depends on… Type of Data • One key and a lot of data? • High volume of data? • Storing media, blobs? • Document oriented? • Tracking relationships? • Combination? • Multi-datacenter? Type of Access Volumes of Data (there is big data and there is BIG DATA) Need for Support/Services/Training
  • 13. VISUAL GUIDE – USING THE CAP THEOREM http://blog.nahurst.com/visual-guide-to-nosql-systems
  • 14. QUICK LOOK AT MONGO Just so we can compare to Cassandra • Document Oriented • Storage format is JSON (actually BSON) • Replication built in • Master / slave architecture • Strong querying support
  • 15. MONGO DOCUMENTS
  • 16. DETOUR OVER… LET’S TALK ABOUT CASSANDRA…. OR C*
  • 17. CASSANDRA HISTORY • Developed at Facebook, based on Google Bigtable and Amazon Dynamo ** • Open sourced in mid 2008 • Apache project March 2009 • Commercial support through DataStax (originally known as Riptano, founded 2010) • Used at Netflix, eBay and many more. The largest installation is reportedly 300 TB on 400 machines • Current version is 2.0.1
  • 18. WHY EVEN CONSIDER C* • Large data sets • Require high availability • Multi Data Center • Require large scaling • Write heavy applications • Can design for queries • Understand tunable consistency and implications (more to come) • Willing to make the effort upfront for the reward
  • 20. ACID YOU PROBABLY ALL HAVE HEARD OF ACID • Atomic – All or None • Consistency – What is written is valid • Isolation – One operation at a time • Durability – Once committed to the DB, it stays This is the world we have lived in for a long time…
  • 21. CAP THEOREM (BREWER’S) Many may have heard this one. CAP stands for Consistency, Availability and Partition Tolerance • Consistency – all nodes see the same data at the same time (related to, but not identical to, the C in ACID) • Availability – the service is available • Partition Tolerance – no failure other than complete network failure causes the system not to respond (REMEMBER THE VISUAL GUIDE TO SELECTING A NOSQL DATABASE) So.. What does this mean? ** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  • 22. YOU CAN ONLY HAVE 2 OF THEM Or, better said in C* terms, you can have Availability and Partition Tolerance AND Eventual Consistency. This means that eventually all accesses will return the last updated value.
  • 23. BASE But maybe you have not heard this one… Somewhat contrived but gives CAP Theorem an acronym to use against ACID… Also created by Eric Brewer. Basically Available – system does guarantee availability, as much as possible. Soft State – state may change even without input. Required because of eventual consistency. Eventually Consistent – it will become consistent over time. ** Also, as engineers we cannot believe in anything that isn’t an acronym!
  • 24. C* - WHAT PROBLEM IS BEING SOLVED? • Database for modern application requirements. • Web Scale – massive amounts of data • Scale Out – commodity hardware • Flexible Schema (we will see how this concept is evolving) • Online admin (add to cluster, load balancing). Simpler operations • CAP Theorem Aware • Built based on • Amazon Dynamo – took partitioning and replication from here ** • Google Bigtable – took the log-structured column family from here *** ** http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html *** http://research.google.com/archive/bigtable.html
  • 25. C* BASICS • No Single Point of Failure – highly available. • Peer to Peer – no master • Data Center Aware – distributed architecture • Linear Scaling – just add hardware • Eventual Consistency, tunable tradeoff between latency and consistency • Architecture is optimized for writes. • Can have 2 billion columns! • Data modeling for reads. Design starts with looking at your queries. • With CQL it became more SQL-like, but no joins, no subqueries, limited ordering (but very useful) • Column names can be part of the data, e.g. Time Series Don’t be afraid of denormalized and redundant data for read performance. In fact embrace it! Remember, writes are fast.
  • 26. NOTE ABOUT EVENTUAL CONSISTENCY ** Important Term ** Quorum: Q = floor(N / 2) + 1. We get consistency in a BASE world by satisfying W + R > N. 3 obvious ways: 1. W = 1, R = N 2. W = N, R = 1 3. W = Q, R = Q (N is replication factor, R = read replica count, W = write replica count)
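The rule on slide 26 can be tried out with a few lines of Python. This is only an illustrative sketch: the function names `quorum` and `is_strongly_consistent` are made up for this example and are not part of Cassandra or of any driver.

```python
def quorum(n: int) -> int:
    """Quorum size for replication factor n: floor(n / 2) + 1."""
    return n // 2 + 1


def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """True when any read set must overlap any write set (W + R > N)."""
    return w + r > n


if __name__ == "__main__":
    n = 3  # replication factor, chosen only for illustration
    q = quorum(n)
    combos = [("W=1, R=N", 1, n), ("W=N, R=1", n, 1),
              ("W=Q, R=Q", q, q), ("W=1, R=1", 1, 1)]
    for label, w, r in combos:
        print(label, is_strongly_consistent(w, r, n))
```

With N = 3 the three combinations from the slide all satisfy W + R > N, while W = 1, R = 1 does not, which is the case where a read may miss the latest write.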
  • 27. THE C* DATA MODEL The C* data model is made of these: Column – a name, a value and a timestamp. Applications can use the name as the data and not use the value. (RDBMS: like a column). Row – a collection of columns identified by a unique key. The key is called a partition key (RDBMS: like a row). Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns. Each column has a key and maybe a value. (RDBMS: like a table). This is also known as a table now in C* terms. Keyspace – administrative container for CFs. It is a namespace. Also has a replication strategy – more later. (RDBMS: like a DB or schema). Super Column Family – say what?
  • 28. SUPER COLUMN FAMILY Not recommended, but they exist. Rarely discussed. It is a key that contains one or more nested row keys, each of which contains a collection of columns. Can think of it as a hash table of hash tables that contain columns.
  • 30. OR CAN ALSO BE VIEWED AS… http://www.slideshare.net/gdusbabek/data-modeling-withcassandra-column-families
  • 31. TOKENS Tokens – partitioner-dependent element on the ring. Each node has a single unique token assigned. Each node claims the range of tokens that runs from the token of the previous node on the ring up to its own token. Use this formula: Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes) In cassandra.yaml: initial_token: 42535295865117307932921825928971026432 ** http://blog.milford.io/cassandra-token-calculator/
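The token formula on slide 31 is easy to evaluate with Python's arbitrary-precision integers. The sketch below is a hypothetical helper (the name `initial_tokens` is an assumption, not a Cassandra tool) and it assumes the 0 to 2^127 token space the slide uses.

```python
from typing import List


def initial_tokens(number_of_nodes: int) -> List[int]:
    """Evenly spaced initial tokens: node_index * (2^127 / number_of_nodes)."""
    token_space = 2 ** 127
    step = token_space // number_of_nodes  # integer division keeps tokens exact
    return [node_index * step for node_index in range(number_of_nodes)]


if __name__ == "__main__":
    for node_index, token in enumerate(initial_tokens(4)):
        print(f"node {node_index}: initial_token: {token}")
```

For a 4-node cluster, the second node (zero-indexed node 1) gets the value shown in the cassandra.yaml example on the slide.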
  • 32. C* PARTITIONER RandomPartitioner – MD5 hash of the key is the token (128 bit number), gives you even distribution in the cluster. Default <= version 1.1. OrderPreservingPartitioner – tokens are UTF-8 strings. Uneven distribution. Murmur3Partitioner – functionally the same as RandomPartitioner, but 3 – 5 times faster. Uses the Murmur3 algorithm. Default >= 1.2. Set in cassandra.yaml: partitioner: org.apache.cassandra.dht.Murmur3Partitioner
  • 33. REPLICATION • Replication is how many copies of each piece of data should be stored. In C* terms it is the Replication Factor or “RF”. • In C* RF is set at the keyspace level: CREATE KEYSPACE drg_compare WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}; • How the data is replicated is called the Replication Strategy • SimpleStrategy – returns nodes “next” to each other on the ring. Assumes a single DC • NetworkTopologyStrategy – for configuring per data center. Rack and DC aware. update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];
  • 34. SNITCH • Snitch maps IPs to racks and data centers. • Several kinds that are configured in cassandra.yaml. Must be the same across the cluster. SimpleSnitch - does not recognize data center or rack information. Use it for single-data center deployments (or single-zone in public clouds). PropertyFileSnitch - This snitch uses a user-defined description of the network details located in the property file cassandra-topology.properties. Use this snitch when your node IPs are not uniform or if you have complex replication grouping requirements. RackInferringSnitch - The RackInferringSnitch infers (assumes) the topology of the network by the octet of the node's IP address. EC2* - EC2Snitch, EC2MultiRegionSnitch
  • 35. RING TOPOLOGY When thinking of Cassandra, it is best to think of nodes as part of a ring topology, even for multiple DCs.
  • 36. SimpleStrategy Using token generation values from before. 4 node cluster. Write value with token 32535295865117307932921825928971026432
  • 38. SimpleStrategy #3 With RF of 3 replication works like this:
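Since the diagrams for the SimpleStrategy slides did not survive in this transcript, here is a small Python sketch of the placement they describe: find the node whose token range covers the key, then take the next nodes clockwise until RF replicas are chosen. The function name `replicas_for` is hypothetical, nodes are identified only by their tokens, and this is a simplified model rather than Cassandra's actual implementation.

```python
from bisect import bisect_left
from typing import List


def replicas_for(key_token: int, node_tokens: List[int], rf: int) -> List[int]:
    """Return the tokens of the nodes that hold replicas for key_token."""
    ring = sorted(node_tokens)
    # The owner is the first node whose token is >= the key's token;
    # past the last node we wrap around to the start of the ring.
    start = bisect_left(ring, key_token) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]


if __name__ == "__main__":
    ring = [i * (2 ** 127 // 4) for i in range(4)]  # 4-node ring from slide 31
    key = 32535295865117307932921825928971026432    # token from slide 36
    for token in replicas_for(key, ring, rf=3):
        print("replica on node with token", token)
```

With the 4-node ring and the token from slide 36, the key falls between node 0 and node 1, so node 1 owns it and, with RF of 3, nodes 2 and 3 receive the other two copies.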
  • 39. NetworkTopologyStrategy Using LOCAL_QUORUM allows the write to DC #2 to be asynchronous. The write is marked as a success once 2 of the 3 replicas in the local DC acknowledge it (http://www.datastax.com/dev/blog/deploying-cassandra-across-multipledata-centers)
  • 40. COORDINATOR & CL • When writing, a Coordinator Node will be selected. It is selected at write (or read) time. Not a SPOF! • Using the Gossip Protocol, nodes share information with each other: who is up, who is down, who is taking which token ranges, etc. Every second, each node shares with 1 to 3 nodes. • Consistency Level (CL) – says how many nodes must agree before an operation is a success. Set per read or write operation. • ONE – coordinator will wait for one node to ack the write (also TWO, THREE). ONE is the default if none is provided. • QUORUM – we saw that before. N / 2 + 1. Also LOCAL_QUORUM, EACH_QUORUM • ANY – waits for some replica. If all are down, it still succeeds. Only for writes. Doesn’t guarantee the data can be read. • ALL – blocks waiting for all replicas
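To make the consistency levels on slide 40 concrete, the following Python sketch maps a level to the number of replica acknowledgements a write needs. It is a simplified model, not driver code: `required_acks` and `write_succeeds` are invented names, ANY is modelled as needing zero replica acks (on the assumption that a stored hint is enough), and the data-center-aware levels LOCAL_QUORUM and EACH_QUORUM are left out.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed before a write counts as a success."""
    fixed = {"ANY": 0, "ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        return fixed[consistency_level]
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_succeeds(consistency_level: str, replication_factor: int,
                   live_replicas: int) -> bool:
    """True if enough replicas are up to satisfy the consistency level."""
    return live_replicas >= required_acks(consistency_level, replication_factor)


if __name__ == "__main__":
    for cl in ["ANY", "ONE", "QUORUM", "ALL"]:
        # RF = 3 with one replica down, chosen only for illustration
        print(cl, write_succeeds(cl, replication_factor=3, live_replicas=2))
```

With RF = 3 and one replica down, every level except ALL still accepts the write, which is the latency versus consistency tradeoff the slide describes.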
  • 41. ENSURING CONSISTENCY 3 important concepts: Read Repair - At time of read, inconsistencies are noticed between nodes and replicas are updated. Direct and background. Direct is determined by CL. Anti-Entropy Node Repair - For data that is not read frequently, or to update data on a node that has been down for a while, run the nodetool repair process (also called anti-entropy repair). It builds Merkle trees, compares nodes and does the repair. Hinted Handoff - Writes are always sent to all replicas for the specified row regardless of the consistency level specified by the client. If a node happens to be down at the time of write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online. This notification happens via Gossip. Default 1 hour.
  • 42. SUMMARY C* provides a highly available, distributed, DC-aware DB with tunable consistency out of the box. A lot of tools at your disposal. Work closely with ops or devops. Test, test and test again. Don’t be afraid to use the C* community. Brian Enochson brian.enochson@gmail.com Available for training, consulting, architecture & development. Thank you!