The NoSQL Principles and Basic Application of Cassandra Model

The NoSQL Principles
&
Basic Application of
Cassandra Model
Reshmi Radhakrishnan
S7 CS B
Roll No: 71
Guided by,
Dr. Sudheep Elayidom
SOE CUSAT

AGENDA
• INTRODUCTION
• RDBMS
• CHALLENGE
• NoSQL
• COMMON CONCEPTS
• CLASSIFICATION
• CASSANDRA
• CASSANDRA CHARACTERISTICS
• CONCLUSION
• REFERENCES

INTRODUCTION
* RDBMS is the predominant technology for storing structured
data in web and business applications.
* Relational database systems have limited capability to scale
horizontally.
* The NoSQL approach offers simplicity of design, horizontal
scaling and finer control over availability.
* Cassandra is a suitable NoSQL database when you need
scalability and high availability without compromising
performance.

RDBMS
MERITS:
• Rich language
• Easy to use and integrate
• Rich toolset
• Vertical scaling
• The promise: ACID
o Atomicity
o Consistency
o Isolation
o Durability

RDBMS
DEMERITS:
• Vertical scaling is highly expensive
• Struggles to handle very large amounts of data
• It is not very efficient in cloud environments
• Read-write rates in an RDBMS are comparatively poor at this scale
So it is hard to meet the challenges of modern web
applications using an RDBMS alone

The Challenge: Modern web apps
• Internet-scale data size
• High read-write rates
• Frequent schema changes
• "Social" apps, not banks
o They don't need the same level of ACID
SCALING

NoSQL
• Uses horizontal scaling
• Distribute data over many servers
• It gives up strict ACID guarantees
• Based on the CAP theorem

Brewer's CAP Theorem:
You can only choose two of the three properties

CAP
Consistency:
A distributed system is considered consistent if, after an update
operation by some writer, all readers see that update in the
shared data source
Availability:
The system is designed so that it continues to operate even if nodes in
a cluster crash
Partition Tolerance:
The ability of a system to continue operating in the presence of a network
partition

COMMON CONCEPTS
•Sharding
•Consistent hashing
•Map reduce

SHARDING
• it's a partitioning mechanism
• records are stored in different servers according to some
key
• records that are accessed/updated together reside on the same
node
• load is almost evenly distributed among servers
• vertical partitioning: parts of single records are stored on
different servers
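The key-based placement described above can be sketched in a few lines. This is a minimal illustration, assuming a fixed list of hypothetical server names and a simple hash-modulo rule; neither the names nor the rule come from the slides.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["server-0", "server-1", "server-2"]

def shard_for(key, servers=SERVERS):
    """Pick a server from a stable hash of the record key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Records that share a key (e.g. one user's data) land on the same server.
records = [("user:1", "profile"), ("user:1", "inbox"), ("user:2", "profile")]
placement = {}
for key, payload in records:
    placement.setdefault(shard_for(key), []).append((key, payload))
```

With hash-modulo placement, changing the number of servers remaps most keys, which is the problem that ring-based hashing is meant to reduce.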

Consistent Hashing
• A, B, C: nodes
• 1, 2, 3, 4: objects
-> both are placed on a ring
-> movement is clockwise: each object belongs to the next node on the ring
-> nodes can leave the system
-> nodes can enter the system
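A small ring can illustrate consistent hashing: nodes and objects are hashed onto the same ring, and each object is owned by the first node found clockwise from it. The sketch below is one possible implementation, assuming MD5 as the hash and one position per node (no virtual nodes); the class and function names are hypothetical.

```python
import bisect
import hashlib

def ring_position(name):
    """Map a node or object name to a position on the ring."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of the nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node):
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def node_for(self, obj):
        """Walk clockwise from the object's position to the next node."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._positions, ring_position(obj))
        return self._owners[self._positions[idx % len(self._positions)]]

ring = HashRing(["A", "B", "C"])
objects = ["1", "2", "3", "4"]
before = {obj: ring.node_for(obj) for obj in objects}
ring.remove_node("B")  # a node leaves the system
after = {obj: ring.node_for(obj) for obj in objects}
```

When B leaves, only the objects that B owned move to another node; objects owned by A or C stay where they were.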

MAP REDUCE
• Used in distributed computing
• map function
• reduce function
• processes key/value pairs
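The map and reduce functions can be shown with a word count, the usual introductory case. This is a single-process sketch with hypothetical function names and sample input; a real framework would run the map and reduce phases on many machines.

```python
from collections import defaultdict

def map_fn(_doc_id, text):
    """Map phase: emit a (word, 1) pair for every word in the document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return word, sum(counts)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # group values by key (shuffle step)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = [("d1", "nosql scales out"), ("d2", "nosql scales horizontally")]
word_counts = map_reduce(docs, map_fn, reduce_fn)
```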

Classification of NoSQL data stores
• Document Oriented
o CouchDB, MongoDB, Lotus Notes, SimpleDB
• Key-Value oriented
o Voldemort, Dynamo, Riak (sort of), Redis, Tokyo
• Column oriented
o Cassandra, HBase, BigTable
• Graph oriented
o Neo4J, FlockDB, DEX, AllegroGraph

CASSANDRA
• Developed at Facebook
• Follows the BigTable data model: column
oriented
• Follows the Dynamo eventual consistency
model
• Open-sourced at Apache
• Implemented in Java

• Distributed storage system
• Manages structured data and scales to a very large
size

Data Model
• An instance of Cassandra consists of one table, which
represents a multidimensional map indexed by a key
• Query for information:
get(keyspace, column family, row key)
• Keyspace: top-level unit of information; column
families are subordinate to exactly one keyspace
• Column: atomic unit of information, expressed
as name:value
• Super column: groups together like columns
under a common name;
used for modeling complex data types (e.g. an address)
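The "multidimensional map" view can be illustrated with nested dictionaries. This is only a sketch of the lookup shape; the keyspace, column family, row key and values are made up for illustration, and this is not how Cassandra stores data internally.

```python
# keyspace -> column family -> row key -> {column name: value}
# All names and values below are hypothetical.
store = {
    "app": {                      # keyspace
        "users": {                # column family
            "user42": {           # row key
                "name": "Alice",  # column, expressed as name:value
                "address": {      # super column grouping like columns
                    "city": "Kochi",
                    "zip": "682022",
                },
            }
        }
    }
}

def get(keyspace, column_family, row_key):
    """Return all columns stored for one row."""
    return store[keyspace][column_family][row_key]

row = get("app", "users", "user42")
```

Because each row is just a map of columns, two rows in the same column family do not need to carry the same set of columns.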

Data Model
ROW
• Uniquely identifiable data
• Groups columns and super columns
• Every row is identified by a row key
COLUMN FAMILY
• It has to be defined in advance, before a cluster of servers in a
Cassandra instance is launched
• It consists of keyed rows, which group columns and super
columns
• Columns and super columns can be added dynamically to column
families, and their number is not restricted

MEMTABLES
• In-memory representation of recently written data
• When the memtable is full, it is sorted and then flushed to disk as an SSTable
SSTABLES
Sorted String Tables
• Immutable
• On-disk
• Sorted by a string key
• In-memory index of elements
• Binary search (in memory) to find element location
• Bloom filter to reduce number of unneeded binary searches.
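The memtable-to-SSTable flow can be sketched as follows. It is a toy model under stated assumptions: the "disk" is a tuple held in memory, the flush threshold is an entry count, and the class names are hypothetical.

```python
import bisect

class SSTable:
    """Immutable table sorted by key (kept in memory here instead of on disk)."""
    def __init__(self, sorted_items):
        self._keys = tuple(k for k, _ in sorted_items)
        self._values = tuple(v for _, v in sorted_items)

    def get(self, key):
        # Binary search over the sorted keys to find the element location.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

class Memtable:
    """Holds recent writes; flushes to a new SSTable when full."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.data = {}
        self.flushed = []  # SSTables written so far, oldest first

    def put(self, key, value):
        self.data[key] = value
        if len(self.data) >= self.capacity:
            self.flush()

    def flush(self):
        self.flushed.append(SSTable(sorted(self.data.items())))
        self.data = {}

memtable = Memtable(capacity=3)
for key, value in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    memtable.put(key, value)
```

After the third write the memtable is flushed as one sorted SSTable holding a, b and c; the fourth write stays in memory.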
WRITE PROPERTIES
• No Locks in the critical path
• Always available to writes, even if there are failures
• No seeks
• Fast
• Atomic within a Row

Read Properties
• Read multiple SSTables
• Slower than writes (but still fast)
• Seeks can be mitigated with more RAM
• Uses probabilistic bloom filters to reduce lookups.
• Extensive optional caching
o Key Cache
o Row Cache
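Reading from multiple SSTables can be sketched as collecting every stored version of a key and keeping the newest one. This is a simplified model, assuming each table is a dictionary of key to (timestamp, value); the function name and sample data are hypothetical, and caches are left out.

```python
def read(key, memtable, sstables):
    """Return the newest value stored for key, or None if it is absent.

    memtable and every entry of sstables map key -> (timestamp, value).
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        # A real system would consult a Bloom filter before touching the table.
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]  # newest timestamp wins

memtable = {"k1": (30, "new")}
sstables = [{"k1": (10, "old"), "k2": (5, "x")}, {"k1": (20, "mid")}]
latest = read("k1", memtable, sstables)
```

The loop over tables is why reads tend to cost more than writes: a write touches only the memtable, while a read may have to check several tables.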

Bloom Filters
• Space efficient probabilistic data structure
• Test whether an element is a member of a set
• Union and intersection are implemented as bitwise OR, AND
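A Bloom filter can be sketched with an integer used as a bit array. The sizes, the hash construction (SHA-256 with a seed prefix) and the class name are assumptions for illustration; production filters choose the bit count and hash count from the expected number of elements.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=256, hashes=3):
        self.size = size      # number of bits
        self.hashes = hashes  # number of hash functions
        self.bits = 0         # Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        """Union of two filters with equal size and hash count: bitwise OR."""
        merged = BloomFilter(self.size, self.hashes)
        merged.bits = self.bits | other.bits
        return merged

first = BloomFilter()
first.add("row1")
second = BloomFilter()
second.add("row2")
both = first.union(second)
```

A negative answer lets a read skip an SSTable entirely; a positive answer may be a false positive, so the table still has to be searched.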

QUERY EXAMPLES
CREATE INDEX
CREATE CUSTOM INDEX IF NOT EXISTS index_name
ON keyspace_name.table_name ( KEYS ( column_name ) )
( USING class_name ) ( WITH OPTIONS = map )
Restrictions:
• USING class_name is allowed only if CUSTOM is used, and
class_name is a string literal containing a Java class name.
• index_name is an identifier, enclosed or not enclosed in double
quotation marks, excluding reserved words.
• map is described in ALTER KEYSPACE.
CREATE KEYSPACE
CREATE ( KEYSPACE | SCHEMA ) IF NOT EXISTS keyspace_name
WITH REPLICATION = map
AND DURABLE_WRITES = ( true | false )
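To make the synopsis concrete, the sketch below fills it in with example values and builds the statement text in Python. The helper name, the keyspace `demo`, the table `users`, the column `city` and the index name are all hypothetical; the code only formats strings and does not connect to a cluster.

```python
def cql_literal(value):
    """Format a Python value as a CQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def create_keyspace_cql(name, replication, durable_writes=True):
    """Build a CREATE KEYSPACE statement following the synopsis above."""
    pairs = ", ".join(
        f"{cql_literal(k)}: {cql_literal(v)}" for k, v in replication.items()
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH REPLICATION = {{{pairs}}} "
        f"AND DURABLE_WRITES = {'true' if durable_writes else 'false'}"
    )

keyspace_stmt = create_keyspace_cql(
    "demo", {"class": "SimpleStrategy", "replication_factor": 3}
)
# Hypothetical index on one column of a hypothetical table.
index_stmt = "CREATE INDEX IF NOT EXISTS users_city_idx ON demo.users (city)"
```

The replication map is where the replica count is set; `SimpleStrategy` with a `replication_factor` is the single-datacenter form.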

MySQL Comparison
• MySQL : for 50 GB Data
Writes Average : ~300 ms
Reads Average : ~350 ms
• Cassandra: for 50 GB Data
Writes Average : 0.12 ms
Reads Average : 15 ms
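The ratios implied by these figures can be worked out directly. The snippet below only restates the slide's averages (treating the approximate MySQL values as exact) and divides them; the variable names are illustrative.

```python
# Average latencies in milliseconds, as quoted above for 50 GB of data.
mysql = {"write_ms": 300.0, "read_ms": 350.0}
cassandra = {"write_ms": 0.12, "read_ms": 15.0}

write_speedup = mysql["write_ms"] / cassandra["write_ms"]  # about 2500x
read_speedup = mysql["read_ms"] / cassandra["read_ms"]     # about 23x
```

On these numbers the gap is far larger for writes than for reads, in line with Cassandra reads being slower than its writes.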

CONCLUSION
• NoSQL is an efficient approach for dealing with
large amounts of data.
• It can be used to address big data problems.
• The Cassandra model can provide fast read and
write operations.
• For this reason, this database model has been adopted by
large social networking services.
REFERENCES
• Guoxi Wang; Jianfeng Tang. "The NoSQL Principles and Basic Application of Cassandra Model."
2012 International Conference on Computer Science & Service System (CSSS), 2012, pp. 1332-1335.
DOI: 10.1109/CSSS.2012.336. IEEE Conference Publications.
• Jing Han; Haihong E; Guan Le; Jian Du. "Survey on NoSQL database."
2011 6th International Conference on Pervasive Computing and Applications (ICPCA), 2011, pp. 363-366.
DOI: 10.1109/ICPCA.2011.6106531. IEEE Conference Publications.
