Big Data and NoSQL with MongoDB &
Cassandra

NOSQL Intro with MongoDB and Cassandra


Brian Enochson
- SW Engineer who has worked as designer / developer on NoSQL (Mongo, Cassandra, Hadoop)
- Specializes in SW development, architecture and training





Brian Enochson
brian.enochson@gmail.com
Twitter @benochso
Google Plus
https://plus.google.com/+BrianEnochson

Presentation Intro
Introduction to Big Data
Introduction to NoSQL
Relational Database to NoSQL technology
contrast & compare
NoSQL landscape

Introduction to MongoDB
MongoDB Components, capabilities and
common use cases
JSON & BSON
Documents, collections, references and Mongo
ID
Querying
Data Modeling/Schema Design
Replication & Sharding

Cassandra
Architecture
Data Model
Data Modeling
Application Development
Wrap-up and final Q & A

http://www.cloudtweaks.com/2014/01/hand-writing-data-data-everywhere-but-lets-just-stop-and-think/

Why are databases like Mongo or Cassandra
needed?
To understand, one needs to look at
• The history of databases
• How systems were built in the past

Then examine modern applications
• Web scale
• Data acquisition

Other factors like cost of H/W

• 1960’s – Hierarchical and network databases (IMS and CODASYL)
• 1970’s – Beginnings of the theory behind the relational model (Codd)
• 1980’s – Rise of the relational model. SQL. E/R model (Chen)
• 1990’s – Access/Excel and MySQL. ODMS began to appear
• 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
• 2010’s – Emergence of NoSQL as an industry player and viable alternative

Developers today are faced with Internet scale
• 100,000’s of users
• Low cost of storage
• Increased processing power
• Ability (and need) to capture millions of events. Caching solves it to an extent but brings other complexities
• Real-time
• Need to scale out and not up (add any number of low cost machines vs. replace with a more powerful machine).

Cost
• Let’s not forget that for enterprise DBs Internet scale can become expensive
• Open source DBs may solve license cost, but don’t ignore operational costs

Some facts from
http://www.storagenewsletter.com/rubriques/market-reportsresearch/ibm-cmo-study/

Approximately 90 percent of all the real-time
information being created today is unstructured data
Every day we create 2.5 quintillion bytes of data
(a quintillion is 10 to the 18th, a 1 followed by 18 zeroes!!)

90 percent of the world's data today has been
created in the last two years alone

Relational
• Divide into tables, relate through foreign keys, DB constraints,
normalized data, the interface is SQL

NoSQL
• Store in schemaless format, redundancy encouraged,
application access (your queries) determines the storage format.
Interface varies and is optimized for the
implementation, no forced DB constraints.

Luckily, due to the large number of compromises made
when attempting to scale their existing relational
databases, these tradeoffs were not so
foreign or distasteful as they might have been.



Greg Burd https://www.usenix.org/legacy/publications/login/2011-10/openpdfs/Burd.pdf

Eventual consistency
Application has increased responsibility, such
as maintaining consistency & handling transactions
Store redundant data

The driving force in requiring new technology is often
referred to as the “3 V’s”.
• Volume – amount of data
• Variety – range of data types and sources
• Velocity – speed of data in and out

NoSQL != Big Data




NoSQL products were created to help solve the big
data problem.
Big data is a much larger problem than just
storage. Analysis tools like Hadoop, messaging
systems like Kafka, real time processing engines
like Storm and machine learning (Mahout) all help
solve the big data problem.
Document DB
• MongoDB, CouchDB

Wide Column – Column Family
• Cassandra, HBase, Amazon SimpleDB

Key Value
• Riak, Redis, DynamoDB, Voldemort, MemcacheDB

Graph
• Neo4J, OrientDB

Search (search can also be a persistence store)
• Lucene, Solr, ElasticSearch

Many many many, many more! (http://nosql-database.org/)

Choosing the right NoSQL type and eventual product
depends on…

Type of Data
• One key and a lot of data?
• Schema variance
• High volume of data?
• Storing media, blobs?
• Document oriented?
• Tracking relationships?
• Combination?
• Multi-Datacenter

Type of Access
Volumes of Data (there is big data and there is BIG DATA)
Need/want support/services/training

• ACID
• CAP Theorem
• BASE

PROBABLY HAVE HEARD OF ACID
• Atomic – All or None
• Consistency – What is written is valid
• Isolation – One operation at a time
• Durability – Once committed to the DB, it stays

This is the world we have lived in for a long time…

Many may have heard this one
CAP stands for Consistency, Availability and
Partition Tolerance

• Consistency – every node sees the same data at the same time, so a read returns the most recent write (not the same thing as the C in ACID).
• Availability – service is available.
• Partition Tolerance – No failure other than complete network failure causes the system not to respond



** http://www.cs.berkeley.edu/~brewer/cs262b2004/PODC-keynote.pdf
In Mongo terms you can have 2 of 3. Availability, Partition-Tolerance
or Eventual Consistency.

• So we are talking about large amounts of data
• High velocity of acquisition
• A lot of variety that we need to store. Will worry later about how to handle it (or not)
• Need to scale and not break the bank
• Want the database to support agile, not hinder it

Maybe consider going relational if
• Highly transactional (FoundationDB?)
• Business Intelligence Systems (Hadoop may make this not
true)
• Don’t be fooled by fear of losing ACID….
http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html

And now
let’s look at MongoDB

http://db-engines.com/en/ranking_definition

A few high level points
• Document Oriented
• Storage format is JSON (actually BSON)
• Replication built in
• Master / slave architecture
• Strong querying support
• Name from "humongous"

• Open Source
• Schemaless
• Scalable
• Document Level Atomicity
• Easy Installation
• Relative Ease Of Use
• Great (!!!!) Documentation

• No cross document transactions
• No joins
• Replication – master / slave
• Sharding

* Credit – Dwight Merriman, Founder and CEO – MongoDB (was 10Gen)

Master Slave and Secondary Reads

** http://docs.mongodb.org/manual/core/replication-introduction/

Primary
• Receives all write requests
• A replica set can have only one primary
• Mongo stores all changes in the oplog

Secondary
• Replicates the primary's oplog
• Clients can prefer to read from secondaries
• If the primary goes down a new primary is elected (after 10 seconds of no response)


http://docs.mongodb.org/manual/core/sharding-introduction/

Shards
• Store the data; normally in production each shard is a replica set

Routers
• Route client operations to shards based on the shard key; can have more than one for availability
• Shard key is range based or hashed

Config Servers
• Contain cluster metadata
• In production there are 3 config servers
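
To make the two shard key styles concrete, here is a minimal Python sketch of how a router might pick a shard. It is an illustration only, not MongoDB's actual chunk logic: the shard names, the split points and the use of MD5 are all assumptions made up for this example.

```python
import bisect
import hashlib

# Hypothetical 3-shard cluster; names and split points are invented for this sketch.
SHARDS = ["shard0", "shard1", "shard2"]
RANGE_SPLITS = ["h", "q"]  # keys < "h" -> shard0, keys < "q" -> shard1, the rest -> shard2


def route_by_range(shard_key: str) -> str:
    """Range based: pick the shard whose key range contains shard_key."""
    return SHARDS[bisect.bisect_right(RANGE_SPLITS, shard_key)]


def route_by_hash(shard_key: str) -> str:
    """Hashed: pick a shard from a hash of shard_key, which spreads neighbouring keys apart."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Range based routing keeps neighbouring keys together (good for range queries), while hashed routing trades that away for a more even spread of writes.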

At its simplest form, Mongo is a document oriented database

MongoDB stores all data in documents, which are
JSON-style data structures composed of field-and-value pairs.
MongoDB stores documents on disk in the BSON
serialization format. BSON is a binary representation of
JSON documents. BSON contains more data types than
does JSON.
** For in-depth BSON information, see bsonspec.org.

{
"_id" : "52a602280f2e642811ce8478",
"ratingCode" : "PG13",
"country" : "USA",
"entityType" : "Rating"
}

Documents have the following rules:
The maximum BSON document size is 16
megabytes.
The field name _id is reserved for use as a
primary key; its value must be unique in the
collection.
The field names cannot start with the $
character.
The field names cannot contain the . character.
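
The rules above can be sketched as a small client-side check in Python. This is a rough illustration, not what the server or any driver actually runs: the function names are hypothetical, only top-level field names are inspected, and the JSON text length is used as a stand-in for the real BSON size.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16 megabytes


def check_document(doc: dict) -> list:
    """Return a list of rule violations (an empty list means the document passes)."""
    problems = []
    for name in doc:
        if name.startswith("$"):
            problems.append(f"field name {name!r} starts with '$'")
        if "." in name:
            problems.append(f"field name {name!r} contains '.'")
    # Assumption: the JSON text length stands in for the BSON size here.
    if len(json.dumps(doc).encode("utf-8")) > MAX_DOCUMENT_BYTES:
        problems.append("document is larger than 16 megabytes")
    return problems


def check_unique_ids(docs: list) -> bool:
    """True when every _id value appears only once in the collection."""
    ids = [d["_id"] for d in docs if "_id" in d]
    return len(ids) == len(set(ids))
```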

Windows
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/

Mac
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/

Create Data Directory, Defaults
• C:\data\db
• /data/db/ (make sure you have permissions)

Or can set using --dbpath
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data

Database
mongod

Shell
mongo
show dbs
show collections
db.stats()

1_simpleinsert.txt
• Insert
• Find
• Find all
• Find One
• Find with criteria
• Indexes
• Explain()

2_arrays_sort.txt
• Embedded documents
• Limit, Sort
• Using regex in query
• Removing documents
• Drop collection

3_imp_exp.txt
Mongo provides tools for getting data in and
out of the database
• Data Can Be Exported to json files

• Json files can then be Imported

4_cond_ops.txt
• $lt
• $gt
• $gte
• $lte
• $or
• Also $not, $exists, $type, $in

(for $type refer to
http://docs.mongodb.org/manual/reference/operator/query/type/#_S_type )
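
To show what these conditional operators mean, here is a small in-memory matcher in Python. It is only a sketch of the semantics, not how MongoDB evaluates queries: the `matches` function and the sample `posts` documents are made up for illustration, and `$not` and `$type` are deliberately left out.

```python
import operator

# Comparison operators from the list above mapped to Python functions.
COMPARISONS = {
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$lte": operator.le,
    "$gte": operator.ge,
}


def matches(doc: dict, query: dict) -> bool:
    """Return True when doc satisfies every condition in query."""
    for field, condition in query.items():
        if field == "$or":
            # $or takes a list of sub-queries; at least one must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$exists":
                    if (field in doc) != operand:
                        return False
                elif op == "$in":
                    if doc.get(field) not in operand:
                        return False
                elif op in COMPARISONS:
                    if field not in doc or not COMPARISONS[op](doc[field], operand):
                        return False
                else:
                    raise ValueError(f"operator {op} is not covered by this sketch")
        elif doc.get(field) != condition:
            # A plain value means an equality test.
            return False
    return True


# Hypothetical sample documents.
posts = [
    {"title": "a", "likes": 5},
    {"title": "b", "likes": 12},
    {"title": "c"},
]
popular = [p["title"] for p in posts if matches(p, {"likes": {"$gte": 10}})]
```

In the shell the same criteria document would be passed to find(), for example db.posts.find({likes: {$gte: 10}}).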

Aggregation Framework
• Uses a pipeline model to perform a series of operations on data. A common pipeline is a match phase (selection) followed by grouping (create result)

Map Reduce
• Two phases
  • Map creates one or more documents from each input document
  • Reduce combines output from Map into some result
• Finalize – optional phase that can perform some logic (e.g. sorting) on reduce output
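
The map, reduce and finalize phases can be sketched in plain Python as below. This is a toy version that runs in memory, under the assumption that map emits (key, value) pairs; the `map_reduce` helper and the sample documents are hypothetical and say nothing about how MongoDB executes the job.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """Run map, then reduce per key, then the optional finalize step."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):  # map: emit (key, value) pairs
            grouped[key].append(value)
    # reduce: combine all values emitted for one key into a single result
    results = {key: reduce_fn(key, values) for key, values in grouped.items()}
    if finalize_fn is not None:
        results = {key: finalize_fn(key, value) for key, value in results.items()}
    return results


# Hypothetical input: count posts per author.
posts = [{"author": "ann"}, {"author": "bob"}, {"author": "ann"}]
counts = map_reduce(
    posts,
    map_fn=lambda doc: [(doc["author"], 1)],
    reduce_fn=lambda key, values: sum(values),
)
```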

5_admin.txt
• show dbs
• show collections
• db.stats()

• db.posts.stats()
• db.posts.drop()
• db.system.indexes.find()

• Remember, with NoSQL redundancy is not evil
• Applications ensure consistency, not the DB
• Applications join data; joins are not defined in the DB
• Data model is schema-less
• Data model is usually built to support queries

• What are your basic units of data (what would be a document)?
• How are these units grouped / related?
• How does Mongo let you query this data, what are the options?
• Finally, maybe most importantly, what are your application's access patterns?
  • Reads vs. writes
  • Queries
  • Updates
  • Deletions
  • How structured is it

Normalized
• Similar to relational model.
• One collection per entity type
• Little or no redundancy
• Allows clean updates, familiar to many SQL users, easier to understand

From parent to child
{
name: "O'Reilly Media",
books: [123456789, 234567890, ...]
}

From child to parent
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
publisher_id: "oreilly"
}
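
Since Mongo has no joins, the application follows a reference itself. A minimal Python sketch of the child-to-parent lookup, with plain dictionaries standing in for two collections; the publisher document's `_id` of "oreilly" and the helper name are assumptions for illustration.

```python
# Dictionaries keyed by _id stand in for the publishers and books collections.
publishers = {
    "oreilly": {"_id": "oreilly", "name": "O'Reilly Media"},
}
books = {
    123456789: {
        "_id": 123456789,
        "title": "MongoDB: The Definitive Guide",
        "publisher_id": "oreilly",
    },
}


def book_with_publisher(book_id):
    """Application-side join: one lookup for the child, one for its parent."""
    book = dict(books[book_id])                           # first query: the book
    book["publisher"] = publishers[book["publisher_id"]]  # second query: the publisher
    return book
```

Each reference followed costs an extra round trip, which is one reason embedding is often preferred when data is read together.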

Often used pattern in Mongo is to embed
information as subdocuments.
Used when there is a contains relationship
Easier querying (when related data is often
used together)
Need to keep 16 MB document size in mind

Many or few collections

Many Collections
• As seen in normalized
• Clean and little redundancy
• May not provide best performance
• May require frequent updates to application if new types added

Multiple Collections
• Middle ground, partially normalized

Not many collections
• One large generic collection
• Contains many types
• Use type field

• Document Growth – a document will be relocated if it exceeds its allocated size
• Atomicity
  • Atomic at document level
  • Consideration for insertions, removes and multi-document updates
• Sharding – collections distributed across mongod instances, uses a shard key.
• Indexes – index fields that are often queried; indexes affect write performance slightly
• Consider using TTL to automatically expire documents

CMS Systems



Log Collection


https://code.google.com/p/log4mongo/



Caching



Queues / Messaging


Capped Collections - fixed-size collections that support high-throughput
operations that insert, retrieve, and delete documents based on insertion
order.



Analytics



Prototyping
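
The capped collection behaviour mentioned above (fixed size, insertion order, oldest entries dropped first) can be imitated in Python with a bounded deque. This is only an analogy: the class is hypothetical, and it caps by document count purely to keep the sketch short.

```python
from collections import deque


class CappedCollection:
    """Toy stand-in for a capped collection: keeps only the newest max_docs documents."""

    def __init__(self, max_docs: int):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        # When the deque is full, appending silently drops the oldest document.
        self._docs.append(doc)

    def find(self) -> list:
        # Documents come back in insertion order.
        return list(self._docs)


log = CappedCollection(max_docs=3)
for n in range(5):
    log.insert({"event": n})
```

After the loop only the last three events remain, which is the behaviour that makes capped collections handy for logs and simple queues.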
Mongo Driver
Supplied by MongoDB Itself
Easy to setup
Housed on maven repo

Morphia
Uses App Model
Handles References Well

Spring Mongo
Great if using Spring already

Node
Javascript (JSON), Coffeescript
MEAN Stack






Scala



Casbah
Reactive Mongo

Get MEAN



Mongo, Express, Angular and Node






http://bitnami.com/stack/mean
http://mean.io

Can install, in a VM or even in the cloud

Database in the cloud
https://mongolab.com/

Can access using shell, GUI Mongo explorer,
mongoimport, mongoexport and use in
application
Amazon, Rackspace, Joyent or Azure
MongoDB: The Definitive Guide, 2nd Edition
By: Kristina Chodorow
Publisher: O'Reilly Media, Inc.
Pub. Date: May 23, 2013
Print ISBN-13: 978-1-4493-4468-9
Pages in Print Edition: 432
MongoDB in Action
By: Kyle Banker
Publisher: Manning Publications
Pub. Date: December 16, 2011
Print ISBN-10: 1-935182-87-0
Print ISBN-13: 978-1-935182-87-0
Pages in Print Edition: 312
The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing
By Eelco Plugge; Peter Membrey; Tim Hawkins
Apress, September 2010
ISBN: 9781430230519
327 pages

MongoDB Applied Design Patterns
By: Rick Copeland
Publisher: O'Reilly Media, Inc.
Pub. Date: March 18, 2013
Print ISBN-13: 978-1-4493-4004-9
Pages in Print Edition: 176
MongoDB for Web Development (rough cut!)
By: Mitch Pirtle
Publisher: Addison-Wesley Professional
Last Updated: 14-JUN-2013
Pub. Date: March 11, 2015 (Estimated)
Print ISBN-10: 0-321-70533-5
Print ISBN-13: 978-0-321-70533-4
Pages in Print Edition: 360
Instant MongoDB
By: Amol Nayak;
Publisher: Packt Publishing
Pub. Date: July 26, 2013
Print ISBN-13: 978-1-78216-970-3
Pages in Print Edition: 72

http://www.mongodb.org/
https://mongolab.com/welcome/
https://education.mongodb.com/
http://blog.mongodb.org/
http://stackoverflow.com/questions/tagged/mongodb
http://bitnami.com/stack/mean

Let’s look briefly at Cassandra as an
alternative to Mongo

• Developed at Facebook, based on Google Big Table and Amazon Dynamo **
• Open sourced in mid 2008
• Apache project March 2009
• Commercial support through Datastax (originally known as Riptano, founded 2010)
• Used at Netflix, eBay and many more. Reportedly the largest installation is 300 TB on 400 machines
• Current version is 2.0.3

• No Single Point of Failure – highly available.
  • Peer to Peer – no master
• Data Center Aware – distributed architecture
• Linear Scaling – just add hardware
• Eventual Consistency, tunable tradeoff between latency and consistency
• Architecture is optimized for writes.
• Can have 2 billion columns (cells)!
• Data modeling for reads. Design starts with looking at your queries. (sound familiar?)
• With CQL became more SQL-like, but no joins, no subqueries, limited ordering (but very useful)
• Column names can be part of the data, e.g. time series

** Important Term **
Quorum: Q = N / 2 + 1 (integer division, so N = 3 gives Q = 2).
We get consistency in a BASE world by satisfying W + R > N
3 obvious ways:
1. W = 1, R = N
2. W = N, R = 1

3. W = Q, R = Q

(N is replication factor, R = read replica count, W = write replica count)
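
A quick way to convince yourself of the three combinations is to check the W + R > N rule in code. A minimal Python sketch; the function names are made up for this example.

```python
def quorum(n: int) -> int:
    """Q = N / 2 + 1 using integer division."""
    return n // 2 + 1


def is_consistent(n: int, w: int, r: int) -> bool:
    """True when every read set overlaps every write set, i.e. W + R > N."""
    return w + r > n


n = 3  # replication factor
q = quorum(n)
three_ways = [
    is_consistent(n, w=1, r=n),  # 1. write to one replica, read from all
    is_consistent(n, w=n, r=1),  # 2. write to all replicas, read from one
    is_consistent(n, w=q, r=q),  # 3. quorum writes and quorum reads
]
```

The overlap is the point: if W + R > N, at least one replica in any read set also took part in the latest write. With N = 3, writing and reading at ONE gives 1 + 1 = 2, which is not greater than 3, so a read may miss the latest write.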

C* data model is made of these:

• Column – a name, a value and a timestamp. Applications can use the name as the data and not use the value. (Like a column in an RDBMS.)
• Row – a collection of columns identified by a unique key. The key is called a partition key. (Like a row in an RDBMS.)
• Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns. Each column has a key and maybe a value. (Like a table in an RDBMS.) This is also known as a table now in C* terms.
• Keyspace – administrative container for CFs. It is a namespace. Also has a replication strategy – more later. (Like a DB or schema in an RDBMS.)

Tokens – partitioner dependent element on the ring.
Each node has a single unique token assigned.
Each node claims a range of tokens that is from its token to
token of the previous node on the ring.

Use this formula
Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)

In cassandra.yaml (e.g. node 1 of a 4 node cluster)
initial_token: 42535295865117307932921825928971026432

** http://blog.milford.io/cassandra-token-calculator/


• Replication is how many copies of each piece of data should be stored. In C* terms it is Replication Factor or “RF”.
• In C* RF is set at the keyspace level:

CREATE KEYSPACE drg_compare WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};

• How the data is replicated is called the Replication Strategy
  • SimpleStrategy – returns nodes “next” to each other on ring, assumes single DC
  • NetworkTopologyStrategy – for configuring per data center. Rack and DC aware.

update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];

Using token generation values from before. 4 node cluster.
Write value with token 32535295865117307932921825928971026432
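
The walk around the ring for this write can be sketched in Python. The sketch recomputes the initial tokens with the formula from the token slide, then finds the owning node and, assuming SimpleStrategy with a replication factor of 3, the next nodes on the ring. Function names are hypothetical and nodes are identified only by their zero-based index.

```python
import bisect


def initial_tokens(number_of_nodes: int) -> list:
    """Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)."""
    return [i * (2**127 // number_of_nodes) for i in range(number_of_nodes)]


def owner_index(tokens: list, token: int) -> int:
    """A node owns the range from the previous node's token (exclusive) up to its own token."""
    i = bisect.bisect_left(tokens, token)
    return i % len(tokens)  # past the highest token we wrap around to node 0


def replica_indexes(tokens: list, token: int, replication_factor: int) -> list:
    """SimpleStrategy assumption: the owner plus the next nodes around the ring."""
    first = owner_index(tokens, token)
    return [(first + k) % len(tokens) for k in range(replication_factor)]


tokens = initial_tokens(4)
write_token = 32535295865117307932921825928971026432
owner = owner_index(tokens, write_token)
replicas = replica_indexes(tokens, write_token, replication_factor=3)
```

Here the write token falls between node 0's token (0) and node 1's token, so node 1 owns it and nodes 1, 2 and 3 hold the three copies.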

• When writing, a Coordinator Node will be selected. Selected at write (or read) time. Not a SPOF!
• Using the Gossip Protocol, nodes share information with each other. Who is up, who is down, who is taking which token ranges, etc. Every second, each node shares with 1 to 3 nodes.
• Consistency Level (CL) – says how many nodes must agree before an operation is a success. Set at read or write operation.
  • ONE – coordinator will wait for one node to ack write (also TWO, THREE). ONE is the default if none provided.
  • QUORUM – we saw that before. N / 2 + 1. Also LOCAL_QUORUM, EACH_QUORUM
  • ANY – waits for some replica. If all are down, still succeeds. Only for writes. Doesn’t guarantee it can be read.
  • ALL – blocks waiting for all replicas
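
How a coordinator might decide that an operation has succeeded can be sketched as a count of acknowledgements. This Python sketch covers only ONE, TWO, THREE, QUORUM and ALL; ANY and the data-center-aware levels are left out on purpose, and the function names are invented for illustration rather than taken from Cassandra.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements the coordinator waits for."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"consistency level {level} is not covered by this sketch")


def operation_succeeds(level: str, replication_factor: int, acks_received: int) -> bool:
    """True once enough replicas have answered for the requested level."""
    return acks_received >= required_acks(level, replication_factor)
```

With RF = 3 and one replica down, a QUORUM write still succeeds with two acks, while an ALL write fails.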

3 important concepts:

Read Repair - At time of read, inconsistencies are noticed between nodes and replicas are updated. Direct and background. Direct is determined by CL.

Anti-Entropy Node Repair - For data that is not read frequently, or to update data on a node that has been down for a while, run the nodetool repair process (also called anti-entropy repair). Builds Merkle trees, compares nodes and does repair.

Hinted Handoff - Writes are always sent to all replicas for the specified row regardless of the consistency level specified by the client. If a node happens to be down at the time of write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online. This notification happens via Gossip. Default 1 hour.

• Interaction with Cassandra can be done using one of the supplied clients such as CLI or CQL. Otherwise client applications are built using a language client library.
• Many clients in multiple languages, including Java, .NET, Python, Scala, Go, PHP, Node.js, Perl, Ruby, etc.
• Java:
  • Hector wraps the underlying Thrift API. Hector is one of the most commonly used client libraries.
  • Astyanax is a client library developed by Netflix.
  • Datastax CQL – newest CQL driver, will be very familiar to JDBC developers
  • And many more … (JPA)
• There are also Datastax OpsCenter and various other GUIs and a REST API (Virgil)

Many More Topics / Information Related to C*
not covered



Great for Fast Writes



No Single POF



Data Center Aware



Also Relative Ease Of Use

Questions?



Comments?

Thank You!!!!!!
brian.enochson@gmail.com


NOSQL Intro with MongoDB and
Cassandra

79

More Related Content

What's hot

Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureMicrosoft
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...Alfresco Software
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloudJeff Hung
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM Analytics
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteAmr Awadallah
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperVasu S
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverInside Analysis
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 

What's hot (20)

Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...
Partner Solutions: CIGNEX Datamatics – Alfresco integration with Liferay Port...
 
Azure data stack_2019_08
Azure data stack_2019_08Azure data stack_2019_08
Azure data stack_2019_08
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing Forever
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 

Big Data, NoSQL with MongoDB and Cassandra

  • 1. Big Data and NoSQL with MongoDB & Cassandra
  • 2. Brian Enochson
      • SW engineer who has worked as a designer / developer on NoSQL (Mongo, Cassandra, Hadoop)
      • Specializes in SW development, architecture and training
      • brian.enochson@gmail.com
      • Twitter: @benochso
      • Google Plus: https://plus.google.com/+BrianEnochson
  • 3.
      • Presentation intro
      • Introduction to Big Data
      • Introduction to NoSQL
      • Relational database to NoSQL technology: contrast & compare
      • NoSQL landscape
  • 4.
      • Introduction to MongoDB
      • MongoDB components, capabilities and common use cases
      • JSON & BSON
      • Documents, collections, references and Mongo ID
      • Querying
      • Data modeling / schema design
      • Replication & sharding
  • 5.
      • Cassandra
      • Architecture
      • Data model
      • Data modeling
      • Application development
      • Wrap-up and final Q & A
  • 7. Why are databases like Mongo or Cassandra needed?
      • To understand, one needs to look at:
          • The history of databases
          • How systems were built in the past
      • Then examine modern applications:
          • Web scale
          • Data acquisition
      • Other factors, like the cost of H/W
  • 8.
      • 1960’s – Hierarchical and network type (IMS and CODASYL)
      • 1970’s – Beginnings of the theory behind the relational model. Codd
      • 1980’s – Rise of the relational model. SQL. E/R model (Chen)
      • 1990’s – Access/Excel and MySQL. ODMS began to appear
      • 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
      • 2010’s – Emergence of NoSQL as an industry player and viable alternative
  • 9.
      • Developers today are faced with Internet scale
          • 100,000’s of users
          • Low cost of storage
          • Increased processing power
          • The ability (and need) to capture millions of events
          • Caching solves it to an extent but brings other complexities
      • Real-time
      • Need to scale out and not up (add more low-cost machines vs. replace with a more powerful machine)
      • Cost
          • Let’s not forget that for enterprise DBs Internet scale can become expensive
          • Open source DBs may solve license cost, but don’t ignore operational costs
  • 10. Some facts from http://www.storagenewsletter.com/rubriques/market-reports-research/ibm-cmo-study/
      • Approximately 90 percent of all the real-time information being created today is unstructured data
      • Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, a 1 followed by 18 zeroes!!)
      • 90 percent of the world's data today has been created in the last two years alone
  • 11.
      • Relational
          • Divide into tables, relate through foreign keys, DB constraints, normalized data; the interface is SQL
      • NoSQL
          • Store in a schemaless format; redundancy is encouraged; application access (your queries) determines the storage format. The interface varies and is optimized for the implementation; no forced DB constraints.
  • 12. "Luckily, due to the large number of compromises made when attempting to scale their existing relational databases, these tradeoffs were not so foreign or distasteful as they might have been."
      • Greg Burd, https://www.usenix.org/legacy/publications/login/2011-10/openpdfs/Burd.pdf
  • 13.
      • Eventual consistency
      • The application has increased responsibility, such as maintaining consistency and handling transactions
      • Store redundant data
  • 14. The driving force in requiring new technology is often referred to as the “3 V’s”:
      • Volume – amount of data
      • Variety – range of data types and sources
      • Velocity – speed of data in and out
  • 15. NoSQL != Big Data
      • NoSQL products were created to help solve the big data problem.
      • Big data is a much larger problem than just storage. Analysis tools like Hadoop, messaging systems like Kafka, real-time processing engines like Storm and machine learning (Mahout) all help solve the big data problem.
  • 16.
      • Document DB: MongoDB, CouchDB
      • Wide Column / Column Family: Cassandra, HBase, Amazon SimpleDB
      • Key Value: Riak, Redis, DynamoDB, Voldemort, MemcacheDB
      • Graph: Neo4J, OrientDB
      • Search (search can also be a persistence store): Lucene, Solr, ElasticSearch
      • Many, many more! (http://nosql-database.org/)
  • 17. Choosing the right NoSQL type and eventual product depends on…
      • Type of data
          • One key and a lot of data?
          • Schema variance
          • High volume of data?
          • Storing media, blobs
          • Document oriented?
          • Tracking relationships?
          • Combination?
          • Multi-datacenter
      • Type of access
      • Volumes of data (there is big data and there is BIG DATA)
      • Need/want support/services/training
  • 18.
      • ACID
      • CAP Theorem
      • BASE
  • 19. You have probably heard of ACID
      • Atomic – all or none
      • Consistency – what is written is valid
      • Isolation – one operation at a time
      • Durability – once committed to the DB, it stays
      • This is the world we have lived in for a long time…
  • 20. Many may have heard this one: CAP stands for Consistency, Availability and Partition Tolerance
      • Consistency – every node sees the same data; a read returns the most recent write (related to, but not the same as, the C in ACID)
      • Availability – the service is available
      • Partition Tolerance – no failure other than complete network failure causes the system not to respond
      • ** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  • 21. In Mongo terms you can have 2 of 3: Availability, Partition Tolerance or Eventual Consistency.
  • 23.
      • So we are talking about large amounts of data
      • High velocity of acquisition
      • A lot of variety that we need to store; we will worry later about how to handle it (or not)
      • Need to scale and not break the bank
      • Want the database to support agile, not hinder it
  • 24.
      • Maybe consider going relational if:
          • Highly transactional (FoundationDB?)
          • Business Intelligence systems (Hadoop may make this not true)
      • Don’t be fooled by fear of losing ACID… http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html
  • 25. And now let’s look at MongoDB
  • 27. A few high-level points
      • Document oriented
      • Storage format is JSON (actually BSON)
      • Replication built in
      • Master / slave architecture
      • Strong querying support
      • Name comes from "humongous"
  • 28.
      • Open source
      • Schemaless
      • Scalable
      • Document-level atomicity
      • Easy installation
      • Relative ease of use
      • Great (!!!!) documentation
  • 29.
      • No cross-document transactions
      • No joins
      • Replication – master / slave
      • Sharding
  • 30. * Credit – Dwight Merriman, Founder and CEO – MongoDB (was 10Gen)
  • 31. Master / slave and secondary reads
      • ** http://docs.mongodb.org/manual/core/replication-introduction/
  • 32.
      • Primary
          • Receives all write requests
          • A replica set can have only one primary
          • Mongo stores all changes in the oplog
      • Secondary
          • Replicates the primary's oplog
          • Clients can prefer to read from secondaries
          • If the primary goes down, a new primary is elected (after 10 seconds with no response)
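The primary / secondary behaviour described above can be illustrated with a small in-memory model. This is only a sketch in plain JavaScript, not MongoDB's implementation: the `Member` class, its `apply` and `syncFrom` methods and the shape of the oplog entries are hypothetical names chosen for illustration, and election is left out.

```javascript
// Toy model of oplog-based replication (hypothetical names, not MongoDB's API).
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // ordered log of every operation applied here
  }

  // Apply one operation to the local data and record it in the local oplog.
  apply(op) {
    if (op.type === "insert") this.data.set(op.doc._id, { ...op.doc });
    else if (op.type === "remove") this.data.delete(op._id);
    this.oplog.push(op);
  }

  // A secondary replays every primary oplog entry it has not applied yet.
  syncFrom(primary) {
    for (const op of primary.oplog.slice(this.oplog.length)) this.apply(op);
  }
}

const primary = new Member("primary");
const secondary = new Member("secondary");

// All writes go to the primary.
primary.apply({ type: "insert", doc: { _id: 1, ratingCode: "PG13" } });
primary.apply({ type: "insert", doc: { _id: 2, ratingCode: "R" } });
primary.apply({ type: "remove", _id: 1 });

// Until it syncs, the secondary lags behind: a read from it would be stale.
const staleCount = secondary.data.size;  // 0
secondary.syncFrom(primary);
const syncedCount = secondary.data.size; // 1 (only _id 2 remains)
```

The gap between `staleCount` and `syncedCount` is the reason reads from secondaries are only eventually consistent.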
  • 34.
      • Shards
          • Store the data; in production each shard is normally a replica set
      • Routers
          • Route client operations to shards based on the shard key; there can be more than one for availability
          • The shard key is range based or hashed
      • Config servers
          • Contain cluster metadata
          • In production there are 3 config servers
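The difference between a range-based and a hashed shard key can be sketched as two routing functions. Everything here is an assumption made for illustration: the shard names, the range boundaries and `simpleHash` are invented, and MongoDB's real router works on chunks and uses its own hash function.

```javascript
// Toy router: picks a shard for a document from its shard key value.

// Range based: each shard owns a contiguous range of key values [min, max).
const ranges = [
  { shard: "shard0", min: -Infinity, max: 1000 },
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity },
];

function routeByRange(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// Hashed: hash the key first, then spread the hash values over the shards.
function simpleHash(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

function routeByHash(key, shardCount) {
  return "shard" + (simpleHash(key) % shardCount);
}

// Neighbouring keys stay together under range routing...
const rangeA = routeByRange(1001);
const rangeB = routeByRange(1002);
// ...but are usually scattered under hashed routing.
const hashA = routeByHash(1001, 3);
const hashB = routeByHash(1002, 3);
```

Range routing keeps range queries on few shards but can concentrate monotonically increasing keys on one shard; hashing spreads writes at the cost of scattering range queries.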
  • 35.
      • At its simplest, Mongo is a document-oriented database
      • MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs
      • MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents and contains more data types than JSON does.
      • ** For in-depth BSON information, see bsonspec.org.
  • 36.
      {
        "_id" : "52a602280f2e642811ce8478",
        "ratingCode" : "PG13",
        "country" : "USA",
        "entityType" : "Rating"
      }
  • 38. Documents have the following rules:
      • The maximum BSON document size is 16 megabytes.
      • The field name _id is reserved for use as a primary key; its value must be unique in the collection.
      • Field names cannot start with the $ character.
      • Field names cannot contain the . character.
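As a rough illustration, the rules listed above can be written as a small check. This is a sketch under stated assumptions, not how the server validates documents: `validateDocument` is a hypothetical helper, it inspects top-level field names only, and it uses the JSON byte length as a stand-in for the BSON size, which will differ in practice.

```javascript
// Checks a document against the rules as stated on the slide.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16 megabytes

function validateDocument(doc, existingIds) {
  const errors = [];
  for (const field of Object.keys(doc)) {
    if (field.startsWith("$")) errors.push(`field "${field}" starts with $`);
    if (field.includes(".")) errors.push(`field "${field}" contains .`);
  }
  // _id acts as the primary key, so it must be unique in the collection.
  if (existingIds.has(doc._id)) errors.push(`duplicate _id ${doc._id}`);
  // Assumption: JSON size approximates BSON size closely enough for a demo.
  if (Buffer.byteLength(JSON.stringify(doc), "utf8") > MAX_DOC_BYTES) {
    errors.push("document larger than 16 MB");
  }
  return errors;
}

const idsInCollection = new Set(["52a602280f2e642811ce8478"]);

const okErrors = validateDocument(
  { _id: "a1", ratingCode: "PG13", country: "USA" },
  idsInCollection
); // no errors

const badErrors = validateDocument(
  { _id: "52a602280f2e642811ce8478", "$code": 1, "rating.code": 2 },
  idsInCollection
); // three errors: "$code", "rating.code" and the duplicate _id
```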
  • 39.
      • Windows: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
      • Mac: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
      • Create the data directory. Defaults:
          • C:\data\db
          • /data/db/ (make sure you have permissions)
      • Or set it using --dbpath:
          C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
  • 41. 1_simpleinsert.txt
      • Insert
      • Find
      • Find all
      • Find one
      • Find with criteria
      • Indexes
      • explain()
  • 42. 2_arrays_sort.txt
      • Embedded documents
      • Limit, sort
      • Using regex in a query
      • Removing documents
      • Drop collection
  • 43. 3_imp_exp.txt
      • Mongo provides tools for getting data in and out of the database
      • Data can be exported to JSON files
      • JSON files can then be imported
  • 44. 4_cond_ops.txt
      • $lt
      • $gt
      • $gte
      • $lte
      • $or
      • Also $not, $exists, $type, $in
      • (for $type refer to http://docs.mongodb.org/manual/reference/operator/query/type/#_S_type)
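To make the meaning of these operators concrete, here is a toy matcher that evaluates a query object against plain JavaScript objects. It is a sketch, not MongoDB's query engine: `matches`, `find` and the sample `movies` data are invented for illustration, and only a subset of the operators is covered.

```javascript
// Each operator compares a field value with the argument given in the query.
const operators = {
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $in: (value, arg) => arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    // $or holds if at least one of its sub-queries matches.
    if (field === "$or") return condition.some((sub) => matches(doc, sub));
    const value = doc[field];
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // Operator object such as { $gte: 2000, $lt: 2010 }: all operators must hold.
      return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
    }
    return value === condition; // plain equality
  });
}

const movies = [
  { title: "A", year: 1999, rating: "PG13" },
  { title: "B", year: 2008, rating: "R" },
  { title: "C", year: 2013 },
];

const find = (query) => movies.filter((m) => matches(m, query)).map((m) => m.title);

const inRange = find({ year: { $gte: 2000, $lt: 2010 } });          // ["B"]
const oldOrUnrated = find({
  $or: [{ year: { $lt: 2000 } }, { rating: { $exists: false } }],
});                                                                  // ["A", "C"]
const rated = find({ rating: { $in: ["PG13", "R"] } });              // ["A", "B"]
```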
  • 45.
      • Aggregation Framework
          • Uses a pipeline model to perform a series of operations on data
          • A common pattern is a match phase (selection) followed by grouping (creating the result)
      • Map Reduce
          • Two phases
          • A Map phase that creates one or more documents from each input document
          • A Reduce phase that combines the output from Map into some result
          • Finalize – an optional phase that can perform some logic (e.g. sorting) on the reduce output
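Both models can be mimicked on an in-memory array to show how the phases fit together. The sketch below is plain JavaScript, not the MongoDB API: `match`, `groupSum`, `runPipeline` and `mapReduce` are hypothetical helpers and `orders` is made-up data. The finalize step here post-processes the whole reduced output, following the description above; MongoDB's own finalize function is called once per key.

```javascript
const orders = [
  { customer: "ann", status: "shipped", total: 20 },
  { customer: "bob", status: "shipped", total: 15 },
  { customer: "ann", status: "pending", total: 50 },
  { customer: "ann", status: "shipped", total: 5 },
];

// Pipeline style: each stage takes an array of documents and returns a new one.
const match = (predicate) => (docs) => docs.filter(predicate);
const groupSum = (keyField, sumField) => (docs) => {
  const totals = {};
  for (const d of docs) totals[d[keyField]] = (totals[d[keyField]] || 0) + d[sumField];
  return Object.entries(totals).map(([key, total]) => ({ _id: key, total }));
};
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const shippedTotals = runPipeline(orders, [
  match((o) => o.status === "shipped"),
  groupSum("customer", "total"),
]);
// shippedTotals: [{ _id: "ann", total: 25 }, { _id: "bob", total: 15 }]

// Map-reduce style: map emits (key, value) pairs, reduce combines the values
// per key, and an optional finalize step post-processes the reduced output.
function mapReduce(docs, map, reduce, finalize = (results) => results) {
  const emitted = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const reduced = [...emitted].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
  return finalize(reduced);
}

const totalsByCustomer = mapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.total),         // map
  (key, values) => values.reduce((a, b) => a + b, 0),   // reduce
  (results) => results.sort((a, b) => b.value - a.value) // finalize: largest first
);
// totalsByCustomer: [{ _id: "ann", value: 75 }, { _id: "bob", value: 15 }]
```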
  • 46. 5_admin.txt
      • show dbs
      • show collections
      • db.stats()
      • db.posts.stats()
      • db.posts.drop()
      • db.system.indexes.find()
  • 47.
      • Remember: with NoSQL, redundancy is not evil
      • Applications ensure consistency, not the DB
      • Applications join data; joins are not defined in the DB
      • The data model is schema-less
      • The data model is usually built to support queries
  • 48.
      • What are your basic units of data (what would be a document)?
      • How are these units grouped / related?
      • How does Mongo let you query this data; what are the options?
      • Finally, maybe most importantly, what are your application's access patterns?
          • Reads vs. writes
          • Queries
          • Updates
          • Deletions
          • How structured is it?
  • 49. Normalized
      • Similar to the relational model
      • One collection per entity type
      • Little or no redundancy
      • Allows clean updates, is familiar to many SQL users, and is easier to understand
  • 51.
      • From parent to child
          { name: "O'Reilly Media", books: [123456789, 234567890, ...] }
      • From child to parent
          { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" }
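Because the database does not join for you, the application follows these references itself. The sketch below shows both directions on in-memory arrays. It is illustrative only: the publisher's `_id: "oreilly"`, the second book and the helper functions are assumptions added so that the references have something to resolve to.

```javascript
// Two "collections" modeled as arrays.
const publishers = [
  // Assumption: the publisher carries _id "oreilly" so publisher_id can point at it.
  { _id: "oreilly", name: "O'Reilly Media", books: [123456789, 234567890] },
];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "Another Book (hypothetical)", publisher_id: "oreilly" },
];

// Parent to child: follow the array of book ids stored on the publisher.
function booksOfPublisher(publisher) {
  return publisher.books.map((id) => books.find((b) => b._id === id));
}

// Child to parent: follow the publisher_id stored on the book.
function publisherOfBook(book) {
  return publishers.find((p) => p._id === book.publisher_id);
}

const titles = booksOfPublisher(publishers[0]).map((b) => b.title);
const publisherName = publisherOfBook(books[0]).name; // "O'Reilly Media"
```

Each lookup here stands for a separate query in a real application, which is the cost of referencing instead of embedding.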
  • 52. An often-used pattern in Mongo is to embed information as subdocuments.
      • Used when there is a "contains" relationship
      • Easier querying (when related data is often used together)
      • Need to keep the 16 MB document size limit in mind
  • 54. Many or few collections
      - Many collections
          - As seen in the normalized model.
          - Clean, with little redundancy.
          - May not provide the best performance.
          - May require frequent updates to the application if new types are added.
      - Multiple collections
          - Middle ground, partially normalized.
      - Not many collections
          - One large generic collection.
          - Contains many types.
          - Use a type field.
  • 55.
      - Document growth – a document will be relocated if it exceeds its allocated size.
      - Atomicity
          - Atomic at the document level.
          - A consideration for insertions, removals and multi-document updates.
      - Sharding – collections are distributed across mongod instances using a shard key.
      - Indexes – index the fields that are queried often; indexes affect write performance slightly.
      - Consider using TTL to automatically expire documents.
  • 56.
      - CMS systems
      - Log collection
          - https://code.google.com/p/log4mongo/
      - Caching
      - Queues / messaging
      - Capped collections: fixed-size collections that support high-throughput operations that insert, retrieve, and delete documents based on insertion order.
      - Analytics
      - Prototyping
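The behavior of a capped collection can be imitated with a bounded queue. This is a sketch of the idea only: the `CappedLog` class is hypothetical, and it caps by document count for simplicity, whereas a real capped collection is sized in bytes (with an optional document limit).

```python
from collections import deque


class CappedLog:
    """Keeps at most `max_docs` entries in insertion order."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        # When the log is full, the oldest entry is dropped automatically.
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order.
        return list(self._docs)


log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"seq": i})
```

After five inserts into a log capped at three entries, only the three most recent documents remain, which is why the pattern suits logging and simple queues.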
  • 57.
      - Mongo Driver
          - Supplied by MongoDB itself.
          - Easy to set up.
          - Housed on the Maven repo.
      - Morphia
          - Uses the app model.
          - Handles references well.
      - Spring Mongo
          - Great if already using Spring.
  • 58.
      - Node
          - JavaScript (JSON), CoffeeScript
          - MEAN stack
      - Scala
          - Casbah
          - Reactive Mongo
  • 59. Get MEAN
      - Mongo, Express, Angular and Node
      - http://bitnami.com/stack/mean
      - http://mean.io
      - Can be installed in a VM or even in the cloud.
  • 60. Database in the cloud
      - https://mongolab.com/
      - Can be accessed using the shell, the GUI Mongo explorer, mongoimport and mongoexport, and used in an application.
      - Amazon, Rackspace, Joyent or Azure
  • 61.
      - MongoDB: The Definitive Guide, 2nd Edition. By: Kristina Chodorow. Publisher: O'Reilly Media, Inc. Pub. date: May 23, 2013. Print ISBN-13: 978-1-4493-4468-9. Pages in print edition: 432.
      - MongoDB in Action. By: Kyle Banker. Publisher: Manning Publications. Pub. date: December 16, 2011. Print ISBN-10: 1-935182-87-0. Print ISBN-13: 978-1-935182-87-0. Pages in print edition: 312.
      - The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing. By: Eelco Plugge; Peter Membrey; Tim Hawkins. Publisher: Apress. Pub. date: September 2010. ISBN: 9781430230519. Pages: 327.
  • 62.
      - MongoDB Applied Design Patterns. By: Rick Copeland. Publisher: O'Reilly Media, Inc. Pub. date: March 18, 2013. Print ISBN-13: 978-1-4493-4004-9. Pages in print edition: 176.
      - MongoDB for Web Development (rough cut!). By: Mitch Pirtle. Publisher: Addison-Wesley Professional. Last updated: 14-JUN-2013. Pub. date: March 11, 2015 (estimated). Print ISBN-10: 0-321-70533-5. Print ISBN-13: 978-0-321-70533-4. Pages in print edition: 360.
      - Instant MongoDB. By: Amol Nayak. Publisher: Packt Publishing. Pub. date: July 26, 2013. Print ISBN-13: 978-1-78216-970-3. Pages in print edition: 72.
  • 64. Let’s look briefly at Cassandra as an alternative to Mongo
  • 65.
      - Developed at Facebook, based on Google Bigtable and Amazon Dynamo.
      - Open sourced in mid 2008.
      - Apache project since March 2009.
      - Commercial support through DataStax (originally known as Riptano, founded 2010).
      - Used at Netflix, eBay and many more. The largest installation reportedly holds 300 TB on 400 machines.
      - Current version is 2.0.3.
  • 66.
      - No single point of failure – highly available.
      - Peer to peer – no master.
      - Data center aware – distributed architecture.
      - Linear scaling – just add hardware.
      - Eventual consistency, with a tunable tradeoff between latency and consistency.
      - Architecture is optimized for writes.
      - A row can have 2 billion columns (cells)!
      - Data modeling is done for reads. Design starts with looking at your queries. (Sound familiar?)
      - With CQL it became more SQL-like, but with no joins, no subqueries and limited ordering (but very useful).
      - Column names can be part of the data, e.g. time series.
  • 67. ** Important Term **
      - Quorum: Q = floor(N / 2) + 1.
      - We get consistency in a BASE world by satisfying W + R > N.
      - 3 obvious ways:
          1. W = 1, R = N
          2. W = N, R = 1
          3. W = Q, R = Q
      - (N is the replication factor, R = read replica count, W = write replica count)
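The quorum arithmetic can be checked with a few lines of Python. The function names are made up for this sketch; the formulas are the ones on the slide, with the division taken as integer division.

```python
def quorum(n):
    """Q = floor(N / 2) + 1, where N is the replication factor."""
    return n // 2 + 1


def overlaps(w, r, n):
    """True when every read set must intersect every write set (W + R > N)."""
    return w + r > n


N = 3
Q = quorum(N)

# The three "obvious ways" from the slide, as (W, R) pairs.
obvious_ways = [(1, N), (N, 1), (Q, Q)]
```

With N = 3 the quorum is 2, so writing to 2 replicas and reading from 2 replicas guarantees that at least one replica in every read has seen the latest write. Writing to 1 and reading from 1 does not.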
  • 68. The C* data model is made of these:
      - Column – a name, a value and a timestamp. Applications can use the name as the data and not use the value. (Like a column in an RDBMS.)
      - Row – a collection of columns identified by a unique key. The key is called a partition key. (Like a row in an RDBMS.)
      - Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns. Each column has a key and maybe a value. (Like a table in an RDBMS.) This is now also known as a table in C* terms.
      - Keyspace – administrative container for CFs. It is a namespace. It also has a replication strategy – more on that later. (Like a DB or schema in an RDBMS.)
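The nesting of keyspace, column family, row and column can be pictured with a few Python classes. This is a conceptual sketch, not how Cassandra stores data: the class names mirror the terms above, the sample sensor data is invented, and resolving competing writes by keeping the newest timestamp is stated here as an assumption about how the timestamp is used (ties simply keep the existing value in this sketch).

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: object
    timestamp: int


@dataclass
class Row:
    key: str  # the partition key
    columns: dict = field(default_factory=dict)

    def write(self, name, value, timestamp):
        """Keep the write with the newest timestamp for each column name."""
        current = self.columns.get(name)
        if current is None or timestamp > current.timestamp:
            self.columns[name] = Column(name, value, timestamp)

    def ordered_names(self):
        """Columns are kept ordered by name within a row."""
        return sorted(self.columns)


@dataclass
class ColumnFamily:
    name: str
    rows: dict = field(default_factory=dict)

    def row(self, key):
        return self.rows.setdefault(key, Row(key))


# A keyspace is a namespace of column families (hypothetical names).
keyspace = {"readings": ColumnFamily("readings")}

# Time series: the column name itself carries data (the reading time).
sensor = keyspace["readings"].row("sensor-42")
sensor.write("2014-01-01T10:05", 21.9, timestamp=2)
sensor.write("2014-01-01T10:00", 21.5, timestamp=1)
sensor.write("2014-01-01T10:00", 99.9, timestamp=0)  # older write, ignored
```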
  • 70. Tokens – a partitioner-dependent element on the ring.
      - Each node has a single unique token assigned.
      - Each node claims a range of tokens that runs from its token to the token of the previous node on the ring.
      - Use this formula:
          Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)
      - In cassandra.yaml:
          initial_token: 42535295865117307932921825928971026432
      - ** http://blog.milford.io/cassandra-token-calculator/
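The token formula is easy to evaluate with Python's arbitrary-precision integers. The function name and the default ring size parameter are assumptions of this sketch; the ring size of 2^127 comes from the formula on the slide.

```python
def initial_tokens(number_of_nodes, ring_size=2**127):
    """Evenly spaced tokens: node i gets i * (ring_size / number_of_nodes)."""
    step = ring_size // number_of_nodes
    return [i * step for i in range(number_of_nodes)]


# Tokens for a 4 node cluster; the second node's token matches the
# cassandra.yaml value shown on the slide.
tokens = initial_tokens(4)
```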
  • 71.
      - Replication is how many copies of each piece of data should be stored. In C* terms it is the Replication Factor, or “RF”.
      - In C*, RF is set at the keyspace level:
          CREATE KEYSPACE drg_compare WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
      - How the data is replicated is called the Replication Strategy.
          - SimpleStrategy – returns nodes “next” to each other on the ring. Assumes a single DC.
          - NetworkTopologyStrategy – for configuring per data center. Rack and DC aware.
              update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];
  • 73.
      - Using the token generation values from before, for a 4 node cluster.
      - Write a value with token 32535295865117307932921825928971026432.
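Which node receives this write can be worked out with a sorted lookup on the ring. The sketch below assumes the four evenly spaced tokens from the earlier formula, that a node owns the range ending at its own token, and that replicas go to the following nodes on the ring as described for SimpleStrategy. The function names are hypothetical.

```python
from bisect import bisect_left

RING_SIZE = 2**127
# Tokens of a 4 node cluster, node index 0..3, sorted ascending.
NODE_TOKENS = [i * (RING_SIZE // 4) for i in range(4)]


def owner_index(token, node_tokens=NODE_TOKENS):
    """First node whose token is >= the data token, wrapping past the end."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def replica_indexes(token, replication_factor, node_tokens=NODE_TOKENS):
    """The owner plus the next nodes on the ring, one per extra replica."""
    first = owner_index(token, node_tokens)
    return [(first + i) % len(node_tokens) for i in range(replication_factor)]


WRITE_TOKEN = 32535295865117307932921825928971026432
primary = owner_index(WRITE_TOKEN)
replicas = replica_indexes(WRITE_TOKEN, replication_factor=3)
```

The write token falls between the tokens of the first and second nodes, so the second node (index 1) owns it, and with a replication factor of 3 the copies land on the nodes at indexes 1, 2 and 3.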
  • 75.
      - When writing, a Coordinator Node is selected. It is selected at write (or read) time. Not a single point of failure!
      - Using the Gossip Protocol, nodes share information with each other: who is up, who is down, who is taking which token ranges, etc. Every second, each node shares with 1 to 3 nodes.
      - Consistency Level (CL) – says how many nodes must agree before an operation is a success. Set per read or write operation.
          - ONE – the coordinator will wait for one node to ack the write (also TWO, THREE). ONE is the default if none is provided.
          - QUORUM – we saw that before: N / 2 + 1. Also LOCAL_QUORUM, EACH_QUORUM.
          - ANY – waits for some replica. If all are down, it still succeeds. Only for writes. Doesn’t guarantee the data can be read.
          - ALL – blocks waiting for all replicas.
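The consistency levels above translate into a number of acknowledgements the coordinator waits for. The following is a simplified sketch with hypothetical function names. LOCAL_QUORUM and EACH_QUORUM are left out because they depend on the data center layout, and ANY is modeled as always succeeding for writes, as the slide describes.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements needed for the given level."""
    levels = {
        "ANY": 1,
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_succeeds(consistency_level, replication_factor, live_replicas):
    """Would a write at this level succeed with this many replicas up?"""
    if consistency_level == "ANY":
        # Per the slide: succeeds even if all replicas are down.
        return True
    return live_replicas >= required_acks(consistency_level, replication_factor)
```

For example, with a replication factor of 3 and one replica down, a QUORUM write still succeeds while an ALL write does not.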
  • 76. 3 important concepts:
      - Read Repair – at the time of a read, inconsistencies between nodes are noticed and replicas are updated. Direct and background. Direct is determined by the CL.
      - Anti-Entropy Node Repair – for data that is not read frequently, or to update data on a node that has been down for a while, use the nodetool repair process (also called anti-entropy repair). It builds Merkle trees, compares nodes and does the repair.
      - Hinted Handoff – writes are always sent to all replicas for the specified row, regardless of the consistency level specified by the client. If a node happens to be down at the time of the write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online. This notification happens via Gossip. Default 1 hour.
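The Merkle tree comparison behind anti-entropy repair can be illustrated as follows. This is a toy version under stated assumptions: both replicas hold the same set of keys, there is one leaf per row, and hashes are recomputed on each call instead of being stored in a tree. The function names and sample data are invented for the sketch and do not reflect Cassandra's implementation.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def leaf_hashes(rows):
    """One hash per row, in key order."""
    return [_h(f"{k}={rows[k]}".encode()) for k in sorted(rows)]


def range_hash(leaves, lo, hi):
    """Hash of the subtree covering leaves[lo:hi]."""
    if hi - lo == 1:
        return leaves[lo]
    mid = (lo + hi) // 2
    return _h(range_hash(leaves, lo, mid) + range_hash(leaves, mid, hi))


def diff_leaves(a, b, lo=0, hi=None):
    """Indexes of differing leaves, descending only into mismatched subtrees."""
    if hi is None:
        hi = len(a)
    if hi == lo or range_hash(a, lo, hi) == range_hash(b, lo, hi):
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return diff_leaves(a, b, lo, mid) + diff_leaves(a, b, mid, hi)


# Hypothetical replicas: replica_b missed an update to k3.
replica_a = {"k1": "v1", "k2": "v2", "k3": "v3-new", "k4": "v4"}
replica_b = {"k1": "v1", "k2": "v2", "k3": "v3-old", "k4": "v4"}

keys = sorted(replica_a)
stale = [keys[i] for i in diff_leaves(leaf_hashes(replica_a), leaf_hashes(replica_b))]

# Repair: copy only the rows that differ.
for k in stale:
    replica_b[k] = replica_a[k]
```

Matching subtree hashes let the comparison skip whole ranges, so only the rows that actually differ need to be exchanged between nodes.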
  • 77.
      - Interaction with Cassandra can be done using one of the supplied clients, such as the CLI or CQL. Otherwise, client applications are built using a language client library.
      - Many clients exist in multiple languages, including Java, .NET, Python, Scala, Go, PHP, Node.js, Perl, Ruby, etc.
      - Java:
          - Hector wraps the underlying Thrift API. Hector is one of the most commonly used client libraries.
          - Astyanax is a client library developed by Netflix.
          - DataStax CQL – the newest CQL driver; will be very familiar to JDBC developers.
          - And many more … (JPA)
      - There are also DataStax OpsCenter, various other GUIs, and a REST API (Virgil).
  • 78.
      - Many more topics / information related to C* not covered
      - Great for fast writes
      - No single point of failure
      - Data center aware
      - Also relative ease of use