3. Two big drivers for NoSQL adoption
• Lack of flexibility / rigid schemas – 49%
• Inability to scale out data – 35%
• Performance challenges – 29%
• Cost – 16%
• All of these – 12%
• Other – 11%
Source: Couchbase Survey, December 2011, n = 1351.
6. Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• Every document requires a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high availability
Example document (JSON; the UUID and Time values are truncated on the slide). A toy sketch of storing and fetching such a document by key follows the example.
{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": [ "SERVER", "US-West", "API" ]
  }
}
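The bullets above describe documents addressed by a unique key. Below is a minimal Python sketch of that idea, using a plain in-memory dict as a stand-in for a document store; the dict, and reusing the event's UUID as the key, are illustrative assumptions, not any real database API.

import json

# Toy in-memory "document store": a unique key maps to a self-describing JSON document.
store = {}

event = {
    "UUID": "21f7f8de-8051-5b89-86...",      # value truncated on the slide
    "Server": "A2223E",
    "Calling Server": "A2213W",
    "Type": "E100",
    "Initiating User": "dsallings@spy.net",
    "Details": {
        "IP": "10.1.1.22",
        "API": "InsertDVDQueueItem",
        "Tags": ["SERVER", "US-West", "API"],
    },
}

# Every document needs a unique key; here we simply reuse the event's UUID.
store[event["UUID"]] = json.dumps(event)

# A lookup by key returns the whole self-describing document.
fetched = json.loads(store[event["UUID"]])
print(fetched["Details"]["API"])   # -> InsertDVDQueueItem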
9. Relational vs Document data model
• Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
• Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
10. Example: User Profile
User Info table:
KEY | First | ZIP_id | Last
1 | Dipti | 2 | Borkar
2 | Joe | 2 | Smith
3 | Ali | 2 | Dodson
4 | John | 3 | Doe

Address Info table:
ZIP_id | CITY | ZIP | STATE
1 | DEN | 30303 | CO
2 | MV | 94040 | CA
3 | CHI | 60609 | IL
4 | NY | 10010 | NY

To get information about a specific user, you perform a join across the two tables.
11. All data in a single document
Document example: User Profile – one JSON document combines the user and address records. A sketch contrasting the two shapes follows the example.
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
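To make the contrast concrete, here is a small Python sketch: the relational shape from slide 10 queried with a join (via sqlite3), versus the same profile fetched as a single document in one key lookup. The in-memory dict and the "user::1" key format are illustrative assumptions, not a specific product's API.

import sqlite3

# Relational shape (slide 10): user data split across two tables, joined at read time.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user_info (id INTEGER, first_name TEXT, last_name TEXT, zip_id INTEGER);
    CREATE TABLE address_info (zip_id INTEGER, city TEXT, zip TEXT, state TEXT);
    INSERT INTO user_info VALUES (1, 'Dipti', 'Borkar', 2);
    INSERT INTO address_info VALUES (2, 'MV', '94040', 'CA');
""")
row = db.execute("""
    SELECT u.first_name, u.last_name, a.city, a.zip, a.state
    FROM user_info u JOIN address_info a ON u.zip_id = a.zip_id
    WHERE u.id = 1
""").fetchone()
print(row)  # ('Dipti', 'Borkar', 'MV', '94040', 'CA')

# Document shape (slide 11): the same profile read back in one key lookup, no join.
documents = {
    "user::1": {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
                "ZIP": "94040", "CITY": "MV", "STATE": "CA"},
}
print(documents["user::1"])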
12. Making a Change Using RDBMS
Adding status updates, photos, affiliations and country data to the user profile means new tables, each joined back to the User Table by foreign keys (the tables also carry Country ID references into the Country Table, and there is a separate TEL code lookup by Country ID):

User Table: ID | First | Last | Zip
1 | Dipti | Borkar | 94040
2 | Joe | Smith | 94040
3 | Ali | Dodson | 94040
4 | Sarah | Gorin | NW1
5 | Bob | Young | 30303
6 | Nancy | Baker | 10010
7 | Ray | Jones | 31311
8 | Lee | Chen | V5V3M
…
50000 | Doug | Moore | 04252
50001 | Mary | White | SW195
50002 | Lisa | Clark | 12425

Country Table: Country ID | Country name
001 | USA
002 | UK
003 | Argentina
004 | Australia
005 | Aruba
006 | Austria
007 | Brazil
008 | Canada
009 | Chile
…
130 | Portugal
131 | Romania
132 | Russia
133 | Spain
134 | Sweden

Status Table: User ID | Status ID | Text
1 | a42 | At conf
4 | b26 | excited
5 | c32 | hockey
12 | d83 | Go A's
5000 | e34 | sailing

Photo Table: User ID | Photo ID | Comment
2 | d043 | NYC
2 | b054 | Bday
5 | c036 | Miami
7 | d072 | Sunset
5002 | e086 | Spain

Affiliations Table: User ID | Affl ID | Affl Name
2 | a42 | Cal
4 | b96 | USC
7 | c14 | UW
8 | e22 | Oxford
13. Making the Same Change with a Document Database
Just add the information to the document (a sketch of the change follows):
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}
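A minimal Python sketch of the change above: the new status and country data are attached to the existing document and the whole document is saved again; no schema migration is involved. The dict simply stands in for the stored JSON document.

import json

profile = {
    "ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
    "ZIP": "94040", "CITY": "MV", "STATE": "CA",
}

# "Making the same change": attach the new status and country data directly
# to the existing document; no ALTER TABLE, no new tables, no new joins.
profile["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
profile["COUNTRY"] = "USA"

print(json.dumps(profile, indent=2))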
14. Document modeling
When considering how to model data for a given application:
• Think of a logical container for the data
• Think of how data groups together
Questions to ask:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
15. Document Design Options
• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminates client-side joins
• Separate documents for different object types, with cross references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions are supported only on a document boundary
  – Most document databases do not support joins
A sketch of both options follows.
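The sketch below shows both design options from this slide as plain Python/JSON structures. The "comment::" key names are made up for illustration; note how option 2 requires the application to resolve cross references itself, since most document databases do not support joins.

# Option 1: one de-normalized document holding all related data.
blog_embedded = {
    "type": "post",
    "title": "Why NoSQL",
    "comments": [                      # embedded: fetched with the post, no client-side join
        {"author": "joe", "text": "Nice post"},
        {"author": "ali", "text": "+1"},
    ],
}

# Option 2: separate documents per object type, linked by cross references (keys).
post_doc = {"type": "post", "title": "Why NoSQL",
            "comment_ids": ["comment::1001", "comment::1002"]}
comment_docs = {
    "comment::1001": {"type": "comment", "author": "joe", "text": "Nice post"},
    "comment::1002": {"type": "comment", "author": "ali", "text": "+1"},
}

# With option 2, the application resolves the references itself.
comments = [comment_docs[cid] for cid in post_doc["comment_ids"]]
print(comments)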
16. Document ID / Key selection
• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Usually an ID can only appear once in a bucket
Options:
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human-readable) IDs
• Matching prefixes (for multiple related objects)
Questions to ask:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
A key-construction sketch follows.
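A short Python sketch of the key options listed above; the "user::" prefix convention and the specific separators are illustrative assumptions.

import uuid
from datetime import date

user_id = 1001

# UUID-based key
k_uuid = str(uuid.uuid4())

# Date-based key (e.g. one document per user per day)
k_date = f"user::{user_id}::{date.today().isoformat()}"

# Hand-crafted, human-readable key
k_readable = "user::dipti_borkar"

# Matching prefixes for multiple related objects, so related documents
# can be addressed directly once the user's key is known.
k_profile = f"user::{user_id}"
k_settings = f"user::{user_id}::settings"
k_posts = f"user::{user_id}::posts"

print(k_uuid, k_date, k_readable, k_profile, k_settings, k_posts, sep="\n")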
17. Example: Entities for a Blog
• User profile – the main pointer into the user data
  – Blog entries
  – Badge settings, like a Twitter badge
• Blog posts – contain the blog posts themselves
• Blog comments – comments from other users
20. Threaded Comments
• You can imagine how to take this to a threaded list: the blog document points to a list of comments, and each comment points to a list of replies
Advantages:
• Only fetch the data when you need it – for example, when rendering part of a web page
• Spread the data and load across the entire cluster
A sketch of the blog entities and a threaded-comment list follows.
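Putting slides 17 and 20 together, here is a hypothetical document layout for the blog: a user profile pointing to posts, posts pointing to top-level comments, and comments pointing to their replies, so each piece is fetched only when needed. All key names ("u::", "post::", "comment::") are made up for the example.

# Hypothetical key naming ("u::", "post::", "comment::") just for illustration.
docs = {
    # User profile: the main pointer into the user's data.
    "u::dipti": {"type": "user", "name": "Dipti Borkar",
                 "badge": "twitter", "post_ids": ["post::1"]},

    # Blog post: contains the post itself plus the keys of its top-level comments.
    "post::1": {"type": "post", "title": "Hello Couchbase",
                "comment_ids": ["comment::1"]},

    # Comments: each comment carries the keys of its replies -> a threaded list.
    "comment::1": {"type": "comment", "author": "joe",
                   "text": "First comment", "reply_ids": ["comment::2"]},
    "comment::2": {"type": "comment", "author": "ali",
                   "text": "Reply to comment", "reply_ids": []},
}

def render_thread(comment_id, depth=0):
    """Fetch comments only when needed, e.g. while rendering part of a page."""
    c = docs[comment_id]
    print("  " * depth + f'{c["author"]}: {c["text"]}')
    for rid in c["reply_ids"]:
        render_thread(rid, depth + 1)

for cid in docs["post::1"]["comment_ids"]:
    render_thread(cid)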
22. RDBMS Scales Up
• Web/app server tier – the application scales out: just add more commodity web servers as users grow
• Relational database tier – relational technology scales up: get a bigger, more complex server
• Sharding is expensive and disruptive, and doesn't perform at web scale
• On the system-cost vs application-performance curve, the relational tier won't scale beyond a certain point
23. Couchbase Server Scales Out Like the App Tier
• Web/app server tier – the application scales out: just add more commodity web servers as users grow
• NoSQL database (Couchbase distributed data store) scales out: cost and performance mirror the app tier
• Scaling out flattens the system-cost and application-performance curves
25. The Process – From Evaluation to Go Live
1. Analyze your requirements
2. Find solutions / products that match the key requirements
3. Execute a proof of concept / performance evaluation
4. Begin development of the application
5. Deploy in staging and then production
No different from evaluating a relational database. New requirements → new solutions.
26. Analyze your requirements (step 1)
Common application requirements:
• Rapid application development
  – Changing market needs
  – Changing data needs
• Scalability
  – Unknown user demand
  – Constantly growing throughput
• Consistent performance
  – Low response time for a better user experience
  – High throughput to handle viral growth
• Reliability
  – Always online
27. Find solutions that match key requirements (step 2)
Requirements that favor NoSQL:
• Linear scalability
• Schema flexibility
• High performance
Requirements that favor an RDBMS:
• Multi-document transactions
• Database rollback
• Complex security needs
• Complex joins
• Extreme compression needs
Some requirements fit both – it depends on the data.
28. Proof of concept / performance evaluation (step 3)
Prototype a workload.
• Look for consistent performance…
  – Low response times / latency, for a better user experience
  – High throughput, to handle viral growth and for resource efficiency
• … across
  – Read-heavy / write-heavy / mixed workloads
  – Clusters of growing sizes
• … and watch for
  – Contention / heavy locking
  – Linear scalability
A minimal workload-driver sketch follows.
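A minimal sketch of a proof-of-concept workload driver, assuming an in-memory dict as a stand-in for the system under test; a real PoC would drive the actual client SDK from many clients against clusters of growing size and record the same throughput and latency numbers.

import random, statistics, time

store = {}

def op(write_ratio):
    # One randomly chosen read or write against the stand-in store.
    key = f"user::{random.randint(1, 10_000)}"
    if random.random() < write_ratio:
        store[key] = {"payload": "x" * 256}      # write
    else:
        store.get(key)                           # read

def run(name, write_ratio, n_ops=50_000):
    latencies = []
    start = time.perf_counter()
    for _ in range(n_ops):
        t0 = time.perf_counter()
        op(write_ratio)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    p99 = statistics.quantiles(latencies, n=100)[98]   # 99th-percentile latency
    print(f"{name}: {n_ops/elapsed:,.0f} ops/s, p99 latency {p99*1e6:.1f} us")

run("read-heavy (10% writes)", 0.10)
run("write-heavy (90% writes)", 0.90)
run("mixed (50% writes)", 0.50)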
29. Other considerations (step 3)
Accessing data
– No standards exist yet
– Typically via SDKs or over HTTP
– Check whether the programming language of your choice is supported
Consistency
– Consistent only at the document level
– Most document stores currently don't support multi-document transactions
– Analyze your application's needs (see the sketch below)
Availability
– Each node stores active and replica data (Couchbase)
– Each node is either a master or a slave (MongoDB)
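To illustrate the consistency point: with document-level atomicity and no multi-document transactions, an update that touches two documents is two independent writes. The sketch below (plain dicts, hypothetical keys) shows the gap, and one common modeling response of keeping state that must change together in a single document.

# Most document stores make a single-document write atomic, but offer no
# transaction across documents.
docs = {
    "order::42":  {"status": "placed", "total": 120},
    "account::7": {"balance": 500},
}

# Touching two documents is two independent writes: a failure between them can
# leave the order marked paid while the balance was never charged.
docs["order::42"]["status"] = "paid"
# <- a crash here leaves the two documents inconsistent
docs["account::7"]["balance"] -= 120

# One option: keep the pieces of state that must change together in a single
# document, so one atomic write covers both.
docs["account::7"] = {"balance": 380,
                      "last_order": {"id": 42, "status": "paid", "charged": 120}}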
30. Other considerations (step 3), continued
Operations
– Monitoring the system
– Backup and restore of the system
– Upgrades and maintenance
– Support
Ease of scaling
– Ease of adding and reducing capacity
– Single node type
– App availability on topology changes
Indexing and querying
– Secondary indexes (map functions)
– Aggregates and grouping (reduce functions)
– Basic querying (see the map/reduce sketch below)
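Document databases such as Couchbase typically define secondary indexes and aggregates through map and reduce functions evaluated inside the database (Couchbase views, for instance, use JavaScript). The Python below only simulates that idea over a handful of documents, to show what a map function emits and what a reduce step aggregates.

from collections import defaultdict

docs = {
    "u::1": {"type": "user", "state": "CA", "zip": "94040"},
    "u::2": {"type": "user", "state": "CA", "zip": "94040"},
    "u::3": {"type": "user", "state": "IL", "zip": "60609"},
}

# "Map function": emit one index entry (key -> doc id) per matching document.
def map_by_state(doc_id, doc):
    if doc.get("type") == "user":
        yield doc["state"], doc_id

# Build the secondary index the way a view engine would: apply the map
# function to every document.
index = defaultdict(list)
for doc_id, doc in docs.items():
    for key, value in map_by_state(doc_id, doc):
        index[key].append(value)

# "Reduce function": aggregate / group the mapped values, here a count per state.
counts = {state: len(ids) for state, ids in index.items()}

print(index["CA"])   # ['u::1', 'u::2']  -> query by secondary key
print(counts)        # {'CA': 2, 'IL': 1}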
31. Begin development (step 4)
Data modeling and document design.
32. Deploying to staging and production (step 5)
• Monitoring the system
  – RESTful interfaces / easy integration with monitoring tools
• High availability
  – Replication
  – Failover and auto-failover
• Always online – even for maintenance tasks
  – Database upgrades
  – Software (OS) and hardware upgrades
  – Backup and restore
  – Index building
  – Compaction
35. So, are you being impacted by these?
Schema rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables, with very few columns used by most rows?
• Do your application developers require frequent schema changes because the data keeps changing?
• Are you using your database as a key-value store?
Scalability problems
• Do you periodically need to upgrade to more powerful servers and scale up?
• Are you reaching the read/write throughput limit of a single database server?
• Is your server's read/write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
36. Is NoSQL the right choice for you?
Does your application need rich database functionality?
• Multi-document transactions
• Complex security needs – user roles, document-level security, authentication, authorization integration
• Complex joins across buckets / collections
• BI integration
• Extreme compression needs
If so, NoSQL may not be the right choice for your application.
38. Performance-driven use cases
• Low latency
• High throughput matters
• Large number of users
• Unknown demand, with sudden growth of users/data
• Predominantly direct document access
• Workloads with a very high mutation rate per document (temporal locality) – a working set with heavy writes
39. Data-driven use cases
• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to change the data structure quickly and often
• 3rd-party or user-defined structure
• Variable-length documents
• Sparse data records
• Hierarchical data
42. Couchbase Server
• Easy scalability – grow the cluster without application changes, without downtime, with a single click
• Consistent, high performance – consistent sub-millisecond read and write response times with consistently high throughput
• Always on, 24x7x365 – no downtime for software upgrades, hardware maintenance, etc.
43. Flexible Data Model
• No need to worry about the database when changing your application
• Records can have different structures; there is no fixed schema
• Allows painless data model changes for rapid application development
Example JSON document:
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
45. Couchbase Server 2.0 Architecture
Cluster Manager (Erlang/OTP)
– Heartbeat, process monitor, global singleton supervisor, configuration manager: on each node
– Rebalance orchestrator, node health monitor, vBucket state and replication manager: one per cluster
– REST management API / Web UI over HTTP (port 8091)
– Erlang port mapper (port 4369), distributed Erlang (ports 21100–21199)
Data Manager
– Memcached front end: Moxi (port 11211, memcapable 1.0) and Couchbase EP Engine (port 11210, memcapable 2.0)
– Storage interface and new persistence layer
– Query engine and query API (port 8092)
47. Couchbase deployment
Web applications talk to the cluster through the Couchbase client library; the diagram separates the data flow from cluster management.
48. Single node – Couchbase Write Operation
• The app server writes Doc 1 into the node's managed cache
• Doc 1 is placed on the replication queue (to the other nodes) and on the disk queue
• The disk queue persists Doc 1 to disk
49. Single node – Couchbase Update Operation
• The app server writes Doc 1′, the updated version of Doc 1, into the managed cache
• Doc 1′ is placed on the replication queue and on the disk queue
• Doc 1′ replaces Doc 1 on disk
50. Single node – Couchbase Read Operation
• The app server issues GET Doc 1
• Doc 1 is served directly from the managed cache (it also resides on disk and on the replica nodes)
51. Single node – Couchbase Cache Eviction
• As more documents are written (Doc 1 … Doc 6), the managed cache fills up
• Already-persisted documents are evicted from the cache to make room; they remain on disk
52. Single node – Couchbase Cache Miss
• The app server issues GET Doc 1, but Doc 1 has been evicted from the managed cache
• Doc 1 is read from disk, put back into the managed cache, and returned to the app server
A toy sketch of this read path follows.
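The single-node flows on slides 48–52 can be summarized in a toy read-path model: a small managed cache in front of disk, with cache hits served from memory, misses filled from disk, and already-persisted documents evicted when the cache is full. This is an illustration of the flow only, not Couchbase's actual implementation.

from collections import OrderedDict

DISK = {f"doc{i}": {"id": i} for i in range(1, 7)}   # everything already persisted
CACHE_CAPACITY = 3
cache = OrderedDict()

def get(key):
    if key in cache:                 # cache hit: served straight from memory
        cache.move_to_end(key)
        return cache[key]
    value = DISK[key]                # cache miss: read from disk...
    cache[key] = value               # ...and re-populate the managed cache
    if len(cache) > CACHE_CAPACITY:  # eviction: drop the least recently used doc
        cache.popitem(last=False)    # (it is still on disk)
    return value

for key in ["doc1", "doc2", "doc3", "doc4", "doc1"]:
    get(key)
print(list(cache))   # the most recently used documents remain cached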
53. Cluster wide – Basic Operation
• Docs are distributed evenly across the servers in the cluster
• Each server stores both active and replica docs; only one server is active for a given doc at a time
• The client library provides the app with a simple interface to the database
• The cluster map tells the client library which server a doc is on; the app never needs to know
• The app reads, writes and updates docs; multiple app servers can access the same document at the same time
(Diagram: three servers, each holding active and replica docs, accessed by two app servers through the Couchbase client library and cluster map; user-configured replica count = 1.)
A sketch of key-to-server routing via the cluster map follows.
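A sketch of how a client library can use the cluster map to route a key to the right server without the application knowing the topology: the key is hashed to a partition (a vBucket in Couchbase) and the partition is looked up in the cluster map. The 64-partition map and three-server layout below are made up for the example; real clusters use more partitions and also track replicas.

import zlib

NUM_VBUCKETS = 64
SERVERS = ["server1", "server2", "server3"]

# Cluster map: vBucket -> server currently active for it (replicas omitted here).
cluster_map = {vb: SERVERS[vb % len(SERVERS)] for vb in range(NUM_VBUCKETS)}

def server_for(key: str) -> str:
    # Hash the document key to a vBucket, then look the vBucket up in the map.
    vb = zlib.crc32(key.encode()) % NUM_VBUCKETS
    return cluster_map[vb]

print(server_for("user::1001"))
print(server_for("user::1002"))

# When nodes are added or fail over, only the cluster map changes; the
# application keeps calling get/set by key and the client re-routes requests.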
54. Cluster wide – Add Nodes to Cluster
• Two servers are added – a one-click operation
• Docs are automatically rebalanced across the cluster: even distribution of docs, minimum doc movement
• The cluster map is updated
• App database calls are now distributed over a larger number of servers
(Diagram: cluster grown from three to five servers, active and replica docs redistributed; user-configured replica count = 1.)
55. Cluster wide – Fail Over Node
• App servers are accessing docs; requests to Server 3 fail
• The cluster detects the failed server, promotes replicas of its docs to active, and updates the cluster map
• Requests for those docs now go to the appropriate server
• Typically a rebalance would follow
(Diagram: five servers with active and replica docs; user-configured replica count = 1.)
56. Indexing and Querying
• Indexing work is distributed amongst the nodes
• Large data sets are possible; the effort is parallelized
• Each node has an index for the data stored on it
• Queries combine the results from the required nodes
(Diagram: a query fanned out across the cluster's nodes; user-configured replica count = 1.)
57. Cross Data Center Replication (XDCR)
Two Couchbase Server clusters – one in the NY data center and one in the SF data center – each keep active docs in RAM and on disk; XDCR replicates documents between the clusters.