presentation for KVIV on 2010/06/03
As usual with my presentations: if you weren't there, you missed half of the fun, as some of the more important ideas are not in the slides.
KVIV / NoSQL : the new generation of database servers
1. NoSQL
the new generation of database servers
KVIV IT - 3/6/2010
http://www.flickr.com/photos/wolfgangstaudt/2215246206/
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
2. Who am I
» Steven Noels - stevenn@outerthought.org
» Outerthought : scalable content applications
» makers of Daisy and Lily open source CMS
3. An evolution
driven by pain.
12. Scaling: horizontal partitioning
complexity
databases app servers users
13. Scaling through architecture
data layer app layer users
15. 8 fallacies of distributed computing
» The network is reliable.
» Latency is zero.
» Bandwidth is infinite.
» The network is secure.
» Topology doesn't change.
» There is one administrator.
» Transport cost is zero.
» The network is homogeneous.
(Peter Deutsch and James Gosling)
16. Scaling relational systems
» When scaling relational systems you lose
their advantages but retain their overhead
» The pain is all about locking (i.e. writes)
» Caching alleviates the read pain at the cost of
complexity
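A minimal sketch of where that caching complexity comes from, assuming a cache-aside pattern in front of a single database. The names `database`, `cache`, `stats`, `read` and `write` are hypothetical and stand in for a real RDBMS and a real cache.

```python
# Hypothetical stand-ins: a dict as "database" and a dict as "cache".
database = {"user:1": "Steven"}
cache = {}
stats = {"db_reads": 0}


def read(key):
    """Cache-aside read: serve from the cache, fall back to the database."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = database[key]
    cache[key] = value
    return value


def write(key, value):
    """Writes still hit the database, and must also invalidate the cache."""
    database[key] = value
    cache.pop(key, None)  # forget this step and readers see stale data


first = read("user:1")    # goes to the database
second = read("user:1")   # served from the cache
write("user:1", "Steven N.")
third = read("user:1")    # cache was invalidated, so back to the database
```

Reads get cheaper, but every write path now has to know about the cache. That bookkeeping is the complexity the slide refers to.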
17. The Perspective of Cost
28. The NOSQL footprint
[Figure: the NOSQL footprint]
» NOSQL (MongoDB, CouchDB, neo4j, Cassandra, HBase): free-structured or sparse data, highly scalable and available, simple operational constraints (complexity)
» SQL: ACID, referential integrity, typed data
29. NOSQL, if you need ...
» horizontal scaling (out rather than up)
» unusually common data (aka free-structured)
» speed (especially for writes)
» the bleeding edge
30. SQL/RDBMS, if you need ...
» SQL
» ACID
» normalisation
» a defined liability
35. Google BigTable
» multi-dimensional column-oriented database
» on top of the Google File System
» object versioning
36. CAP theorem
[Figure: strong consistency, high availability, partition-tolerance]
37. CAP
» Strong Consistency: all clients see the
same view, even in the presence of updates
» High Availability: all clients can find some
replica of the data, even in the presence of
failures
» Partition-tolerance: the system
properties hold even when the system is
partitioned
38. Culture Clash
ACID
» highest priority: strong consistency for transactions
» availability less important
» pessimistic
» rigorous analysis
» complex mechanisms
BASE
» availability and scaling highest priorities
» weak consistency
» optimistic
» best effort
» simple and fast
(a spectrum, from ACID to BASE)
46. Hadoop ecosystem
» Hadoop Common
» Subprojects
» Chukwa: A data collection system for managing large distributed systems.
» HBase: A scalable, distributed database that supports structured data storage for
large tables.
» HDFS: A distributed file system that provides high throughput access to application
data.
» Hive: A data warehouse infrastructure that provides data summarization and ad hoc
querying.
» MapReduce: A software framework for distributed processing of large data sets on
compute clusters.
» Pig: A high-level data-flow language and execution framework for parallel
computation.
» ZooKeeper: A high-performance coordination service for distributed applications.
» Mahout: machine learning libraries
47. Processing large datasets with MR
» Benefit from parallelisation
» Less modelling upfront (ad-hoc processing)
» Compartmentalized approach reduces
operational risks
» AsterData et al. have SQL/MR hybrids for
huge-scale BI
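A minimal single-process sketch of the MapReduce idea, assuming a word count as the job. The function names and the sample records are made up for illustration; this is not the Hadoop API, where the map and reduce calls would run in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record, independently of all others."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input records; no schema or model was defined upfront.
records = ["NoSQL scales out", "SQL scales up", "NoSQL is not only SQL"]
pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each `map_phase` call only sees its own record and each `reduce_phase` call only sees its own key, both phases can be spread over many machines.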
50. Trend 1: Data size
Exabytes (10^18 bytes) of data stored per year:
2006: 161
2007: 253
2008: 397
2009: 623
2010: 988
Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.
Data source: IDC 2007
51. Trend 2: Connectedness
Over time data has evolved to be more and more interlinked and connected.
Hypertext has links, blogs have pingback, tagging groups all related data.
[Figure: information connectivity over time, 1990 to 2020]
» web 1.0: text documents, hypertext
» web 2.0: RSS, blogs, wikis, user-generated content, tagging, folksonomies
» "web 3.0": RDF, ontologies, Giant Global Graph (GGG)
52. Trend 3: Semi-structure
» Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
» All-encompassing "entire world views"
• Store more data about each entity
» Trend accelerated by the decentralization of content generation
that is the hallmark of the age of participation ("web 2.0")
59. Key-value stores
» Focus on scaling to huge amounts of data
» Designed to handle big loads
» Often (cf. Amazon Dynamo):
» ring partitioning and replication
» Data model: key/value pairs
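A minimal sketch of ring partitioning with replication, assuming a Dynamo-style scheme where each key is stored on the node that follows its hash position on the ring, plus the next nodes clockwise. The class `HashRing`, the node names and the use of MD5 are illustrative choices, not any product's actual implementation.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string onto the ring with a stable hash (MD5 is for illustration only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy ring: every node sits at the position given by the hash of its name."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.ring]

    def preference_list(self, key):
        """Return the nodes holding `key`: the first node clockwise plus its successors."""
        start = bisect.bisect_right(self.positions, ring_position(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Hypothetical four-node cluster, three copies of every key.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.preference_list("user:42")
```

Any client can compute `owners` locally from the key alone, which is what lets such stores scale out without a central lookup table.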
75. Cassandra
» Key-value store (with added structure)
» Reliability (identical nodes)
» Eventually consistent
» Distributed
» Tunable
» Partitioning
» Replication
[Figure: CAP diagram (C, A, P)]
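A minimal sketch of what "tunable" can mean, assuming the Dynamo-style quorum rule: with N replicas, W write acknowledgements and R replicas read, a read is guaranteed to see the latest write when R + W > N. The `Replica` class and both helper functions are hypothetical and are not Cassandra's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, timestamp, value):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)  # last write wins

    def read(self, key):
        return self.data.get(key)


def quorum_write(replicas, w, key, timestamp, value):
    """Write to the first `w` replicas only, simulating the others lagging behind."""
    for replica in replicas[:w]:
        replica.write(key, timestamp, value)


def quorum_read(replicas, r, key):
    """Read from the last `r` replicas and keep the answer with the newest timestamp."""
    answers = [replica.read(key) for replica in replicas[-r:]]
    answers = [answer for answer in answers if answer is not None]
    return max(answers, key=lambda answer: answer[0])[1] if answers else None


replicas = [Replica() for _ in range(3)]          # N = 3
quorum_write(replicas, 3, "k", 1, "v1")           # old value everywhere
quorum_write(replicas, 2, "k", 2, "v2")           # W = 2: one replica is stale
fresh = quorum_read(replicas, 2, "k")             # R = 2, R + W > N: sees "v2"
stale = quorum_read(replicas, 1, "k")             # R = 1, R + W = N: may see "v1"
```

Lower R and W buy latency and availability. Higher values buy consistency.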
76. Cassandra applicability
FIT
» Scalable reliability (through identical nodes)
» Linear scaling
» Write throughput
» Large Data Sets
NO FIT
» Flexible indexing
» Only PK-based querying
» Big Binary Data
» 1 Row must fit in RAM entirely
79. Document databases
» ≈ K/V stores, but DB knows what the Value is
» Lotus Notes heritage
» Data model: collections of K/V collections
» Documents often versioned
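A minimal in-memory sketch of that data model, assuming documents are plain nested dictionaries and every update is kept as a new version. `DocumentStore` and its methods are hypothetical and do not mirror the API of CouchDB, MongoDB or any other product.

```python
import copy


class DocumentStore:
    """Toy document store: key -> list of versions, each version a K/V collection."""

    def __init__(self):
        self._versions = {}

    def put(self, doc_id, document):
        """Append a new version and return its (1-based) version number."""
        self._versions.setdefault(doc_id, []).append(copy.deepcopy(document))
        return len(self._versions[doc_id])

    def get(self, doc_id, version=None):
        """Return the latest version, or a specific earlier one."""
        versions = self._versions[doc_id]
        return versions[-1] if version is None else versions[version - 1]

    def find(self, field_name, value):
        """Query on a field inside the value: possible because the DB understands it."""
        return [doc_id for doc_id, versions in self._versions.items()
                if versions[-1].get(field_name) == value]


store = DocumentStore()
store.put("page-1", {"title": "NoSQL", "lang": "nl"})
store.put("page-1", {"title": "NoSQL", "lang": "en"})
store.put("page-2", {"title": "Lily", "lang": "en"})
english = sorted(store.find("lang", "en"))
first_version = store.get("page-1", version=1)
```

A pure key-value store could only offer `get` and `put`. The `find` method is the part that requires the store to know what the value is.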
88. MongoDB
» cf. CouchDB, really
» except for:
» C++
» performance focus
» runtime queries (mapreduce still available)
» native drivers (no REST/HTTP layering)
» no MVCC: update-in-place
» auto sharding (alpha)
89. Graph databases
» Focus on modeling structure of data -
interconnectivity
» Scale, but only to the complexity of data
» Data model: property graphs
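A minimal sketch of a property graph, assuming nodes and relationships that both carry key/value properties, with a breadth-first traversal as the typical query. The sample data and the helper names are made up. A graph database such as neo4j offers its own traversal API.

```python
from collections import deque

# Hypothetical data: nodes and relationships both carry properties.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}}
edges = [
    (1, "KNOWS", 2, {"since": 2005}),
    (2, "KNOWS", 3, {"since": 2008}),
]


def neighbours(node_id, rel_type):
    """Follow outgoing relationships of one type from one node."""
    return [dst for src, rel, dst, _props in edges if src == node_id and rel == rel_type]


def reachable(start, rel_type, max_depth):
    """Breadth-first traversal: names of nodes within `max_depth` hops of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbours(node, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nodes[nxt]["name"])
                queue.append((nxt, depth + 1))
    return found


friends = reachable(1, "KNOWS", 1)
friends_of_friends = reachable(1, "KNOWS", 2)
```

In a relational schema each extra hop is another self-join. Here it is one more step along the stored relationships.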
99. NOSQL applicability
» Horizontal scaling
» Multi-Master
» Data representation
» search for simplicity
» data that doesn’t fit the E-R model
(graphs, trees, versions)
» Speed
100. Tool selection
» be careful with the marketese:
beware of smoke and mirrors!
» monitor dev list, IRC, Twitter, blogs
» monitor project ‘sponsors’
» mix-and-match: polyglot persistence
» DON’T NOSQL WITHOUT INTERNAL SYS
ARCHS & DEV(OP)S !
101. Our Context: Lily
» cloud-scalable content store and search
repository
» successor (in many ways) of Daisy
103. Lily essentials
» (open source)
» scalable store (Apache HBase)
» and search (Apache Solr)
» content repository
» α due mid 2010
» www.lilycms.org or @outerthought
104. Choosing a NoSQL store for Lily: step I
» automatic scaling to large data sets
» fault-tolerance
» flexible datamodel with sparse data
» commodity hardware
» efficient random access
» community-based open source
» Java if possible
105. Choosing a NoSQL store for Lily: step II
» need for consistency
» atomic single-row updates
» M/R for index regeneration
106. Choosing a NoSQL store for Lily: step III
HBase
» datamodel with column families and cell
versioning
» ordered tables with range scans
» HDFS for blob storage
» Apache
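A minimal sketch of the data model features listed above, assuming a toy in-memory table with column families, timestamped cell versions and row keys kept in sorted order. `ToyTable` and its methods are hypothetical and are not the HBase client API.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model: sorted row keys, fixed column families, versioned cells."""

    def __init__(self, families):
        self.families = set(families)
        self.row_keys = []                 # always kept sorted
        self.cells = defaultdict(list)     # (row, family, qualifier) -> [(timestamp, value)]

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        index = bisect.bisect_left(self.row_keys, row)
        if index == len(self.row_keys) or self.row_keys[index] != row:
            self.row_keys.insert(index, row)
        self.cells[(row, family, qualifier)].append((timestamp, value))

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest cell version, or the newest one at or before `timestamp`."""
        versions = self.cells.get((row, family, qualifier), [])
        candidates = [v for v in versions if timestamp is None or v[0] <= timestamp]
        return max(candidates, key=lambda v: v[0])[1] if candidates else None

    def scan(self, start_row, stop_row):
        """Range scan: row keys in [start_row, stop_row), cheap because keys are ordered."""
        low = bisect.bisect_left(self.row_keys, start_row)
        high = bisect.bisect_left(self.row_keys, stop_row)
        return self.row_keys[low:high]


table = ToyTable(["content", "meta"])
table.put("doc:002", "content", "title", "Draft", timestamp=1)
table.put("doc:002", "content", "title", "Final", timestamp=2)
table.put("doc:001", "meta", "owner", "steven", timestamp=1)
table.put("doc:010", "meta", "owner", "steven", timestamp=1)
table.put("user:001", "meta", "name", "Steven", timestamp=1)
latest = table.get("doc:002", "content", "title")
older = table.get("doc:002", "content", "title", timestamp=1)
docs = table.scan("doc:", "doc;")  # ";" sorts right after ":" in ASCII
```

The scan shows why row key design matters: rows that share a prefix sit next to each other and can be read in one pass.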
107. Lily simplified architecture
[Figure: Lily simplified architecture]
» Lily: clients and store nodes (Lily Store Server: query, update, indexer)
» WAL, MQ, M/R
» HBase Region Server: documents, 2ary indexes, WAL / MQ
» Hadoop DFS
» SOLR: inverted index, index replicas (REST)
» ZooKeeper: distributed process coordination and configuration
111. When combining store
and search, make sure
your (search) index
doesn’t become the
store.
112. Key lessons learned
» unlearning normalization is very difficult
» integrity checking in code = not so bad
» doing joins in code can be very liberating
» importance of keyspace design
» secondary indexing
» data de-normalization = size! (x3)
» schema vs. code flexibility?
» distribution is everywhere
and you shouldn’t forget about it
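A minimal sketch of the first lessons, assuming two collections that a relational schema would link with a foreign key. Both the join and the integrity check move into application code. The data and function names are made up for illustration.

```python
# Hypothetical data: two collections fetched separately from a store without joins.
authors = {"a1": {"name": "Alice"}, "a2": {"name": "Bob"}}
articles = [
    {"id": "p1", "title": "CAP in practice", "author_id": "a1"},
    {"id": "p2", "title": "Keyspace design", "author_id": "a2"},
    {"id": "p3", "title": "Orphaned post", "author_id": "a9"},  # dangling reference
]


def join_articles_with_authors(articles, authors):
    """Join in code: look up each article's author and copy the name in."""
    return [
        {**article, "author_name": authors[article["author_id"]]["name"]}
        for article in articles
        if article["author_id"] in authors
    ]


def dangling_references(articles, authors):
    """Integrity check in code: report articles whose author does not exist."""
    return [article["id"] for article in articles if article["author_id"] not in authors]


joined = join_articles_with_authors(articles, authors)
broken = dangling_references(articles, authors)
```

Storing `author_name` inside each article is the de-normalization step: reads need no join at all, at the price of larger data and updates that must touch every copy.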
113. Mis-use cases
» SQL (or ORM) is a prerequisite
» Deeply hierarchical datasets (unless graph)
» Data integrity is listed in the DBA job description
» High-security apps (enforced in DB)
» Transactional data (banking)
» Usage is highly unpredictable, combinatorial, or
likely to change suddenly
114. Reading material
» Amazon Dynamo, Google BigTable, CAP
» http://nosql.mypopescu.com/
» http://nosql-database.org/
» http://twitter.com/nosqlupdate
» http://highscalability.com/