6. What is NoSQL
● Hardest question we will have to answer today
● Simplest definition is that NoSQL databases are a set (not even a family) of mechanisms for storage and retrieval of data that try to be:
– highly available
– able to scale horizontally
7. Simple, yet true?
● Simple answers may be simple, though they are not necessarily correct
● Especially with NoSQL because:
10. In particular
● NoSQL stores are data stores that are NOT relational databases
● They are something else
● Thus it follows that they are not relational
● And some say they are not even true databases
11. NoSQL = Without SQL?
● Perhaps a datastore without a SQL dialect of its own?
● Well no – some NoSQL solutions do have SQL or SQL-ish dialects
● So NoSQL is not even NO to SQL
12. Not Only SQL
● Which is fair enough, but a subtle point
● Not in terms of black and white, cool – not cool, works – sucks binary viewpoints
● And then it should be NOSQL, but it is NoSQL
● But this still does not explain things
13. How did the term NoSQL catch on then?
● You should know the answer – many people hate SQL
● And SQL is easy to hate!
● It is large, Large, LARGE, LARGE – the Oracle SQL Language Reference for 12.1 is 1826 pages.
● Have you actually seen the whole railroad diagram for Oracle's SELECT anywhere?
14. SQL – Love or Hate?
● It is not a single language, rather it is a family of dialects by competing companies
● Have you seen the standard (9 parts, 10th under way, more than 4000 pages)?
● Do you even care about the standard?
15. As Academic As You Can Get
● SQL is not even relational
● It is a language of bags, rather than sets
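A minimal Java sketch of the bag-versus-set point. The class, the table and the method names are made up for illustration: projecting one column and keeping duplicates behaves like a plain SQL SELECT, while collecting into a set behaves like the relational model (or SELECT DISTINCT).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: projecting the city column out of a tiny table.
class BagVsSet {
    // Rows of (name, city).
    static final String[][] ROWS = {
        {"Ann", "Sofia"}, {"Bob", "Sofia"}, {"Eve", "Varna"}
    };

    // Bag semantics, like SELECT city FROM t: duplicates survive.
    static List<String> projectAsBag() {
        List<String> out = new ArrayList<>();
        for (String[] row : ROWS) {
            out.add(row[1]);
        }
        return out;
    }

    // Set semantics, like SELECT DISTINCT city FROM t: duplicates collapse.
    static Set<String> projectAsSet() {
        return new LinkedHashSet<>(projectAsBag());
    }

    public static void main(String[] args) {
        System.out.println(projectAsBag());
        System.out.println(projectAsSet());
    }
}
```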
16. If you can make the last point – you do not hate SQL, you have grown used to it
17. What else is NoSQL not?
● Martin Fowler says NoSQL should be called NoDBA because developers use NoSQL to run around traditional databases with their DBAs and bureaucracies.
19. The term NoSQL
● Was coined in 1998 by Carlo Strozzi
● Lightweight relational database that lacks SQL dialect – NoSQL
● NoSQL should be NoREL
20. The current usage of the term is a #tag
● Started in 2009 with Eric Evans, who worked at Rackspace
● He proposed NoSQL as a Twitter #tag for a conference on the existing distributed databases
● The term stayed and gained popularity
21. NoSQL stems from needs that are
● Hard
● Impossible
● Or even worse – prohibitively expensive to fulfil with traditional relational databases
22. Examples
● Unstructured data, or data that is hard to model in a relational way
● Big data – generated by inter-user interaction (Facebook), imported from external sources (WWW)
● Bringing structure to otherwise unstructured data – what we usually model as LOBs or BLOBs in RDBMS
● Graphs
– Hierarchies in RDBMS (even bi-directional). Storing vertices and edges in a table and then modeling paths with joins – like the Entity Attribute Value anti-pattern
26. Origins of Scale
● Towards Robust Distributed Systems – Symposium on Principles of Distributed Computing – 2000
● Eric Brewer, then at Inktomi
● Called his conjecture – the CAP theorem
31. The Fall Of the Triad
2 Out of 3: Consistency, Availability, Tolerance to network partitions
● You cannot have full availability – all operations can proceed, even writes
● While keeping consistency – all nodes have the same data (not the same as C in ACID)
● When you have partitions – machines that cannot communicate
33. A Somewhat Better Representation
[Figure: a volume with the axes 100% consistency, 100% availability and 100% partition tolerance; one region is marked "Impossible to achieve". The Whole Volume Is Interesting.]
35. A Single System Can Wander In The Space
[Figure: the same consistency / availability / partition tolerance volume, with the region marked "Impossible to achieve"]
36. Or Have Data Operations In Different Points At The Same Time
[Figure: the same consistency / availability / partition tolerance volume, with the region marked "Impossible to achieve"]
37. CAP is easy to prove
● Think of two nodes on opposite sides of a partition
● Allowing at least one node to update state will cause the nodes to become inconsistent, thus forfeiting C.
● If we preserve consistency, one side of the partition must act as if it is unavailable, thus forfeiting A.
● Only when nodes communicate is it possible to preserve both consistency and availability, thereby forfeiting P.
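The argument can be replayed in a few lines of Java. This is only a sketch under stated assumptions: the two-node "cluster", the `partitioned` flag and the `preferAvailability` switch are hypothetical names invented here, not part of any real product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-node cluster used only to replay the CAP argument.
class CapSketch {
    static class Node {
        final Map<String, String> data = new HashMap<>();
    }

    final Node a = new Node();
    final Node b = new Node();
    boolean partitioned = false;       // true when a and b cannot communicate
    final boolean preferAvailability;  // design choice made for the partitioned case

    CapSketch(boolean preferAvailability) {
        this.preferAvailability = preferAvailability;
    }

    // A write arriving at node a. Returns false if the write is refused.
    boolean writeToA(String key, String value) {
        if (!partitioned) {            // nodes communicate: C and A both hold
            a.data.put(key, value);
            b.data.put(key, value);
            return true;
        }
        if (preferAvailability) {      // accept locally: the nodes diverge, C is forfeited
            a.data.put(key, value);
            return true;
        }
        return false;                  // refuse the write: A is forfeited
    }

    boolean consistent() {
        return a.data.equals(b.data);
    }
}
```

With `preferAvailability` set, a write during a partition succeeds but `consistent()` turns false; without it, the write is refused and the nodes stay identical.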
38. ACID vs. BASE
● Brewer called these BASE: Basically Available, Soft state, Eventually consistent – to pun the pun of Jim Gray
● But NoSQL caught on
43. A Typical NoSQL Taxonomy
● Key-value stores
● Column family stores
● Document databases
● Graph databases
44. Not True Taxonomy
● These are folk taxonomies
● What happens to exist currently
● No family relations – no speciation
● Even putting them in four corners is visually lying – some key-value stores are very close to some document databases, while graph databases look like the odd man out and stand on their own
45. How do you use these NoSQLs?
● Get one or a few values out of the store
● Either modify and store
● Or go on looking for other values
● It is like pointer chasing
● p->p1->p2->p3...
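A minimal sketch of that access pattern, with an in-memory map standing in for the store. The keys, values and the `follow` helper are hypothetical, chosen only to show one lookup per hop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a key-value store.
class PointerChasing {
    static final Map<String, String> STORE = new HashMap<>();
    static {
        // Each value names the next key to look up, like p->p1->p2->p3.
        STORE.put("user:42", "order:7");
        STORE.put("order:7", "item:3");
        STORE.put("item:3", "Muffin");
    }

    // Follow references for a fixed number of hops, one get per hop.
    static String follow(String startKey, int hops) {
        String current = startKey;
        for (int i = 0; i < hops && current != null; i++) {
            current = STORE.get(current); // in a real store each hop is a round trip
        }
        return current;
    }
}
```

Every hop depends on the result of the previous one, so the lookups cannot be batched; that is the cost of pointer chasing against a remote store.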
46. The Other Way to Use Is MapReduce
● Similar to the way we process garbage for recycling
● Make heaps of garbage, make many teams sort each out (map)
● Aggregate iron, plastics, paper, glass from each team (reduce)
● Very efficient batch processing
● But it is batch processing
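The recycling analogy, written out as a single-process Java sketch. The class and method names are invented for illustration; a real MapReduce framework would run the map step on many machines, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical recycling example: one heap of garbage per team.
class RecyclingMapReduce {
    // Map step: one team sorts its own heap into counts per material.
    static Map<String, Integer> map(List<String> heap) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String item : heap) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // Reduce step: add up the per-team counts for each material.
    static Map<String, Integer> reduce(List<Map<String, Integer>> perTeam) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> counts : perTeam) {
            counts.forEach((material, n) -> total.merge(material, n, Integer::sum));
        }
        return total;
    }

    // Run the whole batch: map every heap, then reduce the results.
    static Map<String, Integer> run(List<List<String>> heaps) {
        List<Map<String, Integer>> perTeam = new ArrayList<>();
        for (List<String> heap : heaps) {
            perTeam.add(map(heap)); // independent per heap, so it could run in parallel
        }
        return reduce(perTeam);
    }
}
```

Nothing is available until the whole batch has been mapped and reduced, which is the batch-processing limitation the slide points at.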
47. NoSQL are Less Capable than RDBMS
● Do not expect similar behavior or even capabilities – even when it looks that way at first glance in the documentation
● The maturity of the RDBMS ecosystem and the expectations it has given you may serve you badly in the wild west of NoSQL
● Do not assume – double check
48. Less Sophisticated Than RDBMS
● Less to learn
● Less to administer
● Easy to start using
● Similar to pointer and reference programming models
● Programmers like them
49. Loved By Developers? What About Admins And Operations?
● Ad Hoc Data Fixing – how?
● Ad Hoc Data Querying – how?
● Data Export – how?
50. So will NoSQL beat SQL?
● First – why do we ask this? Hype and fanboyism, tradition and rut all have their answer
● For some NoSQL has already beaten SQL (no matter what NoSQL and SQL mean)
● For others NoSQL is way too young and not providing even a part of what SQL does (similarly – no matter what NoSQL and SQL mean)
51. NoSQL is changing, and so is SQL
● Champions of NoSQL like Google are moving closer to SQL and RDBMS
– Transactions help developers reason about what is happening, and developers also like them
– No ACID in the DB means it is maddeningly hard to do ACID at the application level
– Declarativeness of SQL is fine and developers actually like it
– Speed is not everything – it is just part of the equation
● Move from batch to online processing
52. O, champion of NoSQL – Where Art Thou Now?
● Google
– Spanner – ACID, SQL, schematized tables, Paxos, descendant of Megastore rather than BigTable
– F1 – general transactions, Paxos, relational schema + extensions (hierarchy, rich data types)
● Facebook
– Presto – standard SQL, window functions, ad hoc queries
54. SQL is so last millennium, there is NewSQL
● The variety in NoSQL and competition among RDBMS are pushing traditional SQL engines to differentiate more strongly
● No more – One size fits all
55. OLAP/DW
● Moving to column stores rather than traditional row oriented stores – 50-100 times faster
– Better compression
– No per-row header
– IO much better for sparsely filled wide tables when running aggregates on several columns
● IBM DB2 (10.5, June 2013), Oracle (some in 11gR2 2009 Exadata, more in 12c), MS SQL Server (some in 2012/2014 CTP1, June 2013), SAP HANA, MySQL
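A toy comparison of the two layouts in Java. The table, its three columns and all names are assumptions made for this sketch; it shows only why an aggregate over one column touches less data in a column layout, and makes no claim about the actual speed-up.

```java
// Hypothetical table with three columns: id, price, quantity.
class ColumnVsRow {
    // Row-oriented layout: one object per row, all columns stored together.
    static class Row {
        final int id;
        final double price;
        final int quantity;

        Row(int id, double price, int quantity) {
            this.id = id;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Column-oriented layout: one array per column.
    static class ColumnTable {
        final int[] id;
        final double[] price;
        final int[] quantity;

        ColumnTable(int[] id, double[] price, int[] quantity) {
            this.id = id;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Aggregate over rows: every row is visited to read a single field.
    static double sumPriceRowStore(Row[] rows) {
        double sum = 0;
        for (Row r : rows) {
            sum += r.price;
        }
        return sum;
    }

    // Aggregate over a column: only the price array is scanned.
    static double sumPriceColumnStore(ColumnTable t) {
        double sum = 0;
        for (double p : t.price) {
            sum += p;
        }
        return sum;
    }
}
```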
56. Current OLTP
● Buffer pool ≈ 24%
● Locking ≈ 24%
● Latching ≈ 24%
● Recovery ≈ 24%
● Useful work ≈ 4%
58. How to get this ideal OLTP?
● Latching – due to multithreading. Go single threaded, each core – like a single thread, divide memory or remove all shared data
● Buffer pool – go into main memory, use anticaching
● Row level locking – MVCC, timestamp ordering, lightweight locking
● Recovery – replication rather than relying on ARIES, replicate via command logging
– ARIES: Algorithms for Recovery and Isolation Exploiting Semantics
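To illustrate the last bullet, a minimal sketch of command logging: the primary logs the commands it executes, and a replica (or a recovering node) replays them in the same order. All names here are hypothetical, and the sketch assumes commands are deterministic and executed single threaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replicate by shipping commands, not changed data pages.
class CommandLogSketch {
    // A command is a deterministic operation on the state.
    interface Command {
        void apply(Map<String, Integer> state);
    }

    // Example command: add an amount to an account balance.
    static Command deposit(String account, int amount) {
        return state -> state.merge(account, amount, Integer::sum);
    }

    final Map<String, Integer> state = new HashMap<>();
    final List<Command> log = new ArrayList<>();

    // Primary: execute the command and append it to the log.
    void execute(Command c) {
        c.apply(state);
        log.add(c);
    }

    // Replica or recovery: replay the same commands in the same order.
    static Map<String, Integer> replay(List<Command> log) {
        Map<String, Integer> replica = new HashMap<>();
        for (Command c : log) {
            c.apply(replica);
        }
        return replica;
    }
}
```

Because the commands are deterministic, replaying the log yields the same state as the primary.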
65. Back in 1979
● As part of Unix he also wrote DBM (database manager)
● Basically a hashtable backed by disk storage
70. To Put Things Into Perspective
[Timeline, 1970–1981]
● June 1970 – A Relational Model of Data for Large Shared Data Banks
● 1974 – System R, first research prototype
● 1981 – SQL/DS, first IBM commercial product
● Beaten to the market by a smaller firm, RSI – Relational Software, Inc. Founded as Software Development Laboratories (SDL) in 1977 by these guys
85. Unix Went To College – Berkeley
[Timeline, 1986–1997]
● DBM became ndbm – new database manager
● Lawsuit – in 1992, ended in 1994. Effort to rewrite AT&T copyrighted utilities.
● Linus Torvalds started Linux
● Keith Bostic – designed the API
● Michael Olson – Btree impl.
● Db 1.85 part of 4.4 BSD
● Margo Seltzer – paper on TX variant
91. Berkeley DB by Sleepycat Software
[Timeline, 1996–2007]
● Sleepycat Software
● Berkeley DB 2.0 – transactions
● Berkeley DB 3.0 – API
● Berkeley DB 4.0 – HA single master, multiple reader
● Berkeley DB Java Edition – pure Java impl.
92. Berkeley DB by Oracle
[Timeline, 1996–2007]
● Sleepycat Software
● Berkeley DB 2.0 – transactions
● Berkeley DB 3.0 – API
● Berkeley DB 4.0 – HA single master, multiple reader
● Berkeley DB Java Edition – pure Java impl.
● Oracle bought Sleepycat – embedded DB
93. And Then Everybody And Their Dog Were Creating Databases
● Google FileSystem – 2003
● Google MapReduce – 2004
● Google BigTable – 2006
● Apache CouchDB – 2005
● Apache Hadoop – 2007
● Amazon Dynamo – 2007
● Facebook Cassandra – 2008
● Yahoo! PNUTS – 2008
● Basho Riak – 2009
● 10gen MongoDB – 2009
● VMWare Redis – 2009
● LinkedIn Voldemort – 2009
● Twitter FlockDB – 2010
94. And Then Everybody And Their Dog Were Creating Databases
[Same list of databases as on the previous slide, stamped across with:]
Oracle White paper, May 2011 – Debunking the NoSQL hype. Now available on the Internet only in caches and torrent sites
95. Vive La Révolution!
● Just 4 months later, at Oracle OpenWorld at the start of October 2011, Oracle announced they were working on a NoSQL solution, availability – end of October 2011
● December 2011 – version 1.2.x
● December 2012 – Oracle NoSQL Database 2.0, 11gR2 (11.2.x)
● The Old Dog Learns The New Tricks – VERY, VERY FAST
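111. But Let's Do It the Java 7 Way
public void getHandleJava7() {
    // try-with-resources closes the store handle when the block exits
    try (KVStore store1 = KVStoreFactory.getStore(config)) {
        // MEAT GOES HERE
    } catch (Exception e) {
        // handle or log the failure here
    } finally {
    }
}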
112. How to Write
public void writeKeyValue() throws UnsupportedEncodingException {
    List<String> major = new ArrayList<>();
    major.add("Muffin");
    major.add("Man");
    List<String> minor = new ArrayList<>();
    minor.add("address");
    Key k = Key.createKey(major, minor);
    String address = "Drury Lane";
    Value v = Value.createValue(address.getBytes("UTF-8"));
    store.put(k, v);                // unconditional write
    store.putIfAbsent(k, v);        // write only if the key is not there yet
    store.putIfPresent(k, v);       // write only if the key already exists
    store.putIfVersion(k, v, null); // write only if the version matches (null is a placeholder)
}
113. How to Delete
public void deleteKeyValue() {
    List<String> major = Arrays.asList("Muffin", "Man");
    List<String> minor = Arrays.asList("address");
    Key k = Key.createKey(major, minor);
    store.delete(k); // a single record
    store.multiDelete(Key.createKey(major), null, null); // the records under the major path
}
114. How to Read – 1
public void readARecord() throws UnsupportedEncodingException {
    List<String> major = Arrays.asList("Muffin", "Man");
    List<String> minor = Arrays.asList("address");
    Key k = Key.createKey(major, minor);
    ValueVersion vv = store.get(k);
    Value v = vv.getValue();
    String result = new String(v.getValue(), "UTF-8");
    result.equals("Drury Lane"); // expected to be true
}
115. How to Read – 2
public void readFullMajor1Go() {
    List<String> major = Arrays.asList("Muffin", "Man");
    Key k = Key.createKey(major);
    // Single operation
    SortedMap<Key, ValueVersion> records =
        store.multiGet(k, null, null);
    for (Map.Entry<Key, ValueVersion> entry : records.entrySet()) {
        Key key = entry.getKey();
        List<String> minor = key.getMinorPath();
        ValueVersion vv = entry.getValue();
        Value v = vv.getValue();
        // Do some work with the Value here
    }
}
116. How to Read – 3
public void readFullMajorManyGoes() {
    List<String> major = Arrays.asList("Muffin", "Man");
    Key k = Key.createKey(major);
    // Non atomic
    Iterator<KeyValueVersion> it =
        store.multiGetIterator(
            Direction.FORWARD, // other values: REVERSE, UNORDERED
            0,                 // Batch size, 0 - use default
            k,                 // the key
            null,              // KeyRange
            null);             // Depth - CHILDREN_ONLY, PARENT_AND_CHILDREN,
                               // DESCENDANTS_ONLY, PARENT_AND_DESCENDANTS
    while (it.hasNext()) {
        Value v = it.next().getValue();
        // Do some work with the Value here
    }
}
117. How to Read – 4
public void readPartialMatch() {
    List<String> major = Arrays.asList("Muffin");
    Key k = Key.createKey(major);
    // Non atomic, read large part of DB
    Iterator<KeyValueVersion> it =
        store.storeIterator(
            Direction.UNORDERED, // other values: FORWARD, REVERSE
            0,                   // Batch size, 0 - use default
            k,                   // the key
            null,                // KeyRange
            null);               // Depth - CHILDREN_ONLY, PARENT_AND_CHILDREN,
                                 // DESCENDANTS_ONLY, PARENT_AND_DESCENDANTS
    while (it.hasNext()) {
        Value v = it.next().getValue();
        // Do some work with the Value here
    }
}
118. Key ranges
public void prepareKeyRange() {
    // Bowerick Wowbagger the Infinitely Prolonged
    // Hitchhiker's Guide To the Galaxy
    // Arthur Philip Dent - You are a jerk
    KeyRange kr = new KeyRange(
        "Arthur Philip Dent",   // start
        true,                   // inclusive? [(
        "A-Rth-Urp-Hil-Ipdenu", // slug
        true);                  // inclusive? )]
}
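The inclusive and exclusive bounds can be tried out without a store. The sketch below uses a plain `java.util.TreeMap` as a stand-in for sorted keys; it is an analogy only, does not use the Oracle NoSQL API, and all keys and values are invented.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: a sorted map plays the role of the keys under one parent.
class KeyRangeSketch {
    static NavigableMap<String, String> sample() {
        NavigableMap<String, String> m = new TreeMap<>();
        m.put("address", "Drury Lane");
        m.put("email", "muffin@example.com");
        m.put("phone", "555-0100");
        m.put("zip", "00000");
        return m;
    }

    // Select the keys between start and end; each bound is inclusive or exclusive.
    static NavigableMap<String, String> range(NavigableMap<String, String> m,
                                              String start, boolean startInclusive,
                                              String end, boolean endInclusive) {
        return m.subMap(start, startInclusive, end, endInclusive);
    }
}
```

With both bounds inclusive, `range(sample(), "address", true, "phone", true)` covers address, email and phone; with both exclusive only email remains. `TreeMap.subMap` rejects a start that sorts after the end, so the order of the two bounds matters.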
119. Sequence of Operations - TX
public void sequence() {
    OperationFactory of = store.getOperationFactory();
    List<Operation> ops = new ArrayList<>();
    Key k = null; Value v = null; // placeholders, use a real key and value
    ops.add(of.createDelete(k));
    ops.add(of.createPut(k, v));
    // of.createDeleteIfVersion(); of.createPutIfAbsent();
    // of.createPutIfPresent(); of.createPutIfVersion()
    try {
        store.execute(ops);
    } catch (OperationExecutionException | // cannot exec
             DurabilityException |         // durability not met
             IllegalArgumentException |    // list is ∅, null
             RequestTimeoutException e) {  // timeout
    } catch (FaultException e) {
        // sth else
    }
}
123. Generic Avro
public void genericAvro() {
    AvroCatalog catalog = store.getAvroCatalog();
    GenericAvroBinding binding =
        catalog.getGenericMultiBinding(schemas);
    GenericRecord dev =
        new GenericData.Record(developerSchema);
    dev.put("name", "Sam A. Hacker");
    dev.put("age", 37);
    dev.put("language", "Java");
    Key k = null; //Key.createKey
    store.put(k, binding.toValue(dev));
    Value v = store.get(k).getValue();
    GenericRecord dbAdmin = binding.toObject(v);
    dbAdmin.get("name");
}
124. Specific Avro
public void specificAvro() {
    AvroCatalog catalog = store.getAvroCatalog();
    SpecificAvroBinding binding =
        catalog.getSpecificMultiBinding();
    // generate via provided ant task
    // org.apache.avro.compiler.specific.SchemaTask
    Developer dev = new Developer();
    dev.setName("Sam. A. Hacker");
    dev.setAge(37);
    dev.setLanguage("Java");
    Key k = null; //Key.createKey
    store.put(k, binding.toValue(dev));
    Value v = store.get(k).getValue();
    SpecificRecord sr = binding.toObject(v);
    if (sr.getSchema().getFullName().equals("dba")) {
        DbAdmin dbAdmin = (DbAdmin) sr;
    }
}
125. JSON Avro
public void jsonAvro() throws IOException {
    AvroCatalog catalog = store.getAvroCatalog();
    JsonAvroBinding binding =
        catalog.getJsonMultiBinding(schemas);
    // Inner quotes must be escaped inside a Java string literal
    String jsonText =
        "{\"name\": \"Sam. A. Hacker\"," +
        " \"age\": 34, \"language\": \"Java\"}";
    ObjectMapper jsonMapper = new ObjectMapper();
    JsonNode json = jsonMapper.readTree(jsonText); // may throw IOException
    JsonRecord dev = new JsonRecord(json, developerSchema);
    Key k = null; //Key.createKey
    store.put(k, binding.toValue(dev));
    Value v = store.get(k).getValue();
    JsonRecord jr = binding.toObject(v);
    if (jr.getSchema().getFullName().equals("dba")) {
        JsonNode dbAdmin = jr.getJsonNode();
        dbAdmin.get("db");
    }
}
126. Licensing
● Community Edition – FLOSS, AGPLv3
● Enterprise edition:
– SNMP, Oracle RDBMS compatibility, JMX
– $40/user/year (min. 25), $2000/processor/year
– RDBMS Standard Edition One ≤ NoSQL ≤ RDBMS Standard Edition
127. Further General Resources
● Martin Fowler: NoSQL Distilled to an hour
http://vimeo.com/66052102
● Martin Fowler: NoSQL Distilled
http://martinfowler.com/nosql.html
● Ilya Katsov: NoSQL Data Modelling Techniques
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
● Christof Strauch: NoSQL Databases
http://www.christof-strauch.de/nosqldbs.pdf
● Michael Stonebraker
http://slideshot.epfl.ch/play/suri_stonebraker
128. Further Resources on CAP
● Eric Brewer: Towards Robust Distributed Systems
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
● Eric Brewer: NoSQL: Past, Present, Future
http://www.infoq.com/presentations/NoSQL-History
● Eric Brewer: CAP Twelve Years Later: How the "Rules" Have Changed
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
● Nancy Lynch, Seth Gilbert: Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
129. Further Oracle NoSQL Resources
● Oracle's Product Page:
http://www.oracle.com/technetwork/products/nosqldb/overview/index.html
● Good Documentation:
http://docs.oracle.com/cd/NOSQL/html/index.html