SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Institut für Informatik
NOSQL Databases – An Overview
Dr. Lena Wiese
Institut für Informatik
Research Group Knowledge Engineering
Fakultät für Mathematik und Informatik
Georg-August Universität Göttingen
Data Natives 2015
Dr. Lena Wiese NOSQL DBs 1 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Institut für Informatik
Short CV Dr. Lena Wiese
University of Göttingen (Research Group Leader Knowledge
Engineering)
University of Hildesheim (Visiting Professor for Databases)
National Institute of Informatics, Tokyo, Japan
Robert Bosch India Ltd., Bangalore, India
Master/PhD: TU Dortmund
Teaching and Research
NoSQL databases (lecture, seminars, projects)
Database security (encryption for Cassandra and HBase)
Web: http://wiese.free.fr/
Dr. Lena Wiese NOSQL DBs 2 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Institut für Informatik
Book / Workshop
Master’s level text book (in English):
Lena Wiese: Advanced Data Management for SQL,
NoSQL, Cloud and Distributed Databases
c 2015 DeGruyter/Oldenbourg
http://wiese.free.fr/adm.html
(several pictures in this talk taken from the book)
Workshop NoSQL-Net
organized jointly with Irena Holubova (Charles
University, Prague) and Valentina Janev (Instituto
Mihailo Pupin, Belgrade)
3rd edition to be organized at DEXA conference 2016
We welcome industrial application papers
Dr. Lena Wiese NOSQL DBs 3 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: Content Institut für Informatik
Overview
1 Introduction
Content
New Requirements
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 4 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: Content Institut für Informatik
Content
SQL
Tabular row-wise storage: Relational Databases (RDBs)
Query Language: SQL
versus
NOSQL (Not Only SQL)
Graph Databases
XML Databases
Key-value Stores
Column Stores
Bigtable Databases
Object Databases and Object-Relational Databases
...
Dr. Lena Wiese NOSQL DBs 5 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: Content Institut für Informatik
What is a Database System?
A database system is required to
manage
huge amounts of data
in an e cient,
persistent,
reliable,
consistent,
non-redundant way
for multiple users
Dr. Lena Wiese NOSQL DBs 6 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: New Requirements Institut für Informatik
New requirements
Data are organized in complex structures (example: social networks)
me
you
foe foo
Data are constantly changing (frequent updates)
Data are distributed on a huge number of interconnected servers
(example: cloud storage)
Dr. Lena Wiese NOSQL DBs 7 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: New Requirements Institut für Informatik
New requirements
Data are organized in complex structures (example: social networks)
Data are constantly changing (frequent updates)
write1 write2 write3 write4 write5read1 read2
Data are distributed on a huge number of interconnected servers
(example: cloud storage)
Dr. Lena Wiese NOSQL DBs 7 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: New Requirements Institut für Informatik
New requirements
Data are organized in complex structures (example: social networks)
Data are constantly changing (frequent updates)
Data are distributed on a huge number of interconnected servers
(example: cloud storage)
user
S1
S2
S3
S4
S5
data
Dr. Lena Wiese NOSQL DBs 7 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Introduction :: New Requirements Institut für Informatik
New requirements
Data are organized in complex structures (example: social networks)
Data are constantly changing (frequent updates)
Data are distributed on a huge number of interconnected servers
(example: cloud storage)
Revival of non-relational data models for novel applications
Dr. Lena Wiese NOSQL DBs 7 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
Background
Graph Management
Systems
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 8 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Background Institut für Informatik
Why Graph Databases?
Links between data items are important
Example: Social Networks
Recommender Systems
Semantic Web
Geographic Information Systems
Bioinformatics
...
Name: Alice
Age: 34
Name: Bob
Age: 27
Name: Charlene
Age: 29
knows knows
dislikes
Dr. Lena Wiese NOSQL DBs 9 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Background Institut für Informatik
Why Graph Databases?
Links between data items are important
Social Networks
Recommender Systems
Semantic Web
Example: Geographic Information Systems
Bioinformatics
...
City: Hannover
Population: 522T
City: Hildesheim
Population: 102T
City: Braunschweig
Population: 248T
35km 45km
65km
Dr. Lena Wiese NOSQL DBs 9 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Graph Management Institut für Informatik
Property Graph Model
A Property Graph is a directed multigraph
Stores information (properties) in vertices and on edges
A Property is a key-value pair like “Name: Alice”
Sometimes multi-value properties: one key, list of values
For vertices and edges: predefined property key called Id with unique
identifier value
Id: 1
Name: Alice
Age: 34
Id: 2
Name: Bob
Age: 27
Id: 3
Name: Charlene
Age: 29
Id: 4 Id: 5
Id: 6
Dr. Lena Wiese NOSQL DBs 10 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Graph Management Institut für Informatik
Property Graph Model: Types and Labels
In general, not typed: vertices and edges store arbitrary key-value pairs
However, typing gives semantics to the stored data
For vertices: predefined property key called Type; like “Type: Person”
For edges: predefined property key called Label; optionally: specify
which edge label can be used between which vertex types
Id: 1
Type: Person
Name: Alice
Age: 34
Id: 2
Type: Person
Name: Bob
Age: 27
Id: 3
Type: Person
Name: Charlene
Age: 29
Id: 4
Label: knows
Id: 5
Label: knows
Id: 6
Label: dislikes
Dr. Lena Wiese NOSQL DBs 11 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Graph Management Institut für Informatik
Property Graph Model: Multiedges
Multiedges are only allowed when edge labels are di erent
Id: 1
Type: Person
Name: Alice
Age: 34
Id: 2
Type: Person
Name: Bob
Age: 27
Id: 3
Type: Person
Name: Charlene
Age: 29
Id: 4
Label: knows
Id: 5
Label: knows
Id: 6
Label: dislikes
Id: 7
Label: dislikes (not allowed!)
Dr. Lena Wiese NOSQL DBs 12 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Graph Management Institut für Informatik
Property Graph Model: Paths
Paths are serial concatenations of edges
End vertex of one edge is start vertex of next edge on the path
Id: 1
Type: Person
Name: Alice
Age: 34
Id: 2
Type: Person
Name: Bob
Age: 27
Id: 3
Type: Person
Name: Charlene
Age: 29
Id: 4
Label: knows
Id: 5
Label: knows
Dr. Lena Wiese NOSQL DBs 13 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Graph Management Institut für Informatik
Property Graph Model: Paths
Path “friends-of-friends” concatenates two edges with “Label: knows”
Paths can be used as normal edges
Id: 1
Type: Person
Name: Alice
Age: 34
Id: 2
Type: Person
Name: Bob
Age: 27
Id: 3
Type: Person
Name: Charlene
Age: 29
Id: 4
Label: knows
Id: 5
Label: knows
Path friends-of-friends
Dr. Lena Wiese NOSQL DBs 13 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Graph Databases :: Systems Institut für Informatik
Open Source Systems
The TinkerPop http://tinkerpop.incubator.apache.org/
graph processing stack: a set of open source graph management
modules
Neo4J graph database http://neo4j.com/
Cypher query language
START alice = (people_idx, name, "Alice")
MATCH (alice)-[:knows]->(aperson)
RETURN (aperson)
HyperGraphDB: http://www.hypergraphdb.org/
Graph may contain hyperedges that combine more than two nodes
Dr. Lena Wiese NOSQL DBs 14 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
XML Databases :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
Background
Numbering Schemes
Systems
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 15 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
XML Databases :: Background Institut für Informatik
XML
XML: Extensible Markup Language
Defined by the WWW Consortium (W3C)
Intended as a document markup language (not a database language)
Tags divide documents into sections
Tag: label for a section of data
Element: section of data beginning with <tagname> and ending with
matching </tagname>
Inside an element:
arbitrary text
other elements (“nesting”)
Nothing (“empty element”): abbreviate to <tagname />
Standardized query languages: XPath and XQuery
Dr. Lena Wiese NOSQL DBs 16 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
XML Databases :: Background Institut für Informatik
Tree Model of XML Data
<reservationsystem>
<hotel hotelID="h1">
<name>
Buergermeisterkapelle
</name>
<location>
Hildesheim
</location>
<pricesgl>
65 Euro
</pricesgl>
</hotel>
</reservationsystem>
0
reservationsystem
1
hotel
2 3
name
4
Buergermeisterkapelle
5
location
6
Hildesheim
7
pricesgl
8
65 Euro
hotelID
h1
Dr. Lena Wiese NOSQL DBs 17 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
XML Databases :: Numbering Schemes Institut für Informatik
Numbering Scheme
assigns each node of an XML tree a unique identifier (a label or node
ID which is usually a number)
Important for database application with frequent updates:
How many nodes have to be renumbered in an update?
simplest scheme: preorder traversal of tree
increasing a counter for each node:
root node is numbered as the first node before numbering any other
node
this is done recursively for all child nodes
Renumbering: all nodes in the worst case
Dr. Lena Wiese NOSQL DBs 18 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
XML Databases :: Systems Institut für Informatik
Open Source Systems
eXistDB: http://exist-db.org/
numbering scheme that virtually expands the tree into a complete tree
such that not all node IDs correspond to existing nodes
eXistDB o ers several user APIs: RESTful API, XML:DB API,
XML-RPC API, SOAP AP
BaseX: http://basex.org/
Numbering scheme: Pre/Dist/Size
Several language bindings as well as a REST API, an XQJ API and a
XML:DB API
Dr. Lena Wiese NOSQL DBs 19 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
Background
Systems
MapReduce
Systems
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 20 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: Background Institut für Informatik
Key-Value Stores
A key value pair is a tuple of two strings Èkey, valueÍ
You can get (or delete) a value from the store by key
Schema-less: you can put arbitrary key-value pairs into the store
value = store.get(key)
store.put(key, value)
store.delete(key)
Values can have other data types than just strings
Values can even be a list or array of atomic values
Simple but quick
Simple data structure
No advanced query language
Good for “data-intensive” applications
Application is responsible or combining key-value pairs into more
complex objects
Dr. Lena Wiese NOSQL DBs 21 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: Systems Institut für Informatik
Open Source Systems
Redis: http://redis.io/
in-memory key-value store
data types: string, linked lists, unsorted set, sorted set, hash, bit array,
hyperloglog
Riak-KV: http://basho.com/products/riak-kv/
key-value pairs called Riak objects grouped into buckets
convergent replicated data types (CRDTs)
Riak’s search functionality based on Apache Solr (Yokozuna)
Dr. Lena Wiese NOSQL DBs 22 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: MapReduce Institut für Informatik
MapReduce
Applied at Google
Je rey Dean / Sanjay Ghemawat, “MapReduce: Simplified Data
Processing on Large Clusters”, OSDI’04: Sixth Symposium on
Operating System Design and Implementation, 2004.
“The computation takes a set of input key/value pairs, and produces
a set of output key/value pairs. The user of the MapReduce library
expresses the computation as two functions: Map and Reduce.”
Four basic steps
1 split input key-value pairs into disjunct subsets
2 compute map function on each input subset
3 group all intermediate values by key (shu e)
4 reduce values of each group
Dr. Lena Wiese NOSQL DBs 23 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: MapReduce Institut für Informatik
MapReduce: Example
split map shu e reduce
sentence1 (word3,1) (word1,(1,1,1)) server4 (word1,3)
sentence2 server1 (word4,1) (word2,(1)) (word2,1)
sentence3 (word3,1)
(word1,1)
sentence4 server2 (word1,1) (word3,(1,1,1)) server5 (word3,3)
sentence5 (word2,1) (word4,(1,1)) (word4,2)
(word4,1)
sentence6 server3 (word3,1)
sentence7 (word1,1)
Dr. Lena Wiese NOSQL DBs 24 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Key-Value Stores :: Systems Institut für Informatik
Open Source Systems
Apache Hadoop: http://hadoop.apache.org/
Hadoop Distributed File System (HDFS)
Apache Spark: http://spark.apache.org/
data flow programming model on top of Hadoop
Apache Pig: http://pig.apache.org/
express parallel execution of data analytics tasks.
input={(’alice’,{’charlene’,’emily’}),
(’bob’,{’david’,’emily’})};
output = FOREACH input GENERATE $0, FLATTEN($1);
Apache Hive: http://hive.apache.org/
querying and data management layer
can serialize tables as files in HDFS
HiveQL queries are compiled into Hadoop MapReduce tasks
Dr. Lena Wiese NOSQL DBs 25 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Document Stores :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
Background
Systems
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 26 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Document Stores :: Background Institut für Informatik
JSON: JavaScript Object Notation
human-readable text format
more compact than XML
nesting of key-value pairs
{
"firstName":"Alice",
"lastName" :"Smith",
"age":31,
"address" :{
"street":"Main Street",
"number":12,
"city":"Newtown",
"zip":31141
} ,
"telephone":[935279,908077,278784]
}
Dr. Lena Wiese NOSQL DBs 27 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Document Stores :: Systems Institut für Informatik
Open Source Systems
MongoDB: https://www.mongodb.org/
BSON storage format (binary JSON representation)
db.persons.find(age$lt: 34)
CouchDB: http://couchdb.apache.org/
retrieval process with views defined as map function
function(doc) {
if(doc.lastname && doc.age) {
emit(doc.lastname, doc.age);
}
}
Couchbase: http://www.couchbase.com
SQL-like query language
Dr. Lena Wiese NOSQL DBs 28 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
Background
Column Compression
Systems
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 29 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Background Institut für Informatik
Why Column Stores?
A row store is a row-oriented relational database
Data are stored in tables
On disk, data in a row are stored consecutively
Currently used in most commercially successful RDBMSs
A column store is a column-oriented relational database
Data are stored in tables
On disk, data in a column are stored consecutively
In use since the 1970s but less successful than row stores
Example
BookLending BookID ReaderID ReturnDate
123 225 25-10-2011
234 347 31-10-2011
Storage order in row store:
123,225,25-10-2011,234,347,31-10-2011
Storage order in column store:
123,234,225,347,25-10-2011,31-10-2011
Dr. Lena Wiese NOSQL DBs 30 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Background Institut für Informatik
Advantages of Column Stores
Only columns (attributes) that are needed are read from disk into
main memory, because a memory page contains only values of a
column
Values in a column (that is, values of the same attribute domain) can
be compressed better when stored consecutively (“locality”)
Iterating or aggregating over values in a column can be done quickly,
because they are stored consecutively
For example, summing up all values in a column, finding the average,
maximum...
Adding new columns to a table is easy
Dr. Lena Wiese NOSQL DBs 31 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Column Compression Institut für Informatik
Column Compression
Columns may contain lots of repetitions of values
Compression can be more e ective on columns
Option 1: run-length encoding
run-length: how many repetitions of a value are stored consecutively?
Option 2: bit-vector encoding
create a bit vector for each value in the column
Option 3: dictionary encoding
create a dictionary for single values or sequences of values
Option 4: frame of reference encoding
store o -set from a reference point
Option 5: di erential encoding
store o -set from previous value
Stavros Harizopoulos / Daniel Abadi / Peter Boncz,
“Column-Oriented Database Systems”, VLDB Tutorial, 2009
Dr. Lena Wiese NOSQL DBs 32 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Column Compression Institut für Informatik
Example: Run-Length Encoding
BookLending BID RID RD
123 225 25-10-2012
386 225 20-10-2012
938 225 27-10-2012
123 347 25-11-2012
234 347 31-10-2012
Store ReaderID (RID) in run-length encoding
count number of consecutive repetitions
format: (value, start row, run-length)
RID: ( (225, 1, 3), (347, 4, 2) )
Answer queries on compressed format
How many books does each reader have?
SELECT RID, COUNT(*) FROM BookLending GROUP BY RID
Just return (the sum of) the run-lengths for each ReaderID value
Result: (225, 3), (347, 2)
Dr. Lena Wiese NOSQL DBs 33 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Column Stores :: Systems Institut für Informatik
Systems
MonetDB: https://www.monetdb.org/
open source “column store pioneers”
Apache Parquet: http://parquet.apache.org/
implements column striping: transform nested data to columns
Commercial systems
SAP HANA
HP Vertica
Dr. Lena Wiese NOSQL DBs 34 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Background Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
Background
Storage Organization
Systems
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 35 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Background Institut für Informatik
Google BigTable
Fay Chang / Je rey Dean / Sanjay Ghemawat / Wilson C. Hsieh /
Deborah A. Wallach / Mike Burrows / Tushar Chandra / Andrew
Fikes / Robert E. Gruber, “Bigtable: A Distributed Storage System
for Structured Data”, OSDI, 2006
“A Bigtable is a sparse, distributed, persistent, multi-dimensional
sorted map”
Google BigTable is indexed by a row key, column key, and a
timestamp
Map: ( row:string, column:string, time:int64) æ string
A Big Table may have an unbounded number of columns.
Columns are grouped into sets called column families.
Dr. Lena Wiese NOSQL DBs 36 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Background Institut für Informatik
BigTable & HBase Data Structure
Store data that is accessed together in a column family
Columns in a single column family can vary arbitrarily for each row.
Only fetch column families of columns that are required by query
Data locality: Store data in a column family together on disk
table row key column family column family
Library BID BookInfo LendingInfo
123
Title
Databases
Author
Miller
25-10-2012
Mayer
25-11-2012
Green
386
Title
Algorithms
Author
Jacobs
20-10-2012
Mayer
938
Title
Programming
Author
Brown
27-10-2012
Mayer
234
Title
SQL
Author
Smith
31-10-2012
Green
Dr. Lena Wiese NOSQL DBs 37 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Storage Organization Institut für Informatik
Writing to memory tables and data files
The most recent writes are collected in a main memory table
(memtable) of fixed size.
All data records written to the on-disk store will only be appended to
the existing records.
Once written, these records are read-only and cannot be modified:
they are immutable data files.
Any modification of a record must hence also be simulated by
appending a new record in the store.
Deletions are treated by writing a new record (tombstone) for a key.
Main memory
memtable
Disk
Sorted
file 1
Sorted
file 2
. . .Sorted
file n
flush
write
Dr. Lena Wiese NOSQL DBs 38 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Storage Organization Institut für Informatik
Reading from memory tables and data files
The downside of immutable data files is that they complicate the read
process:
retrieving all the relevant data that match a user query requires
combining records from several on-disk data files and the memtable.
This combination may a ect records for di erent search keys that are
spread out across several data files; but it may also apply to records
for the same key of which di erent versions exist in di erent data files.
In other words, all sorted data files have to be searched for records
matching the read request.
Main memory
memtable
block bu er
Disk
Sorted
file 1
Sorted
file 2
. . .Sorted
file n
read combine
combine
Dr. Lena Wiese NOSQL DBs 39 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
BigTable Databases :: Systems Institut für Informatik
Open Source Systems
Apache Cassandra: http://cassandra.apache.org/
column families in a keyspace
CQL: SQL-like query language
INSERT INTO bookinfo (bookid, title, author)
VALUES (1002,’Databases’,’Miller’);
Apache HBase: http://hbase.apache.org/
stores tables in namespaces
tables contain column families
Dr. Lena Wiese NOSQL DBs 40 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
Polyglot Persistence
Lambda Architecture
Multi-Model Databases
9 Conclusion
Dr. Lena Wiese NOSQL DBs 41 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik
Polyglot Persistence
Choose as many databases as needed
Fowler, M.J., Sadalage, P.J.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence.
Prentice Hall (2012)
Example: Apache Drill http://drill.apache.org/
Apache Drill is inspired by the ideas developed in Google’s Dremel
system
Melnik, S., Gubarev, A., Long, J.J., Romer, G., Shivakumar, S., Tolton, M., Vassilakis, T.: Dremel: interactive
analysis of web-scale datasets. Proceedings of the VLDB Endowment 3(1-2), 330–339 (2010)
Better introduce an integration layer (instead of pushing the burden of
query handling and database synchronization to the application level)
decomposing queries in to several subqueries
redirecting queries to the appropriate databases
recombining the results obtained from the accessed databases
Dr. Lena Wiese NOSQL DBs 42 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik
Polyglot Persistence
Integration layer
Query decomposition
Query redirection
Result recombination
Synchronization
key-value
store
graph
database
SQL
database
in-memory
store
graph
traversal
analytical
query
write-heavy
transaction
SQL query REST-
based access
Dr. Lena Wiese NOSQL DBs 43 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Lambda Architecture Institut für Informatik
Lambda Architecture
For real-time / streaming data
Combination of a slower batch processing layer and a speedier stream
processing layer
Speed layer collects only the most recent data and delivers these results
in several real-time views
Batch layer stores all data in an append-only and im- mutable fashion
in a so-called master dataset and results are delivered in so-called batch
views
Serving layer makes batch views accessible to user queries by
maintaining indexes
User queries answered by merging data from batch views and
real-time views
Open source implementation following the ideas of a lambda
architecture is Apache Druid http://druid.io/ (streaming data
in real-time nodes and batch data in historical nodes)
Dr. Lena Wiese NOSQL DBs 44 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Lambda Architecture Institut für Informatik
Lambda Architecture
Batch layer
Batch view 1
Batch view 2
Batch view 3
Batch view 4
Master
data set
Speed layer
Recent
data set
Speed view 1
Speed view 2
Speed view 3
Speed view 4
Serving layer
Index 1
Index 2
Index 3
Index 4
Data
stream
append
append
merge
Dr. Lena Wiese NOSQL DBs 45 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Multi-Model Databases Institut für Informatik
Multi-Model Databases
Use a database system that stores data in a single store but provides
access to the data with di erent APIs according to di erent data
models
Either support di erent data models directly inside the database
engine or o er layers for additional data models on top of a
single-model engine
OrientDB http://orientdb.com/
a document API, an object API, and a graph API (Java Graph API is
compliant with Tinkerpop)
extensions of the SQL standard to interact will all three APIs
ArangoDB https://www.arangodb.com/
a graph API, a key-value API and a document API
Query language AQL (ArangoDB query language) resembles
SQL but adds several database-specific extensions to it
Dr. Lena Wiese NOSQL DBs 46 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Polyglot Data Base Architectures :: Multi-Model Databases Institut für Informatik
Multi-Model Databases
Graph layer
key-value store
graph
traversal write-heavy
transaction REST-
based access
Dr. Lena Wiese NOSQL DBs 47 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Conclusion Institut für Informatik
Overview
1 Introduction
2 Graph Databases
3 XML Databases
4 Key-Value Stores
5 Document Stores
6 Column Stores
7 BigTable Databases
8 Polyglot Data Base Architectures
9 Conclusion
Dr. Lena Wiese NOSQL DBs 48 / 49
{ }
Knowledge
Engineering
K9
Georg-August-Universität Göttingen
Conclusion Institut für Informatik
Conclusion
Many, many other data models than just relational tables
Lots of di erent query languages (no standards)
Problems with reliability (no long-term experience, open source
development teams)
Which database you choose depends on your needs
Dr. Lena Wiese NOSQL DBs 49 / 49

Contenu connexe

Tendances

Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsShivansh Gaur
 
Parallel and Distributed Algorithms for Large Text Datasets Analysis
Parallel and Distributed Algorithms for Large Text Datasets AnalysisParallel and Distributed Algorithms for Large Text Datasets Analysis
Parallel and Distributed Algorithms for Large Text Datasets AnalysisIllia Ovchynnikov
 
B03302007012
B03302007012B03302007012
B03302007012theijes
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Spacencct
 
A basic course on Research data management, part 3: sharing your data
A basic course on Research data management, part 3: sharing your dataA basic course on Research data management, part 3: sharing your data
A basic course on Research data management, part 3: sharing your dataLeon Osinski
 
The Art and Science of DDS Data Modelling
The Art and Science of DDS Data ModellingThe Art and Science of DDS Data Modelling
The Art and Science of DDS Data ModellingAngelo Corsaro
 
Convergence of Machine Learning, Big Data and Supercomputing
Convergence of Machine Learning, Big Data and SupercomputingConvergence of Machine Learning, Big Data and Supercomputing
Convergence of Machine Learning, Big Data and SupercomputingDESMOND YUEN
 
Architecture and Evaluation on Cooperative Caching In Wireless P2P
Architecture and Evaluation on Cooperative Caching In Wireless  P2PArchitecture and Evaluation on Cooperative Caching In Wireless  P2P
Architecture and Evaluation on Cooperative Caching In Wireless P2PIOSR Journals
 
Evaluation of a topological distance
Evaluation of a topological distanceEvaluation of a topological distance
Evaluation of a topological distanceIJCNCJournal
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...Leon Osinski
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Tobias Kuhn
 
Shared Editing on the Web: A Classification of Developer Support Frameworks
Shared Editing on the Web: A Classification of Developer Support FrameworksShared Editing on the Web: A Classification of Developer Support Frameworks
Shared Editing on the Web: A Classification of Developer Support FrameworksIstvanKoren
 
Plenzogan technology
Plenzogan technologyPlenzogan technology
Plenzogan technologyplenzogan
 
Organizing EEG data using the Brain Imaging Data Structure
Organizing EEG data using the Brain Imaging Data Structure Organizing EEG data using the Brain Imaging Data Structure
Organizing EEG data using the Brain Imaging Data Structure Robert Oostenveld
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSRobert Oostenveld
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methodsijcsity
 

Tendances (19)

Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity Algorithms
 
Lecture 01
Lecture 01Lecture 01
Lecture 01
 
Parallel and Distributed Algorithms for Large Text Datasets Analysis
Parallel and Distributed Algorithms for Large Text Datasets AnalysisParallel and Distributed Algorithms for Large Text Datasets Analysis
Parallel and Distributed Algorithms for Large Text Datasets Analysis
 
B03302007012
B03302007012B03302007012
B03302007012
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
 
A basic course on Research data management, part 3: sharing your data
A basic course on Research data management, part 3: sharing your dataA basic course on Research data management, part 3: sharing your data
A basic course on Research data management, part 3: sharing your data
 
The Art and Science of DDS Data Modelling
The Art and Science of DDS Data ModellingThe Art and Science of DDS Data Modelling
The Art and Science of DDS Data Modelling
 
Convergence of Machine Learning, Big Data and Supercomputing
Convergence of Machine Learning, Big Data and SupercomputingConvergence of Machine Learning, Big Data and Supercomputing
Convergence of Machine Learning, Big Data and Supercomputing
 
Architecture and Evaluation on Cooperative Caching In Wireless P2P
Architecture and Evaluation on Cooperative Caching In Wireless  P2PArchitecture and Evaluation on Cooperative Caching In Wireless  P2P
Architecture and Evaluation on Cooperative Caching In Wireless P2P
 
Evaluation of a topological distance
Evaluation of a topological distanceEvaluation of a topological distance
Evaluation of a topological distance
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...A basic course on Reseach data management, part 2: protecting and organizing ...
A basic course on Reseach data management, part 2: protecting and organizing ...
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications
 
Shared Editing on the Web: A Classification of Developer Support Frameworks
Shared Editing on the Web: A Classification of Developer Support FrameworksShared Editing on the Web: A Classification of Developer Support Frameworks
Shared Editing on the Web: A Classification of Developer Support Frameworks
 
Plenzogan technology
Plenzogan technologyPlenzogan technology
Plenzogan technology
 
Organizing EEG data using the Brain Imaging Data Structure
Organizing EEG data using the Brain Imaging Data Structure Organizing EEG data using the Brain Imaging Data Structure
Organizing EEG data using the Brain Imaging Data Structure
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methods
 

Similaire à " NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer at Göttingen University

Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryRuben Schalk
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesAlessandro Adamou
 
The future importance of bibliographic data
The future importance of bibliographic dataThe future importance of bibliographic data
The future importance of bibliographic dataPatrick Danowski
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data VisualizationLaura Po
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesdiannepatricia
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesdiannepatricia
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOChris Mungall
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsHugh McCamphill
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Enrico Daga
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad DataSteffen Staab
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataLars G. Svensson
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsHugh McCamphill
 

Similaire à " NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer at Göttingen University (20)

2014kalman 140910040324-phpapp02 (1)
2014kalman 140910040324-phpapp02 (1)2014kalman 140910040324-phpapp02 (1)
2014kalman 140910040324-phpapp02 (1)
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
The future importance of bibliographic data
The future importance of bibliographic dataThe future importance of bibliographic data
The future importance of bibliographic data
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
LOD2 Webinar Series FOX
LOD2 Webinar Series FOXLOD2 Webinar Series FOX
LOD2 Webinar Series FOX
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_series
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_series
 
Introduction to lod
Introduction to lodIntroduction to lod
Introduction to lod
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely tests
 
Sylva
SylvaSylva
Sylva
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
ld4dh demo lecture
ld4dh demo lectureld4dh demo lecture
ld4dh demo lecture
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad Data
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely tests
 

Plus de Dataconomy Media

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Dataconomy Media
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...Dataconomy Media
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...Dataconomy Media
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Dataconomy Media
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Dataconomy Media
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Dataconomy Media
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Dataconomy Media
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Dataconomy Media
 
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Dataconomy Media
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
 

Plus de Dataconomy Media (20)

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
 
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
 

Dernier

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Dernier (20)

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer at Göttingen University

  • 1. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Institut für Informatik NOSQL Databases – An Overview Dr. Lena Wiese Institut für Informatik Research Group Knowledge Engineering Fakultät für Mathematik und Informatik Georg-August Universität Göttingen Data Natives 2015 Dr. Lena Wiese NOSQL DBs 1 / 49
  • 2. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Institut für Informatik Short CV Dr. Lena Wiese University of Göttingen (Research Group Leader Knowledge Engineering) University of Hildesheim (Visiting Professor for Databases) National Institute of Informatics, Tokyo, Japan Robert Bosch India Ltd., Bangalore, India Master/PhD: TU Dortmund Teaching and Research NoSQL databases (lecture, seminars, projects) Database security (encryption for Cassandra and HBase) Web: http://wiese.free.fr/ Dr. Lena Wiese NOSQL DBs 2 / 49
  • 3. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Institut für Informatik Book / Workshop Master’s level text book (in English): Lena Wiese: Advanced Data Management for SQL, NoSQL, Cloud and Distributed Databases c 2015 DeGruyter/Oldenbourg http://wiese.free.fr/adm.html (several pictures in this talk taken from the book) Workshop NoSQL-Net organized jointly with Irena Holubova (Charles University, Prague) and Valentina Janev (Instituto Mihailo Pupin, Belgrade) 3rd edition to be organized at DEXA conference 2016 We welcome industrial application papers Dr. Lena Wiese NOSQL DBs 3 / 49
  • 4. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: Content Institut für Informatik Overview 1 Introduction Content New Requirements 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 4 / 49
  • 5. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: Content Institut für Informatik Content SQL Tabular row-wise storage: Relational Databases (RDBs) Query Language: SQL versus NOSQL (Not Only SQL) Graph Databases XML Databases Key-value Stores Column Stores Bigtable Databases Object Databases and Object-Relational Databases ... Dr. Lena Wiese NOSQL DBs 5 / 49
  • 6. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: Content Institut für Informatik What is a Database System? A database system is required to manage huge amounts of data in an e cient, persistent, reliable, consistent, non-redundant way for multiple users Dr. Lena Wiese NOSQL DBs 6 / 49
  • 7. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: New Requirements Institut für Informatik New requirements Data are organized in complex structures (example: social networks) me you foe foo Data are constantly changing (frequent updates) Data are distributed on a huge number of interconnected servers (example: cloud storage) Dr. Lena Wiese NOSQL DBs 7 / 49
  • 8. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: New Requirements Institut für Informatik New requirements Data are organized in complex structures (example: social networks) Data are constantly changing (frequent updates) write1 write2 write3 write4 write5read1 read2 Data are distributed on a huge number of interconnected servers (example: cloud storage) Dr. Lena Wiese NOSQL DBs 7 / 49
  • 9. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: New Requirements Institut für Informatik New requirements Data are organized in complex structures (example: social networks) Data are constantly changing (frequent updates) Data are distributed on a huge number of interconnected servers (example: cloud storage) user S1 S2 S3 S4 S5 data Dr. Lena Wiese NOSQL DBs 7 / 49
  • 10. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Introduction :: New Requirements Institut für Informatik New requirements Data are organized in complex structures (example: social networks) Data are constantly changing (frequent updates) Data are distributed on a huge number of interconnected servers (example: cloud storage) Revival of non-relational data models for novel applications Dr. Lena Wiese NOSQL DBs 7 / 49
  • 11. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases Background Graph Management Systems 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 8 / 49
  • 12. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Background Institut für Informatik Why Graph Databases? Links between data items are important Example: Social Networks Recommender Systems Semantic Web Geographic Information Systems Bioinformatics ... Name: Alice Age: 34 Name: Bob Age: 27 Name: Charlene Age: 29 knows knows dislikes Dr. Lena Wiese NOSQL DBs 9 / 49
  • 13. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Background Institut für Informatik Why Graph Databases? Links between data items are important Social Networks Recommender Systems Semantic Web Example: Geographic Information Systems Bioinformatics ... City: Hannover Population: 522T City: Hildesheim Population: 102T City: Braunschweig Population: 248T 35km 45km 65km Dr. Lena Wiese NOSQL DBs 9 / 49
  • 14. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Graph Management Institut für Informatik Property Graph Model A Property Graph is a directed multigraph Stores information (properties) in vertices and on edges A Property is a key-value pair like “Name: Alice” Sometimes multi-value properties: one key, list of values For vertices and edges: predefined property key called Id with unique identifier value Id: 1 Name: Alice Age: 34 Id: 2 Name: Bob Age: 27 Id: 3 Name: Charlene Age: 29 Id: 4 Id: 5 Id: 6 Dr. Lena Wiese NOSQL DBs 10 / 49
  • 15. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Graph Management Institut für Informatik Property Graph Model: Types and Labels In general, not typed: vertices and edges store arbitrary key-value pairs However, typing gives semantics to the stored data For vertices: predefined property key called Type; like “Type: Person” For edges: predefined property key called Label; optionally: specify which edge label can be used between which vertex types Id: 1 Type: Person Name: Alice Age: 34 Id: 2 Type: Person Name: Bob Age: 27 Id: 3 Type: Person Name: Charlene Age: 29 Id: 4 Label: knows Id: 5 Label: knows Id: 6 Label: dislikes Dr. Lena Wiese NOSQL DBs 11 / 49
  • 16. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Graph Management Institut für Informatik Property Graph Model: Multiedges Multiedges are only allowed when edge labels are di erent Id: 1 Type: Person Name: Alice Age: 34 Id: 2 Type: Person Name: Bob Age: 27 Id: 3 Type: Person Name: Charlene Age: 29 Id: 4 Label: knows Id: 5 Label: knows Id: 6 Label: dislikes Id: 7 Label: dislikes (not allowed!) Dr. Lena Wiese NOSQL DBs 12 / 49
  • 17. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Graph Management Institut für Informatik Property Graph Model: Paths Paths are serial concatenations of edges End vertex of one edge is start vertex of next edge on the path Id: 1 Type: Person Name: Alice Age: 34 Id: 2 Type: Person Name: Bob Age: 27 Id: 3 Type: Person Name: Charlene Age: 29 Id: 4 Label: knows Id: 5 Label: knows Dr. Lena Wiese NOSQL DBs 13 / 49
  • 18. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Graph Management Institut für Informatik Property Graph Model: Paths Path “friends-of-friends” concatenates two edges with “Label: knows” Paths can be used as normal edges Id: 1 Type: Person Name: Alice Age: 34 Id: 2 Type: Person Name: Bob Age: 27 Id: 3 Type: Person Name: Charlene Age: 29 Id: 4 Label: knows Id: 5 Label: knows Path friends-of-friends Dr. Lena Wiese NOSQL DBs 13 / 49
  • 19. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Graph Databases :: Systems Institut für Informatik Open Source Systems The TinkerPop http://tinkerpop.incubator.apache.org/ graph processing stack: a set of open source graph management modules Neo4J graph database http://neo4j.com/ Cypher query language START alice = (people_idx, name, "Alice") MATCH (alice)-[:knows]->(aperson) RETURN (aperson) HyperGraphDB: http://www.hypergraphdb.org/ Graph may contain hyperedges that combine more than two nodes Dr. Lena Wiese NOSQL DBs 14 / 49
  • 20. { } Knowledge Engineering K9 Georg-August-Universität Göttingen XML Databases :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases Background Numbering Schemes Systems 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 15 / 49
  • 21. { } Knowledge Engineering K9 Georg-August-Universität Göttingen XML Databases :: Background Institut für Informatik XML XML: Extensible Markup Language Defined by the WWW Consortium (W3C) Intended as a document markup language (not a database language) Tags divide documents into sections Tag: label for a section of data Element: section of data beginning with <tagname> and ending with matching </tagname> Inside an element: arbitrary text other elements (“nesting”) Nothing (“empty element”): abbreviate to <tagname /> Standardized query languages: XPath and XQuery Dr. Lena Wiese NOSQL DBs 16 / 49
  • 22. { } Knowledge Engineering K9 Georg-August-Universität Göttingen XML Databases :: Background Institut für Informatik Tree Model of XML Data <reservationsystem> <hotel hotelID="h1"> <name> Buergermeisterkapelle </name> <location> Hildesheim </location> <pricesgl> 65 Euro </pricesgl> </hotel> </reservationsystem> 0 reservationsystem 1 hotel 2 3 name 4 Buergermeisterkapelle 5 location 6 Hildesheim 7 pricesgl 8 65 Euro hotelID h1 Dr. Lena Wiese NOSQL DBs 17 / 49
  • 23. { } Knowledge Engineering K9 Georg-August-Universität Göttingen XML Databases :: Numbering Schemes Institut für Informatik Numbering Scheme assigns each node of an XML tree a unique identifier (a label or node ID which is usually a number) Important for database application with frequent updates: How many nodes have to be renumbered in an update? simplest scheme: preorder traversal of tree increasing a counter for each node: root node is numbered as the first node before numbering any other node this is done recursively for all child nodes Renumbering: all nodes in the worst case Dr. Lena Wiese NOSQL DBs 18 / 49
  • 24. { } Knowledge Engineering K9 Georg-August-Universität Göttingen XML Databases :: Systems Institut für Informatik Open Source Systems eXistDB: http://exist-db.org/ numbering scheme that virtually expands the tree into a complete tree such that not all node IDs correspond to existing nodes eXistDB o ers several user APIs: RESTful API, XML:DB API, XML-RPC API, SOAP AP BaseX: http://basex.org/ Numbering scheme: Pre/Dist/Size Several language bindings as well as a REST API, an XQJ API and a XML:DB API Dr. Lena Wiese NOSQL DBs 19 / 49
  • 25. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores Background Systems MapReduce Systems 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 20 / 49
  • 26. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: Background Institut für Informatik Key-Value Stores A key value pair is a tuple of two strings Èkey, valueÍ You can get (or delete) a value from the store by key Schema-less: you can put arbitrary key-value pairs into the store value = store.get(key) store.put(key, value) store.delete(key) Values can have other data types than just strings Values can even be a list or array of atomic values Simple but quick Simple data structure No advanced query language Good for “data-intensive” applications Application is responsible or combining key-value pairs into more complex objects Dr. Lena Wiese NOSQL DBs 21 / 49
  • 27. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: Systems Institut für Informatik Open Source Systems Redis: http://redis.io/ in-memory key-value store data types: string, linked lists, unsorted set, sorted set, hash, bit array, hyperloglog Riak-KV: http://basho.com/products/riak-kv/ key-value pairs called Riak objects grouped into buckets convergent replicated data types (CRDTs) Riak’s search functionality based on Apache Solr (Yokozuna) Dr. Lena Wiese NOSQL DBs 22 / 49
  • 28. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: MapReduce Institut für Informatik MapReduce Applied at Google Je rey Dean / Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI’04: Sixth Symposium on Operating System Design and Implementation, 2004. “The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce.” Four basic steps 1 split input key-value pairs into disjunct subsets 2 compute map function on each input subset 3 group all intermediate values by key (shu e) 4 reduce values of each group Dr. Lena Wiese NOSQL DBs 23 / 49
  • 29. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: MapReduce Institut für Informatik MapReduce: Example split map shu e reduce sentence1 (word3,1) (word1,(1,1,1)) server4 (word1,3) sentence2 server1 (word4,1) (word2,(1)) (word2,1) sentence3 (word3,1) (word1,1) sentence4 server2 (word1,1) (word3,(1,1,1)) server5 (word3,3) sentence5 (word2,1) (word4,(1,1)) (word4,2) (word4,1) sentence6 server3 (word3,1) sentence7 (word1,1) Dr. Lena Wiese NOSQL DBs 24 / 49
  • 30. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Key-Value Stores :: Systems Institut für Informatik Open Source Systems Apache Hadoop: http://hadoop.apache.org/ Hadoop Distributed File System (HDFS) Apache Spark: http://spark.apache.org/ data flow programming model on top of Hadoop Apache Pig: http://pig.apache.org/ express parallel execution of data analytics tasks. input={(’alice’,{’charlene’,’emily’}), (’bob’,{’david’,’emily’})}; output = FOREACH input GENERATE $0, FLATTEN($1); Apache Hive: http://hive.apache.org/ querying and data management layer can serialize tables as files in HDFS HiveQL queries are compiled into Hadoop MapReduce tasks Dr. Lena Wiese NOSQL DBs 25 / 49
  • 31. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Document Stores :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores Background Systems 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 26 / 49
  • 32. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Document Stores :: Background Institut für Informatik JSON: JavaScript Object Notation human-readable text format more compact than XML nesting of key-value pairs { "firstName":"Alice", "lastName" :"Smith", "age":31, "address" :{ "street":"Main Street", "number":12, "city":"Newtown", "zip":31141 } , "telephone":[935279,908077,278784] } Dr. Lena Wiese NOSQL DBs 27 / 49
  • 33. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Document Stores :: Systems Institut für Informatik Open Source Systems MongoDB: https://www.mongodb.org/ BSON storage format (binary JSON representation) db.persons.find(age$lt: 34) CouchDB: http://couchdb.apache.org/ retrieval process with views defined as map function function(doc) { if(doc.lastname && doc.age) { emit(doc.lastname, doc.age); } } Couchbase: http://www.couchbase.com SQL-like query language Dr. Lena Wiese NOSQL DBs 28 / 49
  • 34. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores Background Column Compression Systems 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 29 / 49
  • 35. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Background Institut für Informatik Why Column Stores? A row store is a row-oriented relational database Data are stored in tables On disk, data in a row are stored consecutively Currently used in most commercially successful RDBMSs A column store is a column-oriented relational database Data are stored in tables On disk, data in a column are stored consecutively In use since the 1970s but less successful than row stores Example BookLending BookID ReaderID ReturnDate 123 225 25-10-2011 234 347 31-10-2011 Storage order in row store: 123,225,25-10-2011,234,347,31-10-2011 Storage order in column store: 123,234,225,347,25-10-2011,31-10-2011 Dr. Lena Wiese NOSQL DBs 30 / 49
  • 36. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Background Institut für Informatik Advantages of Column Stores Only columns (attributes) that are needed are read from disk into main memory, because a memory page contains only values of a column Values in a column (that is, values of the same attribute domain) can be compressed better when stored consecutively (“locality”) Iterating or aggregating over values in a column can be done quickly, because they are stored consecutively For example, summing up all values in a column, finding the average, maximum... Adding new columns to a table is easy Dr. Lena Wiese NOSQL DBs 31 / 49
  • 37. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Column Compression Institut für Informatik Column Compression Columns may contain lots of repetitions of values Compression can be more e ective on columns Option 1: run-length encoding run-length: how many repetitions of a value are stored consecutively? Option 2: bit-vector encoding create a bit vector for each value in the column Option 3: dictionary encoding create a dictionary for single values or sequences of values Option 4: frame of reference encoding store o -set from a reference point Option 5: di erential encoding store o -set from previous value Stavros Harizopoulos / Daniel Abadi / Peter Boncz, “Column-Oriented Database Systems”, VLDB Tutorial, 2009 Dr. Lena Wiese NOSQL DBs 32 / 49
  • 38. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Column Compression Institut für Informatik Example: Run-Length Encoding BookLending BID RID RD 123 225 25-10-2012 386 225 20-10-2012 938 225 27-10-2012 123 347 25-11-2012 234 347 31-10-2012 Store ReaderID (RID) in run-length encoding count number of consecutive repetitions format: (value, start row, run-length) RID: ( (225, 1, 3), (347, 4, 2) ) Answer queries on compressed format How many books does each reader have? SELECT RID, COUNT(*) FROM BookLending GROUP BY RID Just return (the sum of) the run-lengths for each ReaderID value Result: (225, 3), (347, 2) Dr. Lena Wiese NOSQL DBs 33 / 49
  • 39. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Column Stores :: Systems Institut für Informatik Systems MonetDB: https://www.monetdb.org/ open source “column store pioneers” Apache Parquet: http://parquet.apache.org/ implements column striping: transform nested data to columns Commercial systems SAP HANA HP Vertica Dr. Lena Wiese NOSQL DBs 34 / 49
  • 40. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Background Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases Background Storage Organization Systems 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 35 / 49
  • 41. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Background Institut für Informatik Google BigTable Fay Chang / Je rey Dean / Sanjay Ghemawat / Wilson C. Hsieh / Deborah A. Wallach / Mike Burrows / Tushar Chandra / Andrew Fikes / Robert E. Gruber, “Bigtable: A Distributed Storage System for Structured Data”, OSDI, 2006 “A Bigtable is a sparse, distributed, persistent, multi-dimensional sorted map” Google BigTable is indexed by a row key, column key, and a timestamp Map: ( row:string, column:string, time:int64) æ string A Big Table may have an unbounded number of columns. Columns are grouped into sets called column families. Dr. Lena Wiese NOSQL DBs 36 / 49
  • 42. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Background Institut für Informatik BigTable & HBase Data Structure Store data that is accessed together in a column family Columns in a single column family can vary arbitrarily for each row. Only fetch column families of columns that are required by query Data locality: Store data in a column family together on disk table row key column family column family Library BID BookInfo LendingInfo 123 Title Databases Author Miller 25-10-2012 Mayer 25-11-2012 Green 386 Title Algorithms Author Jacobs 20-10-2012 Mayer 938 Title Programming Author Brown 27-10-2012 Mayer 234 Title SQL Author Smith 31-10-2012 Green Dr. Lena Wiese NOSQL DBs 37 / 49
  • 43. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Storage Organization Institut für Informatik Writing to memory tables and data files The most recent writes are collected in a main memory table (memtable) of fixed size. All data records written to the on-disk store will only be appended to the existing records. Once written, these records are read-only and cannot be modified: they are immutable data files. Any modification of a record must hence also be simulated by appending a new record in the store. Deletions are treated by writing a new record (tombstone) for a key. Main memory memtable Disk Sorted file 1 Sorted file 2 . . .Sorted file n flush write Dr. Lena Wiese NOSQL DBs 38 / 49
  • 44. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Storage Organization Institut für Informatik Reading from memory tables and data files The downside of immutable data files is that they complicate the read process: retrieving all the relevant data that match a user query requires combining records from several on-disk data files and the memtable. This combination may a ect records for di erent search keys that are spread out across several data files; but it may also apply to records for the same key of which di erent versions exist in di erent data files. In other words, all sorted data files have to be searched for records matching the read request. Main memory memtable block bu er Disk Sorted file 1 Sorted file 2 . . .Sorted file n read combine combine Dr. Lena Wiese NOSQL DBs 39 / 49
  • 45. { } Knowledge Engineering K9 Georg-August-Universität Göttingen BigTable Databases :: Systems Institut für Informatik Open Source Systems Apache Cassandra: http://cassandra.apache.org/ column families in a keyspace CQL: SQL-like query language INSERT INTO bookinfo (bookid, title, author) VALUES (1002,’Databases’,’Miller’); Apache HBase: http://hbase.apache.org/ stores tables in namespaces tables contain column families Dr. Lena Wiese NOSQL DBs 40 / 49
  • 46. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures Polyglot Persistence Lambda Architecture Multi-Model Databases 9 Conclusion Dr. Lena Wiese NOSQL DBs 41 / 49
  • 47. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik Polyglot Persistence Choose as many databases as needed Fowler, M.J., Sadalage, P.J.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Prentice Hall (2012) Example: Apache Drill http://drill.apache.org/ Apache Drill is inspired by the ideas developed in Google’s Dremel system Melnik, S., Gubarev, A., Long, J.J., Romer, G., Shivakumar, S., Tolton, M., Vassilakis, T.: Dremel: interactive analysis of web-scale datasets. Proceedings of the VLDB Endowment 3(1-2), 330–339 (2010) Better introduce an integration layer (instead of pushing the burden of query handling and database synchronization to the application level) decomposing queries in to several subqueries redirecting queries to the appropriate databases recombining the results obtained from the accessed databases Dr. Lena Wiese NOSQL DBs 42 / 49
  • 48. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Polyglot Persistence Institut für Informatik Polyglot Persistence Integration layer Query decomposition Query redirection Result recombination Synchronization key-value store graph database SQL database in-memory store graph traversal analytical query write-heavy transaction SQL query REST- based access Dr. Lena Wiese NOSQL DBs 43 / 49
  • 49. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Lambda Architecture Institut für Informatik Lambda Architecture For real-time / streaming data Combination of a slower batch processing layer and a speedier stream processing layer Speed layer collects only the most recent data and delivers these results in several real-time views Batch layer stores all data in an append-only and im- mutable fashion in a so-called master dataset and results are delivered in so-called batch views Serving layer makes batch views accessible to user queries by maintaining indexes User queries answered by merging data from batch views and real-time views Open source implementation following the ideas of a lambda architecture is Apache Druid http://druid.io/ (streaming data in real-time nodes and batch data in historical nodes) Dr. Lena Wiese NOSQL DBs 44 / 49
  • 50. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Lambda Architecture Institut für Informatik Lambda Architecture Batch layer Batch view 1 Batch view 2 Batch view 3 Batch view 4 Master data set Speed layer Recent data set Speed view 1 Speed view 2 Speed view 3 Speed view 4 Serving layer Index 1 Index 2 Index 3 Index 4 Data stream append append merge Dr. Lena Wiese NOSQL DBs 45 / 49
  • 51. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Multi-Model Databases Institut für Informatik Multi-Model Databases Use a database system that stores data in a single store but provides access to the data with di erent APIs according to di erent data models Either support di erent data models directly inside the database engine or o er layers for additional data models on top of a single-model engine OrientDB http://orientdb.com/ a document API, an object API, and a graph API (Java Graph API is compliant with Tinkerpop) extensions of the SQL standard to interact will all three APIs ArangoDB https://www.arangodb.com/ a graph API, a key-value API and a document API Query language AQL (ArangoDB query language) resembles SQL but adds several database-specific extensions to it Dr. Lena Wiese NOSQL DBs 46 / 49
  • 52. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Polyglot Data Base Architectures :: Multi-Model Databases Institut für Informatik Multi-Model Databases Graph layer key-value store graph traversal write-heavy transaction REST- based access Dr. Lena Wiese NOSQL DBs 47 / 49
  • 53. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Conclusion Institut für Informatik Overview 1 Introduction 2 Graph Databases 3 XML Databases 4 Key-Value Stores 5 Document Stores 6 Column Stores 7 BigTable Databases 8 Polyglot Data Base Architectures 9 Conclusion Dr. Lena Wiese NOSQL DBs 48 / 49
  • 54. { } Knowledge Engineering K9 Georg-August-Universität Göttingen Conclusion Institut für Informatik Conclusion Many, many other data models than just relational tables Lots of di erent query languages (no standards) Problems with reliability (no long-term experience, open source development teams) Which database you choose depends on your needs Dr. Lena Wiese NOSQL DBs 49 / 49