This document summarizes a student presentation on NoSQL databases given at the Web Data Management, Search and Retrieval course at Birzeit University in Palestine. The presentation introduces document-oriented databases and compares their structure and querying to relational databases. It also covers concepts like joins, constraints, transactions, embedding documents, and indexing arrays. Benefits of NoSQL databases discussed include scalability, performance, and flexibility in schema design. Specific examples are provided for one-to-many and many-to-many relationships using embedding and references. The presentation concludes with sections on data replication and sharding in MongoDB.
1. Web Data Management, Search and Retrieval Course (MCOM7348)
University of Birzeit, Palestine
January, 2014
Synthesis Paper Talk
NoSQL Database
Fayez Shayeb
bbbbb
Master of Computing, Birzeit University
Fayez.aauj@hotmail.com
Master of Computing, Birzeit University
bbbbbbb@yahoo.com
This is a student talk at the Web Data Management, Search and Retrieval
course; each student is asked to present his/her synthesis paper.
Course Page: http://jarrar-courses.blogspot.com/2013/11/web-data-management.html
3. Comparison
Relational
Article
- id
- authorid
- title
- content
Author
- id
- name
- email
Comment
- id
- articleid
- message
Document db
Article
- _id
- title
- content
- author
  - _id
  - name
  - email
- comments[]
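The two layouts can be sketched in plain Python, with dictionaries standing in for rows and documents. This is only an illustration: the sample values and the helper name `load_article_relational` are assumptions, not part of the talk.

```python
# Relational layout: three separate "tables", linked by id columns.
authors = {1: {"id": 1, "name": "Alice", "email": "alice@example.com"}}
articles = {10: {"id": 10, "authorid": 1, "title": "Intro", "content": "..."}}
comments = [
    {"id": 100, "articleid": 10, "message": "Nice"},
    {"id": 101, "articleid": 10, "message": "Thanks"},
]


def load_article_relational(article_id):
    """Assemble one article by joining the three tables at query time."""
    article = articles[article_id]
    author = authors[article["authorid"]]
    return {
        "_id": article["id"],
        "title": article["title"],
        "content": article["content"],
        "author": {"_id": author["id"], "name": author["name"], "email": author["email"]},
        "comments": [c["message"] for c in comments if c["articleid"] == article_id],
    }


# Document layout: the same data stored as one self-contained document.
article_doc = {
    "_id": 10,
    "title": "Intro",
    "content": "...",
    "author": {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    "comments": ["Nice", "Thanks"],
}
```

In the document layout the work done by `load_article_relational` has already happened when the data was written, so reading an article is a single lookup.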
4. Concepts
Joins
No joins
Joins happen at "design time", not at "query time"
Thanks to embedded documents and arrays, fewer joins are needed
Constraints
No foreign key constraints
Unique indexes
Transactions
No commit/rollback
Atomic operations
Multiple actions inside the same document
Incl. embedded documents
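The constraints point above can be illustrated with a toy in-memory collection: uniqueness of one field is enforced, but nothing checks that a reference points at an existing document. The `Collection` class is a hypothetical stand-in written for this sketch, not a MongoDB API.

```python
class Collection:
    """Toy in-memory collection with a single unique index (illustrative only)."""

    def __init__(self, unique_field):
        self.unique_field = unique_field
        self.docs = []
        self._seen = set()  # values already used for the unique field

    def insert(self, doc):
        key = doc[self.unique_field]
        if key in self._seen:
            raise ValueError(f"duplicate value for {self.unique_field}: {key!r}")
        self._seen.add(key)
        self.docs.append(doc)


authors = Collection("email")
authors.insert({"_id": 1, "email": "alice@example.com"})

articles = Collection("_id")
# No foreign key constraint: author 999 does not exist, yet the insert succeeds.
articles.insert({"_id": 10, "authorid": 999, "title": "Orphan"})
```

Inserting a second author with the same email would raise `ValueError`, while keeping references valid is left to the application.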
5. Benefits
Scalable: good for a lot of data / traffic
Horizontal scaling: to more nodes
Good for web-apps
Performance
No joins and constraints
Dev/user friendly
Data is modeled to how the app is going to use it
No conversion between object-oriented and relational models
No static schema = agile
6. One-to-many
Embedded array / array keys
Some queries get harder
You can index arrays!
Normalized approach
More flexibility
Considerably lower performance
Article
- _id
- content
- tags: ["foo", "bar"]
- comments: ["id1", "id2"]
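Indexing an array can be pictured as building a lookup from each array element to the documents that contain it. The sketch below does this in plain Python; the function name `build_array_index` and the sample articles are assumptions for illustration, and a real database maintains such an index internally.

```python
from collections import defaultdict

articles = [
    {"_id": "a1", "content": "...", "tags": ["foo", "bar"]},
    {"_id": "a2", "content": "...", "tags": ["bar"]},
]


def build_array_index(docs, field):
    """Map every element of an array field to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for value in doc.get(field, []):
            index[value].add(doc["_id"])
    return index


tag_index = build_array_index(articles, "tags")
print(sorted(tag_index["bar"]))  # ['a1', 'a2']
```

With such an index, finding all articles for a tag no longer requires scanning every document's array.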
7. Many-to-many
Using array keys
No join table
References on both sides
Article
- _id
- content
- category_ids: ["id1", "id2"]
Category
- _id
- name
- article_ids: ["id7", "id8"]
Advantage: simple queries
articles.Where(p => p.CategoryIds.Contains(categoryId))
categories.Where(c => c.ArticleIds.Contains(articleId))
Disadvantage: duplication; every change must update two documents
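The trade-off above can be sketched in Python: both queries are simple membership tests, but linking an article to a category has to touch two documents. The helper names (`link`, `articles_in`, `categories_of`) and the sample data are illustrative assumptions.

```python
articles = {"id7": {"_id": "id7", "content": "...", "category_ids": []}}
categories = {"id1": {"_id": "id1", "name": "Databases", "article_ids": []}}


def link(article_id, category_id):
    """Record the relationship on both sides; two documents must be updated."""
    article = articles[article_id]
    category = categories[category_id]
    if category_id not in article["category_ids"]:
        article["category_ids"].append(category_id)
    if article_id not in category["article_ids"]:
        category["article_ids"].append(article_id)


def articles_in(category_id):
    """All articles that reference the given category."""
    return [a for a in articles.values() if category_id in a["category_ids"]]


def categories_of(article_id):
    """All categories that reference the given article."""
    return [c for c in categories.values() if article_id in c["article_ids"]]


link("id7", "id1")
```

If only one of the two appends in `link` were applied, the two sides would disagree, which is the duplication risk the slide mentions.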
8. JSON
• Stands for JavaScript Object Notation
• Derived from the JavaScript scripting language
• Used for representing simple data structures and associative arrays
{
  "_id": "BCCD12CBB",
  "type": "person",
  "name": "Darth Vader",
  "age": 63,
  "headware": ["Helmet", "Sombrero"],
  "dark_side": true
}
{
  "_id": "BCCD12CBC",
  "type": "person",
  "name": "Luke",
  "age": 35,
  "powers": ["Pull", "Jedi Mind Trick"],
  "dark_side": false
}
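As a small illustration, the two documents above can be parsed with Python's standard `json` module. Wrapping them in a JSON array is an assumption made here so that both can be loaded in one call; the variable names are likewise illustrative.

```python
import json

raw = """
[
  {"_id": "BCCD12CBB", "type": "person", "name": "Darth Vader", "age": 63,
   "headware": ["Helmet", "Sombrero"], "dark_side": true},
  {"_id": "BCCD12CBC", "type": "person", "name": "Luke", "age": 35,
   "powers": ["Pull", "Jedi Mind Trick"], "dark_side": false}
]
"""

people = json.loads(raw)

# The two documents share most fields but not all: no fixed schema is required.
only_first = set(people[0]) - set(people[1])
only_second = set(people[1]) - set(people[0])
print(only_first, only_second)  # {'headware'} {'powers'}
```

JSON objects become Python dictionaries, arrays become lists, and `true`/`false` become `True`/`False`.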
9. Replication
Only one server is active for writes (the primary, or
master) at a given time; this allows strongly
consistent (atomic) operations. One can optionally
send read operations to the secondaries when
eventual consistency semantics are acceptable.
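The difference between reading from the primary and from a secondary can be shown with a deliberately simplified toy model. The `ReplicaSet` class below is a hypothetical sketch, not how MongoDB is implemented: replication is triggered by an explicit `replicate()` call so the stale read is easy to see.

```python
class ReplicaSet:
    """Toy model: one primary takes all writes, secondaries copy them later."""

    def __init__(self, n_secondaries=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self._pending = []  # writes not yet applied to the secondaries

    def write(self, key, value):
        # All writes go to the primary only.
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key, from_secondary=False):
        # Reads default to the primary; secondary reads may be stale.
        source = self.secondaries[0] if from_secondary else self.primary
        return source.get(key)

    def replicate(self):
        # Apply every pending write to every secondary, in order.
        for key, value in self._pending:
            for secondary in self.secondaries:
                secondary[key] = value
        self._pending.clear()


rs = ReplicaSet()
rs.write("x", 1)
stale = rs.read("x", from_secondary=True)   # secondary has not caught up yet
fresh = rs.read("x")                        # primary always sees its own write
rs.replicate()
caught_up = rs.read("x", from_secondary=True)
```

Here `stale` is `None` while `fresh` and `caught_up` are `1`, which is the eventual consistency behaviour described above.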
10. Sharding
Sharding is the partitioning of data among multiple
machines in an order-preserving manner (horizontal
scaling).
Machine 1                    Machine 2                      Machine 3
Alabama → Arizona            Colorado → Florida             Arkansas → California
Indiana → Kansas             Idaho → Illinois               Georgia → Hawaii
Maryland → Michigan          Kentucky → Maine               Minnesota → Missouri
Montana → Montana            Nebraska → New Jersey          Ohio → Pennsylvania
New Mexico → North Dakota    Rhode Island → South Dakota    Tennessee → Utah
Vermont → West Virginia      Wisconsin → Wyoming
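Order-preserving partitioning means a key can be routed to its machine with a binary search over the chunk boundaries. The sketch below uses the chunk ranges listed above, assuming they read as three columns (one per machine) and that each chunk extends up to the next chunk's lower bound; the function name `machine_for` is illustrative.

```python
from bisect import bisect_right

# (lower bound of chunk, machine) pairs, sorted by lower bound.
CHUNKS = [
    ("Alabama", 1), ("Arkansas", 3), ("Colorado", 2), ("Georgia", 3),
    ("Idaho", 2), ("Indiana", 1), ("Kentucky", 2), ("Maryland", 1),
    ("Minnesota", 3), ("Montana", 1), ("Nebraska", 2), ("New Mexico", 1),
    ("Ohio", 3), ("Rhode Island", 2), ("Tennessee", 3), ("Vermont", 1),
    ("Wisconsin", 2),
]
LOWER_BOUNDS = [lower for lower, _ in CHUNKS]


def machine_for(state):
    """Return the machine holding the chunk whose key range contains `state`."""
    i = bisect_right(LOWER_BOUNDS, state) - 1
    if i < 0:
        raise KeyError(f"{state!r} sorts before the first chunk")
    return CHUNKS[i][1]


print(machine_for("California"))  # 3
print(machine_for("Texas"))       # 3
print(machine_for("Alaska"))      # 1
```

Because neighbouring keys live in the same chunk, a range query such as "all states from Idaho to Illinois" touches only one machine.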
11. Sharding
The set of servers (mongod processes) within a shard
comprises a replica set.
12. Replication & Sharding conclusion
Sharding is the tool for scaling a system, and
replication is the tool for data safety, high availability,
and disaster recovery. The two work in tandem yet are
orthogonal concepts in the design.
14. What is MongoDB?
Creator: 10gen (former DoubleClick staff)
Name: short for "humongous"
Language: C++
15. What is MongoDB?
Definition: MongoDB is an open source, document-oriented
database designed with both scalability and
developer agility in mind. Instead of storing your data
in tables and rows as you would with a relational
database, in MongoDB you store JSON-like
documents with dynamic schemas (schema-free, schemaless).
16. What is MongoDB?
Goal: bridge the gap between key-value stores (which
are fast and scalable) and relational databases (which
have rich functionality).
19. MongoDB Data model
Data model: Using BSON (binary JSON), developers
can easily map to modern object-oriented languages
without a complicated ORM layer.
BSON is a binary format in which zero or more
key/value pairs are stored as a single entity.
BSON is designed to be lightweight, traversable, and efficient.
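To make the binary layout concrete, here is a minimal hand-written encoder for a small subset of BSON (strings, booleans, and 32-bit integers in a flat document). It is a sketch for illustration only; the function names are assumptions, and real applications would use a driver's BSON library instead.

```python
import struct


def encode_element(name, value):
    """Encode one key/value pair; only str, bool and 32-bit int are handled here."""
    key = name.encode("utf-8") + b"\x00"           # keys are null-terminated strings
    if isinstance(value, bool):                     # check bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return b"\x10" + key + struct.pack("<i", value)   # little-endian int32
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc):
    """Encode a flat dict as: int32 total size, the elements, a trailing zero byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_document({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The length prefixes are what make the format traversable: a reader can skip over a string or a whole document without parsing its contents.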