MongoDB NoSQL database a deep dive -MyWhitePaper
1. Topic: NoSQL Database - MongoDB
Presenter: Rajesh Kumar
Sr. Data Architect - Big Data Analytics & Information Management
Agenda:
• What is NoSQL, why NoSQL
• The different types of NoSQL databases & data model approaches
• Detailed overview of one of the most popular NoSQL databases: MongoDB
• Model- Document oriented database
• JSON
• CRUD Operation
• Model Data In MongoDB
• Data Model design consideration
• Indexing
• Sharding
• Replication
• Use cases
• Reference Architecture
• Insurance Conceptual Data Model
2. Relational databases have served well, but..
Relational databases have been excellent, but the world of data is changing rapidly. The
amount of data created each year is almost doubling, a kind of data explosion. And
this data is not simply transactional, structured data. It includes new types of data
generated from web logs, documents, clickstreams, devices, sensors & other IoT sources.
Traditional RDBMS systems were not designed to handle the volume, variety and velocity
of this (semi-structured & unstructured) data, produced in such enormous quantities.
Traditional RDBMS can't provide the scalability, performance, and flexibility needed for modern
distributed data storage and processing.
5. What is NoSQL - Not Only SQL?
Non-relational,
distributed,
schema-free,
flexible,
horizontally scalable,
open source,
simple API
6. Why NoSQL?
Support for distributed platforms in the age of Big Data
Ability to deal effectively with all kinds of data formats: images, docs, streaming, text, web, geospatial,
sensor, machine, real-time operational
Scalability and performance (low latency and faster data access)
Rapid scale: scale out as much as the business needs to support more users and growing data
24x7 data availability and global deployment
Data to support next-gen high-performance apps
Real-time reporting and analytics (predictive analytics, machine learning) beyond what data
warehouses support
Lower data management cost
7. Types of NoSQL Databases
Key/Value store - Memcached, DynamoDB
Column store - Cassandra, HBase
Document store - MongoDB, CouchDB, DynamoDB
Graph store - Neo4j
Multi-model databases - DynamoDB, CouchDB
MongoDB is a document-oriented database
Its data structure is composed of key/value pairs in JSON format
8. What is MongoDB?
An open-source, document-oriented NoSQL database that provides high
performance, automatic scaling and flexible schema design.
9. MongoDB fulfills both traditional and new requirements
11. A quick recap of MongoDB characteristics
Distributed, document-oriented NoSQL database
MongoDB stores data as JSON-like documents, represented as BSON
Dynamic and flexible schema
Horizontal scaling, easy to scale
Supports a rich query language
Supports CRUD for read and write operations
Supports text search and geospatial queries
Efficient text and geospatial indexes
Very strong sharding and replication
_id: a special key assigned to each document
_id is unique within a collection
12. A record in MongoDB is a document, which is a data structure composed of
field (key) and value pairs. The values of fields may include other nested
documents, arrays, and arrays of documents.
13. MongoDB Data Model
MongoDB stores documents in JSON (actually BSON)
JSON - short for JavaScript Object Notation
BSON is a binary serialization of JSON-like objects
A JSON object is a set of key-value pairs ("key" : "value") enclosed in curly braces { }
Document creation is free from schema: no structure, data type, or size needs to be predefined.
You can create as many fields as you require dynamically.
Data types supported by JSON/BSON in MongoDB: strings, numbers (integer, long, double), objects,
arrays, Boolean (true/false), null, date, timestamp.
Other constructs in MongoDB are databases, collections, documents, and fields
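As a rough illustration of why the slide distinguishes JSON from BSON, the sketch below round-trips a document through plain JSON in ordinary JavaScript (no MongoDB driver involved). The document and its field names are invented for the example. The date comes back as a string because JSON text has no date type, whereas BSON carries a dedicated one.

```javascript
// Hypothetical document; field names and values are illustrative only.
const customer = {
  name: "Asha",                                   // string
  age: 34,                                        // number
  active: true,                                   // boolean
  tags: ["gold", "online"],                       // array
  address: { city: "Pune" },                      // embedded object
  middleName: null,                               // null
  joined: new Date("2015-06-01T00:00:00.000Z"),   // date
};

// Plain JSON has no date type, so the Date is written as an ISO string.
const asJson = JSON.stringify(customer);
const roundTripped = JSON.parse(asJson);

console.log(typeof roundTripped.joined);   // "string"
console.log(roundTripped.joined);          // "2015-06-01T00:00:00.000Z"
console.log(roundTripped.address.city);    // "Pune"
```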
14. MongoDB data model core concepts
Database - In MongoDB, a database is a physical container for collections, each of which holds
documents.
Collection - A collection is a container of documents; the documents can have any structure.
Document - A document is a group of fields in key/value pairs, free from a fixed schema, table, or column; a
document can hold any type of data.
Think of collections and documents as tables & rows in an RDBMS
Hierarchical
A document can reference other documents
A document can contain embedded documents, arrays, and arrays of documents
16. MongoDB Data Model - A Document Store Model
Not PDF, Word, CSV or HTML.
Documents are nested structures created using JavaScript Object Notation (JSON). Think of a document as
a record. Let's see what a document looks like in MongoDB.
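The slide's own example did not survive in this transcript, so here is a minimal stand-in written as a plain JavaScript object. Every field name and value is hypothetical, chosen only to show an embedded document, an array, and an array of documents side by side.

```javascript
// Hypothetical customer document: all field names and values are
// invented for illustration; they do not come from the slides.
const customerDoc = {
  _id: 1001,
  name: { first: "Asha", last: "Rao" },   // embedded document
  email: "asha@example.com",
  phones: ["555-0100", "555-0101"],       // array of values
  policies: [                             // array of documents
    { type: "auto", premium: 420 },
    { type: "home", premium: 610 },
  ],
};

// Nested values are reached by walking the structure.
const totalPremium = customerDoc.policies
  .reduce((sum, policy) => sum + policy.premium, 0);

console.log(customerDoc.name.first);   // "Asha"
console.log(totalPremium);             // 1030
```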
18. MongoDB system components
COMPONENTS
mongod - The database process.
mongo - The database shell (uses interactive JavaScript). The command-line shell for interacting directly
with the database.
mongos - Sharding router
UTILITIES
mongostat - Show performance statistics
mongofiles - Utility for putting and getting files from MongoDB GridFS
mongoimport - Import into MongoDB from JSON or CSV
mongoexport - Export a single collection (JSON, CSV)
19. Basic Mongo shell commands
MongoDB stores documents in collections. If a collection does not exist, MongoDB creates the collection
when you first store data for that collection.
Select/create database:
>use customerdb
>db tells you the current database
List databases:
>show dbs
local 0.78125GB
test 0.23012GB
customerdb
myDB
Create collection:
>db.createCollection("products")
List collections already created:
>show collections
21. Frequently used data manipulation methods
The createCollection() method
>db.createCollection(name, options)
The drop() method
MongoDB's db.collection.drop() is used to drop a collection from the database.
Rename a collection:
>db.collection.renameCollection("NewColName")
>db.customer.renameCollection("Customer")
The insert() method
>db.COLLECTION_NAME.insert(document)
Query documents using the find() method:
>db.COLLECTION_NAME.find()
The update() method
>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
>db.col.update({"title": "MongoDB"}, {$set: {"title": "MongoDB Definitive Guide"}})
The remove() method
>db.col.remove({"title": "MongoDB"})
The sort() method
>db.COLLECTION_NAME.find().sort({KEY: 1})
The sort order is given as 1 or -1: 1 means ascending order and -1 means descending order.
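To make the behaviour of the methods above concrete without a running server, here is a toy in-memory stand-in written in plain JavaScript. The ToyCollection class and the sortBy helper are hypothetical names invented for this sketch; they only mimic the shape of insert, find, update with $set, remove, and sort, and are not the MongoDB shell or driver.

```javascript
// Toy in-memory stand-in for a collection. NOT the MongoDB driver:
// the class and its behaviour are simplified assumptions for illustration.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }

  // insert: store a copy, assigning an _id when the document has none.
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }

  // True when every key in the criteria equals the document's value.
  static matches(doc, criteria) {
    return Object.keys(criteria).every((key) => doc[key] === criteria[key]);
  }

  // find: return copies of all documents matching the criteria.
  find(criteria = {}) {
    return this.docs
      .filter((doc) => ToyCollection.matches(doc, criteria))
      .map((doc) => ({ ...doc }));
  }

  // update: apply a {$set: {...}} change to the first matching document.
  update(criteria, change) {
    const target = this.docs.find((doc) => ToyCollection.matches(doc, criteria));
    if (target) Object.assign(target, change.$set);
    return target ? 1 : 0;
  }

  // remove: delete every matching document and report how many went.
  remove(criteria) {
    const before = this.docs.length;
    this.docs = this.docs.filter((doc) => !ToyCollection.matches(doc, criteria));
    return before - this.docs.length;
  }
}

// sortBy mimics sort({KEY: 1}) and sort({KEY: -1}) on an array of documents.
function sortBy(docs, key, direction) {
  return [...docs].sort((a, b) => {
    if (a[key] < b[key]) return -direction;
    if (a[key] > b[key]) return direction;
    return 0;
  });
}

const col = new ToyCollection();
col.insert({ title: "MongoDB", likes: 10 });
col.insert({ title: "NoSQL Overview", likes: 25 });
col.insert({ title: "Cassandra", likes: 5 });

col.update({ title: "MongoDB" }, { $set: { title: "MongoDB Definitive Guide" } });
col.remove({ title: "Cassandra" });

const ascending = sortBy(col.find(), "likes", 1).map((d) => d.title);
const descending = sortBy(col.find(), "likes", -1).map((d) => d.title);
console.log(ascending);    // [ 'MongoDB Definitive Guide', 'NoSQL Overview' ]
console.log(descending);   // [ 'NoSQL Overview', 'MongoDB Definitive Guide' ]
```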
22. Basic DB operations in a complex document
Find operation
Querying embedded objects
Comparison operators
Querying arrays of documents
Indexing on embedded documents
Indexing on multiple keys
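The examples for this slide are not in the transcript. As a hedged sketch of two of the listed ideas, the plain JavaScript below evaluates a dot-notation path into an embedded object and a comparison operator. The getPath and matches helpers and the sample data are hypothetical, and only equality, $gte and $lt are handled; this is not MongoDB's query engine.

```javascript
// Simplified matcher: supports dot-notation paths, equality, $gte and $lt.
// This is an illustrative assumption, not MongoDB's query engine.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

function matches(doc, query) {
  return Object.entries(query).every(([path, condition]) => {
    const value = getPath(doc, path);
    if (condition !== null && typeof condition === "object") {
      // An object condition is treated as a set of comparison operators.
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gte") return value >= operand;
        if (op === "$lt") return value < operand;
        throw new Error(`unsupported operator: ${op}`);
      });
    }
    return value === condition;   // plain equality
  });
}

// Hypothetical sample data.
const employees = [
  { name: "Ravi", age: 31, address: { city: "Chennai" } },
  { name: "Mina", age: 24, address: { city: "Pune" } },
  { name: "Joe", age: 45, address: { city: "Pune" } },
];

const inPune = employees
  .filter((e) => matches(e, { "address.city": "Pune" }))
  .map((e) => e.name);
const inPuneOver25 = employees
  .filter((e) => matches(e, { "address.city": "Pune", age: { $gte: 25 } }))
  .map((e) => e.name);

console.log(inPune);         // [ 'Mina', 'Joe' ]
console.log(inPuneOver25);   // [ 'Joe' ]
```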
24. Example Schema.
Model Data in MongoDB: Model your data the way it is used.
26. Some schema design considerations
What is the priority?
High consistency
High read performance
High write performance
ODS application
Real time
How does the application access and manipulate data?
Data access paths and types of queries
Read versus write ratio
Analytics (aggregation, video, images, machine, geospatial data)
27. Indexes - Indexes are special data structures that store a subset of your data in an efficient
way for easier & faster access to the data
MongoDB stores indexes in a B-tree format, which allows efficient traversal of the index content
Proper index selection is important in MongoDB and helps the DB run optimally; improper indexing
may cause problems in read-write operations and in data distribution across a sharded
cluster.
Index types:
_id
Simple
Compound
Multi-key
Full text
Geospatial
Hashed
28. Index continued..
The _id index: It is created automatically, is immutable, and can't be removed.
This is similar to a primary key in an RDBMS.
The default value is a 12-byte ObjectId:
4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter (newer MongoDB versions replace the machine and process ids with a 5-byte random value)
Simple index: An index on a single key
Compound index: An index created over two or more fields in a document
Multi-key index: An index created on a field that contains an array
Full-text search index: An index over a text-based field, similar to how Google indexes web
pages, e.g. find all tweets from the last 30 days that mention auto insurance, or search for "Big Data" in a blog post
or in all the tweets from the last 30 days.
Geospatial index: This index supports efficient queries on geospatial coordinate data. It is
used when you need to query location-based spatial data. This index is a valuable feature
because location-based data is some of the most valuable data being collected today, for location-based
customer targeting and location-based product analysis, e.g. find all customers that live within 50 miles of
NY.
Hashed index: Used mainly in hash-based sharding; it allows for more randomized data
distribution across shards
Create index syntax (createIndex() replaces the older ensureIndex()):
db.employee.createIndex({"email": 1}, {"unique": true})
db.employee.createIndex({"age": 1}, {"sparse": true})
db.employee.find({age: {$gte: 25}})
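The ObjectId layout described earlier on this slide starts with a 4-byte timestamp, which means the creation time can be read back from the id itself. The sketch below does that in plain JavaScript on the 24-character hex form. The objectIdTimestamp helper and the sample id are made up for illustration, and only the leading timestamp bytes are interpreted.

```javascript
// Reads the leading 4-byte timestamp out of a 24-character hex ObjectId.
// Helper name and sample id are hypothetical, for illustration only.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected 24 hex characters (12 bytes)");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);   // first 4 bytes
  return new Date(seconds * 1000);                    // seconds since the Unix epoch
}

// 4-byte timestamp + 3 bytes + 2 bytes + 3-byte counter = 12 bytes.
const sampleId = "55d5c6a0" + "a1b2c3" + "d4e5" + "000001";
console.log(objectIdTimestamp(sampleId).toISOString());
```

Because the timestamp comes first, ids generated later generally sort after ids generated earlier.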
29. Index continued..
Index properties:
TTL index - TTL indexes are special indexes that MongoDB can use to automatically remove documents from a
collection after a certain amount of time.
Sparse index - The sparse property of an index ensures that the index only contains entries for documents that have the
indexed field. The index skips documents that do not have the indexed field.
Unique index - Enforces uniqueness of the indexed field's values.
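The sparse and unique properties can be pictured with a toy index built over an array of documents. The buildIndex function below is a hypothetical helper written in plain JavaScript with a Map; it is a simplified assumption meant to show what each property changes, not how MongoDB builds its B-tree indexes.

```javascript
// Toy index over one field. A simplified assumption for illustration,
// not how MongoDB implements its B-tree indexes.
function buildIndex(docs, field, { unique = false, sparse = false } = {}) {
  const index = new Map();   // key value -> array of document _ids
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue;          // sparse: skip docs without the field
    const key = hasField ? doc[field] : null;   // non-sparse: missing field indexed as null
    if (unique && index.has(key)) {
      throw new Error(`duplicate key for ${field}: ${key}`);
    }
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

// Hypothetical sample data.
const employees = [
  { _id: 1, email: "a@example.com", age: 30 },
  { _id: 2, email: "b@example.com" },            // no age field
  { _id: 3, email: "c@example.com", age: 41 },
];

const byAge = buildIndex(employees, "age", { sparse: true });
console.log(byAge.size);     // 2: the document without "age" is skipped

const byEmail = buildIndex(employees, "email", { unique: true });
console.log(byEmail.size);   // 3: every email is distinct
```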
Text search index:
MongoDB provides text indexes to support text search queries on text content. To perform text search queries, you
must have a text index on your collection. A collection can have only one text search index, but that index can cover
multiple fields.
Creating a text search index over the "title" and "content" fields:
db.blogpost.createIndex( { title: "text", content: "text" } )
Use the $text query operator to perform text searches
on a collection with a text index.
$text performs a logical OR of all terms in the search string.
For example, we can use the following queries to find the terms MongoDB and BigData in the blogpost collection.
db.blogpost.find( { $text: { $search: "MongoDB" } } )
db.blogpost.find( { $text: { $search: "BigData" } } )
Deleting a text index: To delete an existing text index, first find the name of the index using the following query:
>db.blogpost.getIndexes()
Now you can drop the text index: >db.blogpost.dropIndex("title_text_content_text")
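The OR behaviour of a multi-term search can be sketched without a server. The plain JavaScript below uses hypothetical tokenize and textSearch helpers and invented blog posts; it matches a post when any search term appears in the chosen fields. Real text indexes also apply stemming, stop words and scoring, none of which is modelled here.

```javascript
// Toy text search: lower-cases and splits on non-alphanumerics, then
// matches a document when ANY search term appears in the given fields.
// Real text indexes also handle stemming and stop words; this does not.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function textSearch(docs, fields, search) {
  const terms = new Set(tokenize(search));
  return docs.filter((doc) =>
    fields.some((field) =>
      tokenize(String(doc[field] || "")).some((word) => terms.has(word))
    )
  );
}

// Hypothetical sample data.
const blogposts = [
  { title: "Intro to MongoDB", content: "Documents and collections" },
  { title: "Scaling out", content: "BigData workloads need sharding" },
  { title: "SQL joins", content: "Relational modelling basics" },
];

// Two terms in one search string behave as "MongoDB OR BigData".
const hits = textSearch(blogposts, ["title", "content"], "MongoDB BigData");
console.log(hits.map((post) => post.title));
// [ 'Intro to MongoDB', 'Scaling out' ]
```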
30. Text indexes to support text search analytics - By example
31. MongoDB Sharding
Sharding is a method for storing data across multiple machines in a clustered computing
environment. MongoDB uses sharding to support deployments with very large data
sets and high-throughput operations.
Purpose of sharding
When a database system grows very large, the capacity of a single server machine can be
challenged by the increased workload and by many concurrent users demanding high throughput.
After a certain level, you can't keep scaling vertically by adding more CPU, RAM and
storage; vertical scaling has limitations.
In contrast, sharding works by horizontal scaling: it divides the data set and distributes the data
over multiple shard servers. Each shard works as an independent database, and
collectively all the shards make up a single logical database.
Sharding reduces the amount of data that each server needs to store. When data grows you
can add more shards to the cluster, and each shard then stores less data as the cluster
grows.
For example, if a database has a 1 terabyte data set and there are 4 shards, then each shard
might hold only 256GB of data. If there are 40 shards, then each shard might hold only about 25GB
of data.
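The per-shard figures above follow from an even split, taking 1 terabyte as 1024GB. The sketch below reproduces that arithmetic and then shows, very roughly, how hashing a key spreads documents over shards. The perShardGB, fnv1a and shardFor names are hypothetical, FNV-1a is only a stand-in hash, and the modulo routing is a simplification rather than MongoDB's actual chunk-based distribution.

```javascript
// Even-split estimate: assumes data is spread uniformly across shards.
function perShardGB(totalGB, shardCount) {
  return totalGB / shardCount;
}

console.log(perShardGB(1024, 4));    // 256
console.log(perShardGB(1024, 40));   // 25.6

// Toy hash-based routing. FNV-1a is used here only as a stand-in;
// it is an assumption for this sketch, not MongoDB's hashing function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;   // keep as unsigned 32-bit
  }
  return hash;
}

// Simplification: pick a shard by modulo instead of by hash ranges.
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

const counts = new Array(4).fill(0);
for (let id = 0; id < 10000; id++) {
  counts[shardFor(`customer-${id}`, 4)] += 1;
}
console.log(counts);   // four per-shard counts that add up to 10000
```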
32. Shards in MongoDB Architecture
34. Replication continued..
Replica set members
A replica set in MongoDB is a group of mongod processes that provide redundancy
and high availability. The members of a replica set are:
Primary - It receives all write operations and records them in its oplog.
Secondary - Secondary members replicate operations from the primary to maintain
an identical copy of the data set, so the set can recover from a failure.
Note: The minimum recommended configuration for a replica set is a primary, a
secondary, and an arbiter. Most deployments will keep three members that store
data: a primary and two secondary members.
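The primary/secondary relationship described above can be pictured as a log that secondaries replay. The sketch below is a toy model in plain JavaScript: the Member and Primary classes, the syncFrom function and the operation format are all assumptions made for illustration, and real replication (elections, arbiters, network failures) is far more involved.

```javascript
// Toy replica set: the primary applies each write and appends it to an
// oplog; secondaries replay oplog entries they have not applied yet.
// Class names and the operation format are assumptions for illustration.
class Member {
  constructor() {
    this.data = new Map();
    this.applied = 0;   // number of oplog entries applied so far
  }
  apply(op) {
    if (op.type === "set") this.data.set(op.key, op.value);
    if (op.type === "delete") this.data.delete(op.key);
    this.applied += 1;
  }
}

class Primary extends Member {
  constructor() {
    super();
    this.oplog = [];
  }
  // All writes go through the primary and are recorded in its oplog.
  write(op) {
    this.apply(op);
    this.oplog.push(op);
  }
}

// Replay only the entries the secondary has not seen yet.
function syncFrom(primary, secondary) {
  for (const op of primary.oplog.slice(secondary.applied)) {
    secondary.apply(op);
  }
}

const primary = new Primary();
const secondaries = [new Member(), new Member()];

primary.write({ type: "set", key: "policy:1", value: "auto" });
primary.write({ type: "set", key: "policy:2", value: "home" });
primary.write({ type: "delete", key: "policy:1" });

secondaries.forEach((s) => syncFrom(primary, s));
console.log([...secondaries[0].data]);   // [ [ 'policy:2', 'home' ] ]
```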
35. Use cases - Types of workload suitable for NoSQL
Mobile app development
Internet of things
Digital advertisement
Streaming application
Web application
Social applications
Gaming
Content management
Customer personalization
Recommendation engine
360-degree view of customer,
business, product
Fraud detection
Real time analytics
36. MongoDB support for programming languages
38. Other cool stuff
Sharding
Aggregation and map/reduce
Storage engine - WiredTiger
Capped collection
GridFS
Text and geospatial indexes
Use of Python, Java scripting language for complex data handling
Replication
39. That’s it
Thank you !
Email me:Rajesh-29.kumar-29@cognizant.com
Follow me on Twitter: @rajesh14k