1. MongoDB use cases and setup involving Elasticsearch
MongoDB Meetup @hikeapp Gurgaon
Bharvi Dixit
@d_bharvi
13th February 2015
2. Agenda
About Me and Orkash.
Why we chose MongoDB.
Our use cases and setup of MongoDB.
Better Than Apple: MongoDB-Elasticsearch.
Elasticsearch: An Overview.
The most common issues.
Mongo University: Learn from the masters.
3. About Me
Software Engineer @Orkash.
Organizer and Speaker @Delhi Elasticsearch Meetup.
Loves Java, Data, Elasticsearch, MongoDB, Eclipse.
Interested in all things scale, search, security & DevOps.
Working with NoSQL databases for more than a year.
Social Media and News Media Intelligence. (Complex schemas & query designs)
4. About Orkash
Founded in 2007 by Ashish Sonal.
An R&D-driven company that provides a Big Data Automated Intelligence Platform, with a focus on the following areas:
– Counter-terrorism, Security intelligence and Risk management.
– Political Consulting And Homeland Security.
– Decision Support Systems.
– Market/Brand intelligence.
We create the FOUR pillars of Automated intelligence:
– Information Extraction and Monitoring.
– Semantic and Link Analysis.
– Geo-Spatial Analysis.
– Data Mining & Forensics.
5. Everything starts with a problem..!!
• Data Driven Decisions
• Logfiles for scaling up/down
• Warehouse withdrawal triggers orders
• History for fraud detection
• Internet of Things and Smart Cities.
... data explosion
6. Everything starts with a problem..!!
Better decisions == more data
And NoSQL adds more problems
Data
Big Data
BIG DATA
7. Big Data Problem goes on..
• I need BIG DATA.
• I need to analyze this data.
• I need to enrich this big data & make it even bigger.
• I need fast searching.
• I need real-time analytics.
• Ohh wait.. I need relational queries on this big data to get more insights..
8. Why we chose MongoDB
• It does the impossible. (Can incorporate any kind of data)
• Document model.
• Distributed computing.
• Awesome sharding and replication.
• Scales big (horizontally) on commodity hardware.
• Powerful Analytics with aggregation framework.
• High persistence and read/write performance.
• Awesome security features.
• OS-managed memory.
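
To illustrate the aggregation framework point above, here is a minimal sketch of a pipeline written as plain Python data. The field names (`source`, `lang`) and the sample documents are hypothetical, not taken from our schema. With the PyMongo driver the list would be passed to `collection.aggregate(pipeline)`; so that the sketch runs without a server, a small pure-Python function mimics what these three stages compute.

```python
from collections import Counter

# Hypothetical pipeline: count English documents per source, most frequent first.
pipeline = [
    {"$match": {"lang": "en"}},
    {"$group": {"_id": "$source", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def emulate(docs):
    """Pure-Python stand-in for the three stages above (illustration only)."""
    matched = [d for d in docs if d.get("lang") == "en"]         # $match
    counts = Counter(d["source"] for d in matched)               # $group with $sum
    rows = [{"_id": src, "count": n} for src, n in counts.items()]
    return sorted(rows, key=lambda r: r["count"], reverse=True)  # $sort


sample_docs = [
    {"source": "twitter", "lang": "en"},
    {"source": "news", "lang": "en"},
    {"source": "twitter", "lang": "en"},
    {"source": "news", "lang": "hi"},
]
result = emulate(sample_docs)
print(result)
```

The emulation only covers these three stages; the real framework runs the pipeline inside the database, which is the point of using it.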
9. Our use cases and setup of MongoDB.
• A primary data store for collecting and storing humongous amounts of unstructured/semi-structured text.
• Building GIS applications for government and security agencies using Geo-Spatial features.
• Data analytics.
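
As a rough sketch of the Geo-Spatial use case, the snippet below builds a GeoJSON point and a `$near` query document as plain Python dictionaries. The field name `location`, the sample title, and the coordinates are assumptions made for illustration only.

```python
import json


def point(lon, lat):
    """GeoJSON Point; note the order is [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lon, lat]}


def near_query(field, lon, lat, max_metres):
    """Query document matching points within max_metres of (lon, lat)."""
    return {
        field: {
            "$near": {
                "$geometry": point(lon, lat),
                "$maxDistance": max_metres,
            }
        }
    }


# Hypothetical document and query; 'location' is an assumed field name.
doc = {"title": "sample incident", "location": point(77.03, 28.46)}
query = near_query("location", 77.03, 28.46, 5000)
print(json.dumps(query, indent=2))
```

With PyMongo, the field would first need a `2dsphere` index, for example `collection.create_index([("location", "2dsphere")])`, after which `collection.find(query)` returns documents ordered from nearest to farthest.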
10. Our use cases and setup of MongoDB.
Our current production setup has 14 nodes:
Node type              # of nodes   Hardware specifications (each)
Data nodes             5            20 GB RAM, 8-core CPU
Mongos (VMs)           4            4 GB RAM, 4-core CPU
Arbiter nodes (VMs)    2            1 GB RAM, 1-core CPU
Config servers (VMs)   3            4 GB RAM, 2-core CPU
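
As a quick sanity check of the figures above, the node counts can be totalled in a few lines. The RAM and core totals are derived here from the per-node numbers and are not stated on the slide.

```python
# (node type, number of nodes, RAM in GB per node, CPU cores per node)
cluster = [
    ("data", 5, 20, 8),
    ("mongos", 4, 4, 4),
    ("arbiter", 2, 1, 1),
    ("config", 3, 4, 2),
]

total_nodes = sum(n for _, n, _, _ in cluster)
total_ram_gb = sum(n * ram for _, n, ram, _ in cluster)
total_cores = sum(n * cores for _, n, _, cores in cluster)

print(total_nodes, total_ram_gb, total_cores)  # 14 130 64
```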
11. Better Than Apple: MongoDB-Elasticsearch
• One of the greatest combinations this era has seen.
• Continuous improvements.
• Each fills in the other's missing features.
• Both have very similar concepts and data types.
• Both keep cloud in mind.
• Driven by the open-source community, knowledge sharing, and close collaboration with users.
13. Elasticsearch Overview
What is Elasticsearch:
• “you know, for search”
• Schema-free, REST & JSON based distributed full-text search engine & document store.
• Written in Java & built on top of Lucene.
• Highly reliable, scalable, fault tolerant.
• Supports distributed indexing, replication, and load-balanced querying.
• Powerful Geo-Spatial Queries.
• Latest release: 1.4.2
Wait..!! Schema Free?? The real gotcha.. Mongo-ES breakup
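
The slide does not spell the gotcha out. One plausible reading, stated here as an assumption: MongoDB lets the same field hold different types across documents, while Elasticsearch fixes a field's mapping from the first value it sees (dynamic mapping) and rejects later values that cannot be parsed into that type. A minimal sketch of a pre-sync check, using made-up sample documents, could look like this:

```python
from collections import defaultdict


def conflicting_fields(docs):
    """Return {field: sorted type names} for top-level fields seen with more than one type."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {f: sorted(t) for f, t in seen.items() if len(t) > 1}


# Hypothetical documents: 'price' is a number in one and a string in the other.
docs = [
    {"_id": 1, "price": 10},
    {"_id": 2, "price": "ten"},
]
conflicts = conflicting_fields(docs)
print(conflicts)  # {'price': ['int', 'str']}
```

The check only looks at top-level fields; embedded documents would need a recursive walk.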
14. Elasticsearch Overview
What does it add to Lucene:
• REST service: JSON APIs over HTTP
• High Availability & Performance: Clustering & Replication
• A Powerful query DSL.
• Interoperation with non-Java/JVM languages.
• More and more Resilience.
• Multitenancy
• And the best one: it lets you maintain relationships among documents.
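
One way Elasticsearch expresses relationships among documents is the parent/child feature. The sketch below builds a `has_child` search body and a `_parent` mapping fragment in the form used by the 1.x line; the type names `post` and `comment` and the field `text` are hypothetical, and the syntax changed in later major versions.

```python
import json


def has_child_query(child_type, field, text):
    """Search body returning parents that have at least one matching child."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"match": {field: text}},
            }
        }
    }


# Mapping fragment declaring 'comment' documents as children of 'post' documents.
child_mapping = {"comment": {"_parent": {"type": "post"}}}

body = has_child_query("comment", "text", "mongodb")
print(json.dumps(body))
```

The body would be sent as JSON to the index's `_search` endpoint over HTTP, which is also a small example of the REST and query DSL points above.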
17. The most common issues..
1. Distributed computing comes with two problems:
Node failures and network bottlenecks.
MongoDB handles node failures very easily, but network bottlenecks/partitions won't let you sleep at night because of replica set failovers and rollbacks.
Use separate networks for reads and writes.
2. Assuring a business continuity plan
mongodump is not a good fit for backing up large datasets.
3. Data Modeling
4. Keeping a close eye on connections
5. Importing embedded documents in CSV
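
The slide does not detail the CSV problem. CSV is flat while MongoDB documents nest, so one common workaround, assumed here rather than taken from the talk, is to flatten embedded documents into dot-notation column names. A minimal sketch with a made-up sample document:

```python
import csv
import io


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single dict keyed by dot-notation paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_csv(docs):
    """Render documents as CSV text, one column per flattened path."""
    rows = [flatten(d) for d in docs]
    columns = sorted({col for row in rows for col in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical document with two levels of embedding.
docs = [{"name": "a", "geo": {"city": "Gurgaon", "loc": {"lat": 28.46}}}]
csv_text = to_csv(docs)
print(csv_text)
```

Arrays are left untouched by this sketch and end up as their string representation in a single cell, so they need separate handling.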
19. Thank You for Listening
bharvidixit@yahoo.com
https://twitter.com/d_bharvi
http://www.meetup.com/Delhi-Elasticsearch-Meetup/
http://www.slideshare.net/bharvidixit/