Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. At its core, Elasticsearch is a free and open source distributed inverted index: a repository of indexed documents that supports fast, incisive search against large volumes of data, with direct access to data held in denormalized document storage. It is also broadly distributable and highly scalable.
3. Big Data
Data that becomes difficult to process with traditional data processing applications because of its size.
Processing it requires "massively parallel software running on tens, hundreds, or even thousands of servers".
4. Horizontal Scaling vs Vertical Scaling
Scale Vertically
Add resources to a single node in a system
Upgrade the server (more CPU, more RAM, etc.).
High availability is difficult to implement
Scale Horizontally
Add more nodes to a system.
More servers share the load
High availability is easier to implement
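To make horizontal scaling concrete, here is a minimal Python sketch of hash-based shard routing, the mechanism that spreads documents across nodes. In its simplest form Elasticsearch picks a primary shard roughly as hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. Assumptions: Elasticsearch uses a Murmur3 hash (and its current formula adds a routing factor to support index splitting), whereas this sketch substitutes `zlib.crc32`; the three-node cluster and the shard-to-node assignment are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 3
# Hypothetical cluster layout: each primary shard lives on a different node.
SHARD_TO_NODE = {0: "node-a", 1: "node-b", 2: "node-c"}


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard from a stable hash of the routing value (here, the doc ID).

    zlib.crc32 is a stand-in for the Murmur3 hash Elasticsearch actually uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids):
    """Group document IDs by the node that holds their shard."""
    placement = defaultdict(list)
    for doc_id in doc_ids:
        placement[SHARD_TO_NODE[shard_for(doc_id)]].append(doc_id)
    return dict(placement)


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(10)]
    for node, docs in sorted(distribute(ids).items()):
        print(node, docs)
```

Because the shard is computed with a modulus, the number of primary shards cannot simply be changed on an existing index; the split and shrink APIs instead create a new index with a different shard count. Adding nodes, on the other hand, only moves whole shards around, which is what makes scaling out straightforward.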
5. NoSQL and Why
A NoSQL database provides a mechanism for storage and retrieval of data that is
modeled by means other than the tabular relations used in relational databases.
Handles large volumes of structured, semi-structured, and unstructured data
Created in response to the limitations of traditional relational database technology
Fits object-oriented programming; easy to use and flexible
Efficient, scale-out architecture instead of expensive, monolithic architecture
Choice of data model: key-value, graph, wide-column, or document
6. Scalability: CAP Theorem
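A distributed system cannot guarantee all three of the following properties at the same time:
Availability: every request receives a response; the system remains accessible and operational at all times.
Consistency: commits are atomic across the entire distributed system (all nodes see the same data at the same time).
Partition Tolerance: the system keeps working when messages between nodes are lost or delayed; only a total network failure can cause it to respond incorrectly.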
7. Elasticsearch
Elasticsearch is a free and open source distributed inverted index.
Fast, incisive search against large volumes of data.
Indexes documents into a repository
Denormalized document storage: Fast, direct access to your data
Broadly distributable and highly scalable.
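The core idea of an inverted index fits in a few lines of Python: instead of scanning every document at query time, it maps each term to the set of documents that contain it. This is a toy model only; Lucene adds configurable analyzers, compressed postings lists, term frequencies, and relevance scoring, none of which appear here, and the class and method names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a crude stand-in for an analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Keep the whole document and record every term it contains.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing all query terms (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("1", "Elasticsearch is built on Lucene")
    idx.add("2", "Lucene is a Java library")
    print(sorted(idx.search("lucene")))       # ['1', '2']
    print(sorted(idx.search("java lucene")))  # ['2']
```

A lookup touches only the postings for the query terms, which is why search cost depends on the number of matching documents rather than on the size of the whole collection.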
8. Why Use Elasticsearch
Incredibly fast search performance.
Highly Scalable.
Near real-time indexing and search
REST API with JSON
Denormalized data store.
Schema-free
High availability
Written in Java, built on top of Apache Lucene, and open source
Can replace document stores such as MongoDB and RavenDB
10. RESTful API
Interaction with Elasticsearch happens through a RESTful API. Using JSON over
HTTP, nearly any action can be performed.
Using HTTP methods:
GET
POST
PUT
DELETE
Responses are returned in JSON format by default.
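The JSON-over-HTTP pattern can be sketched with Python's standard library by building one request per HTTP method. Assumptions: a cluster at `http://localhost:9200` (the default HTTP port), a hypothetical `books` index and document ID `1`, and the typeless `/<index>/_doc/<id>` endpoints of recent versions. The requests are only constructed here, not sent.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed address of a local development cluster (9200 is the default HTTP port).
BASE_URL = "http://localhost:9200"


def es_request(method: str, path: str, body: dict | None = None) -> urllib.request.Request:
    """Build, but do not send, a request for an Elasticsearch REST endpoint."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are JSON, sent with a matching Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Hypothetical "books" index and document ID, one request per HTTP method.
create = es_request("PUT", "/books/_doc/1", {"title": "Lucene in Action"})
fetch = es_request("GET", "/books/_doc/1")
search = es_request("POST", "/books/_search", {"query": {"match": {"title": "lucene"}}})
remove = es_request("DELETE", "/books/_doc/1")

if __name__ == "__main__":
    for req in (create, fetch, search, remove):
        print(req.get_method(), req.full_url)
```

Against a running cluster, each request could be sent with `urllib.request.urlopen(req)` and the JSON response parsed with `json.load`.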
11. Apache Lucene
Apache Lucene is a high-performance, full-featured information retrieval library,
written in Java.
Elasticsearch uses Lucene internally and builds its distributed search
and analytics capabilities on top of it.
The relationship between Elasticsearch and Lucene is like the relationship
between a car and its engine.
12. Query DSL
The Query DSL is Elasticsearch's way of making Lucene's query syntax accessible
to users, allowing complex queries to be composed using a JSON syntax.
As in Lucene, there are leaf queries such as term or prefix queries, and
compound queries such as the bool query.
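A short Python sketch shows how leaf queries (term, prefix) nest inside a compound bool query to form a request body. The field names `title`, `status`, and `language` are hypothetical, and `status` and `language` are assumed to be keyword-mapped so that an exact term match makes sense; the helper functions are illustrative, not part of any client library.

```python
import json


def term(field, value):
    """Leaf query: exact match against the indexed term."""
    return {"term": {field: value}}


def prefix(field, value):
    """Leaf query: matches terms that start with the given prefix."""
    return {"prefix": {field: value}}


def bool_query(must=(), filter_=(), should=(), must_not=()):
    """Compound query combining leaf (or other compound) clauses.

    `filter_` has a trailing underscore only to avoid shadowing Python's
    built-in `filter`; the emitted JSON key is "filter". Empty clauses are omitted.
    """
    clauses = {
        "must": list(must),
        "filter": list(filter_),
        "should": list(should),
        "must_not": list(must_not),
    }
    return {"bool": {name: items for name, items in clauses.items() if items}}


# Hypothetical search: titles starting with "elast", published, not in French.
request_body = {
    "query": bool_query(
        must=[prefix("title", "elast")],
        filter_=[term("status", "published")],
        must_not=[term("language", "fr")],
    )
}

if __name__ == "__main__":
    print(json.dumps(request_body, indent=2))
```

Clauses under `must` contribute to the relevance score, while `filter` and `must_not` clauses only include or exclude documents, which is why exact-match conditions are usually placed in `filter`.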
13. Multi-Tenancy
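Multiple indices can be stored on one Elasticsearch installation, whether a single node or a cluster.
In older versions, each index could also have multiple mapping "types". Types were not truly separate indexes (they shared the same underlying Lucene index), so indices created in 6.0 or later allow only one type, and mapping types were removed entirely in 8.0.
A single, simple query can search across multiple indices at once, and quickly.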
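Searching several indices in one request comes down to listing them, comma-separated, in front of the `_search` endpoint; wildcard patterns are also accepted. The Python sketch below builds such a path; the index names `products`, `reviews`, and `logs-2024-*` are hypothetical, and the validation covers only two of Elasticsearch's index-naming rules (lowercase, no commas).

```python
def multi_index_search_path(indices):
    """Return the _search path for one request spanning several indices.

    An empty list falls back to /_search, which targets all indices.
    """
    for name in indices:
        # Index names must be lowercase, and a comma would break the list.
        if "," in name or name != name.lower():
            raise ValueError(f"invalid index name or pattern: {name!r}")
    if not indices:
        return "/_search"
    return "/" + ",".join(indices) + "/_search"


if __name__ == "__main__":
    print(multi_index_search_path(["products", "reviews", "logs-2024-*"]))
    # → /products,reviews,logs-2024-*/_search
```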
14. Schema-Free
Elasticsearch makes it easy to get started. Send a JSON document and it will
try to detect the data structure (dynamic mapping), index the data, and make it searchable.
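How structure detection works can be illustrated with a simplified Python sketch that guesses a field mapping from each JSON value, roughly following Elasticsearch's default dynamic-mapping rules: booleans become `boolean`, integers `long`, floating-point numbers `float`, strings `text` with a `keyword` sub-field, and objects nested `properties`. It is an approximation: by default the real rules also detect dates inside strings, which this sketch ignores, and the function names are hypothetical.

```python
def infer_field(value):
    """Guess a mapping for one JSON value (date detection in strings is ignored)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Analyzed text for full-text search plus a keyword sub-field for exact matches.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return infer_mapping(value)
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field(item)
    return None  # null or empty array: no mapping is added yet


def infer_mapping(document):
    """Build a {"properties": ...} mapping for a JSON object."""
    properties = {}
    for name, value in document.items():
        field = infer_field(value)
        if field is not None:
            properties[name] = field
    return {"properties": properties}


if __name__ == "__main__":
    doc = {"title": "Elasticsearch Basics", "pages": 300, "price": 9.99,
           "in_stock": True, "tags": ["search", "lucene"],
           "author": {"name": "Jane Doe"}, "notes": None}
    print(infer_mapping(doc))
```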
15. Document Oriented
Store complex real-world entities in Elasticsearch as structured JSON documents.
All fields are indexed by default
Stores entire objects or documents
Indexes the contents of each document in order to make them searchable
24. RDBMS SQL Query vs Elasticsearch Query
SQL          Elasticsearch
Database     Index
Partition    Shard
Table        Type (removed in 8.0)
Row          Document
Column       Field
Schema       Mapping
Index        Everything is indexed
SQL          Query DSL
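To connect the last row of the comparison to practice, here is a hedged Python sketch that turns a simple SQL-style WHERE clause (conditions joined by AND) plus a LIMIT into a Query DSL request body: `=` becomes a `term` query, comparisons become a `range` query, and LIMIT becomes `size`. The translator and the `country` and `age` fields are hypothetical, and `country` is assumed to be keyword-mapped so an exact term match behaves like SQL equality.

```python
import json

# SQL comparison operators and their Query DSL range-query equivalents.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def where_to_query(conditions, limit=None):
    """Translate (field, op, value) conditions joined by AND into a request body.

    Conditions go into a bool filter because they only include or exclude
    rows, like a SQL WHERE clause, and need no relevance scoring.
    """
    filters = []
    for field, op, value in conditions:
        if op == "=":
            filters.append({"term": {field: value}})
        elif op in RANGE_OPS:
            filters.append({"range": {field: {RANGE_OPS[op]: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    body = {"query": {"bool": {"filter": filters}}}
    if limit is not None:
        body["size"] = limit  # SQL LIMIT corresponds to the "size" parameter
    return body


if __name__ == "__main__":
    # SELECT * FROM users WHERE country = 'DE' AND age >= 30 LIMIT 10
    print(json.dumps(where_to_query([("country", "=", "DE"), ("age", ">=", 30)], limit=10)))
```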