Wanna search? Piece of cake!
A fast, scalable, and easy-to-set-up search engine for your
data.
by Alexey Kursov
http://www.linkedin.com/in/kursov
ElasticSearch is a
● distributed
● RESTful
● free/open source search server
● based on Apache Lucene.
It is developed by Shay Banon (@kimchy) and is released
under the terms of the Apache License. ElasticSearch is
developed in Java.
http://elasticsearch.org/
http://elasticsearch.com/
WTF?
Apache Lucene is a
● free/open source information retrieval software library
● originally created in Java
● it is supported by the Apache Software Foundation
● it is released under the Apache Software License
While suitable for any application which requires full text indexing and
searching capability, Lucene has been widely recognized for its utility in the
implementation of Internet search engines and local, single-site searching.
http://lucene.apache.org/core/
Lucene?
Indexing.
ElasticSearch is able to achieve fast search responses because,
instead of searching the text directly, it searches an index.
This type of index is called an
inverted index, because it inverts
a page-centric data structure
(page->words) to a keyword-centric
data structure (word->pages).
ElasticSearch uses Apache Lucene
to create and manage this inverted
index.
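The inversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene stores its index; the page names and words are made up for the example.

```python
from collections import defaultdict

# Hypothetical page-centric data: page name -> words found on that page.
pages = {
    "page1": ["fast", "search"],
    "page2": ["search", "engine"],
}


def invert(page_to_words):
    """Turn a page -> words mapping into a word -> pages mapping."""
    word_to_pages = defaultdict(set)
    for page, words in page_to_words.items():
        for word in words:
            word_to_pages[word].add(page)
    return dict(word_to_pages)


index = invert(pages)
print(sorted(index["search"]))  # ['page1', 'page2']
```

Finding the pages for a word is now a single dictionary lookup instead of a scan over every page.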
Basic Concepts
In computer science, an inverted index is an index data structure storing a mapping
from content, such as words or numbers, to its locations in a database file, or in a
document or a set of documents. The purpose of an inverted index is to allow fast full
text searches, at a cost of increased processing when a document is added to the
database.
Simple example:
Given the texts:
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
we have the following inverted file index (where the integers in the set notation brackets refer to
the indexes (or keys) of the text symbols, T[0], T[1] etc.):
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}
Inverted index
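As a rough sketch, the index above can be rebuilt in Python from the same three texts. Splitting on whitespace is a simplifying assumption (a real analyzer does much more), and the helper names build_index and search_all are invented for this example.

```python
from collections import defaultdict

texts = ["it is what it is", "what is it", "it is a banana"]


def build_index(docs):
    """Map every word to the set of text ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[word].add(doc_id)
    return dict(index)


def search_all(index, query):
    """Return the ids of the texts that contain every word of the query."""
    postings = [index.get(word, set()) for word in query.split()]
    return set.intersection(*postings) if postings else set()


index = build_index(texts)
print(sorted(index["what"]))                     # [0, 1]
print(sorted(search_all(index, "what is it")))   # [0, 1]
print(sorted(search_all(index, "a banana")))     # [2]
```

A multi-word query is answered by intersecting the sets of ids, so the texts themselves are never scanned at search time.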
Basic Concepts
Data representation.
In ElasticSearch, a Document is the unit of search and index. An index
consists of one or more Documents, and a Document consists of one or more
Fields (in database terminology, a Document corresponds to a table row, and a
Field corresponds to a table column).
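To make the row/column analogy concrete, here is a small Python sketch. The field names and values are hypothetical; the only point is that a Document is a set of named Fields that can be serialized to JSON.

```python
import json

# Hypothetical Document: each key is a Field name, each value a Field value.
document = {
    "id": 1,
    "title": "Wanna search? Piece of cake!",
    "author": "Alexey Kursov",
}

# Database analogy: the field names act as columns, the values as one row.
columns = list(document.keys())
row = tuple(document.values())

# Serialized form of the document, as it could be sent to a JSON API.
body = json.dumps(document)

print(columns)  # ['id', 'title', 'author']
print(row)      # (1, 'Wanna search? Piece of cake!', 'Alexey Kursov')
```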
Schema declares:
- what fields there are
- which field should be used as the unique/primary key
- which fields are required
- how to index and search each field
- etc.
An index may store documents of
different "mapping types".
You can associate multiple
mapping definitions for each mapping type.
A mapping type is a way of separating the
documents in an index into logical groups.
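A mapping definition might look roughly like the sketch below, written as a Python dictionary that serializes to JSON. The "tweet" type and its fields are hypothetical, and the layout follows the mapping format of ElasticSearch releases from the period of this deck; field types and mapping types changed in later versions, so check the documentation for the version you use.

```python
import json

# Hypothetical mapping for a hypothetical "tweet" mapping type.
# Assumption: era-appropriate field types ("string", "date").
tweet_mapping = {
    "tweet": {
        "properties": {
            "user": {"type": "string", "index": "not_analyzed"},
            "message": {"type": "string"},
            "posted": {"type": "date"},
        }
    }
}

# The JSON text that would be sent as the mapping definition.
mapping_json = json.dumps(tweet_mapping, indent=2)
print(mapping_json)
```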
Competitors?
http://lucene.apache.org/solr/
http://sphinxsearch.com/
What's the same?
VS
Lucene Query, Facet, Index functionality
implementation:
The two are very similar, with some differences and nuances on each side.
There is a lot of information about this on the internet; see, for example,
this series of articles:
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
What's the difference?
VS
ElasticSearch main advantages (IMHO):
1. Low barrier to entry. ElasticSearch is a more "intuitive, accessible" system
(significantly less configuration, since the schema is built dynamically over HTTP
and the defaults are sensible)
2. JSON-based API is cleaner and easier to use
3. The replication and sharding capabilities are much simpler to configure
4. Complex documents (nested)
5. Multiple document types per schema
6. Joins (parent/child relationships)
7. Online schema changes
8. Self-contained cluster
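To give a feel for the JSON-based API from point 2, here is a hedged Python sketch that only builds an HTTP request for indexing one document and does not send it. The host and port, the "twitter" index, the "tweet" type, the document itself and the helper name build_index_request are all assumptions made for illustration.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that stores one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local server address and hypothetical index, type, and document.
request = build_index_request(
    "http://localhost:9200",
    "twitter",
    "tweet",
    1,
    {"user": "kimchy", "message": "trying out ElasticSearch"},
)
print(request.get_method(), request.full_url)
# PUT http://localhost:9200/twitter/tweet/1
```

Sending it would be a matter of calling urllib.request.urlopen(request) against a running server.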
What's the difference?
VS
Solr main advantages (IMHO):
1. Solr has a bigger, more mature user, dev, and contributor community
2. Solr is more mature and maybe more stable
3. Solr has more response formats (XML,CSV,JSON)
4. Better 3rd-party product integration
5. Pivot Facets
6. More customizable
Who wins?
VS
We all do!
ES Clients and "river" plugins
There are clients for many languages and platforms (from the official site):
Java, .NET, Perl, Python, Ruby, PHP, JavaScript, Scala, Clojure, Go,
Erlang, EventMachine, OCaml, Smalltalk
There are "river" (data import) plugins for:
JDBC, CouchDB, Wikipedia, Twitter, RabbitMQ, RSS, MongoDB, Open
Archives Initiative (OAI) , St9, Sofa, Amazon SQS, LDAP, Dropbox, ActiveMQ,
Solr, CSV, JMS
Who uses it?
How to connect from my code?
NEST
(The guys from stackoverflow.com and I think it is the best .NET client for ElasticSearch)
NEST aims to be a .NET client with a very concise API. (http://github.com/Mpdreamz/NEST)
Its main goal is to provide a solid strongly typed Elasticsearch client. It also has string/dynamic
overloads for more dynamic use cases.
Why NEST?
● Fluent. Looks like:
ElasticClient.Search<Foo>(s => s.From(0).Size(10).SortAscending(f => f.Name).Query(...
● Json serializer/deserializer - Newtonsoft Json.NET with all its advantages
● Strongly typed
● Useful attributes for configuring
● Actively developed and improved
● Open source
● Clear and beautiful source code
● Available on NuGet
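For readers who do not use .NET, the fluent call shown above corresponds roughly to a JSON search body like the one built in this Python sketch. The query part is elided on the slide, so match_all is used here purely as a placeholder; the lower-case field name "name" and the helper name search_body are also assumptions.

```python
import json


def search_body(start, size, sort_field, query):
    """Build a search request body with paging, ascending sort, and a query."""
    return {
        "from": start,
        "size": size,
        "sort": [{sort_field: {"order": "asc"}}],
        "query": query,
    }


# Placeholder query: the slide does not show which query was used.
body = search_body(0, 10, "name", {"match_all": {}})
print(json.dumps(body))
```

A strongly typed client such as NEST builds this kind of body for you from the lambda expressions.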
Practice
