INTRODUCTION TO ELASTICSEARCH
Agenda
• Me
• ElasticSearch Basics
• Concepts
• Network / Discovery
• Data Structure
• Inverted Index
• The REST API
• Bulk API
• Percolator
• Java Integration
• Stuff I didn’t cover
Me
• Roy Russo
• Former JBoss Portal Co-Founder
• LoopFuse Co-Founder
• ElasticHQ Founder
• http://www.elastichq.org
The Basics
• Document-Oriented Search Engine
• JSON, Lucene
• No schema required up front
• Mapping Types
• Horizontal Scale, Distributed
• REST API
• Vibrant Ecosystem
• Apps, Plugins, Hosting
The Basics - Distro
• Download and Run
├── bin                      (executables)
│   ├── elasticsearch
│   ├── elasticsearch.in.sh
│   └── plugin
├── config                   (node configs)
│   ├── elasticsearch.yml
│   └── logging.yml
├── data                     (data storage)
│   └── cluster1
├── lib
│   ├── elasticsearch-x.y.z.jar
│   └── ...
└── logs                     (log files)
    ├── elasticsearch.log
    ├── elasticsearch_index_search_slowlog.log
    └── elasticsearch_index_indexing_slowlog.log
The Basics - Glossary
• Node = One ElasticSearch instance (one Java process)
• Cluster = 1..N Nodes w/ same Cluster Name
• Index = Similar to a DB
• Named Collection of Documents
• Maps to 1..N primary shards, each with 0..N replica shards
• Type = Similar to a DB Table
• Document Definition
• Shard = One Lucene instance
• Shards are distributed across the nodes in the cluster.
The Basics - Document Structure
• Modeled as a JSON object
{
"genre": "Crime",
"language": "English",
"country": "USA",
"runtime": 170,
"title": "Scarface",
"year": 1983
}
{
"_index": "imdb",
"_type": "movie",
"_id": "u17o8zy9RcKg6SjQZqQ4Ow",
"_version": 1,
"exists": true,
"_source": {
"genre": "Crime",
"language": "English",
"country": "USA",
"runtime": 170,
"title": "Scarface",
"year": 1983
}
}
The Basics - Document Structure
• Document Metadata fields
• _id
• _type : mapping type
• _source : the original JSON document (can be enabled/disabled)
• _timestamp
• _ttl
• _size : size of uncompressed _source
• _version
The Basics - Document Structure
• Mapping:
• ES will auto-map fields
• You can specify mapping, if needed
• Data Types:
• String
• Analyzers, Tokenizers, Filters
• Number
• integer, long, float, double, short, byte
• Boolean
• Date
• Configurable date formats
• geo_point, geo_shape, ip
• Attachment (requires plugin)
Lucene – Inverted Index
• Which presidential speeches contain the word “fair”?
• Go over every speech, word by word, and mark the speeches that contain it
• Cost is linear in the total number of words
• Does not scale to large collections
Lucene – Inverted Index
• Inverting Obama
• Take all the speeches
• Break them down by word (tokenize)
• For each word, store the IDs of the speeches
• Sort all words (tokens)
• Searching
• Finding the word is fast
• Iterate over document IDs that are referenced
Token | Doc Frequency | Doc IDs
Jobs  | 2             | 4, 8
Fair  | 5             | 1, 2, 4, 8, 42
Bush  | 300           | 1, 2, 3, 4, 5, 6, …
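The lookup described above can be sketched in a few lines of Java. This is a toy model under stated assumptions: tokenization is simply "split on non-letters and lowercase", postings live in memory as sorted sets, and the class name InvertedIndex is hypothetical. Lucene's analyzers and compressed on-disk postings are far more involved. The sample documents reproduce the "Jobs" row of the table (document frequency 2, IDs 4 and 8).

```java
import java.util.Collections;
import java.util.Locale;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: token -> sorted set of document IDs (its "postings").
class InvertedIndex {
    private final SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>(); // tokens kept sorted

    // Tokenize (split on non-letters, lowercase) and record docId under each token.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue; // split() yields "" when text starts with a non-letter
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Finding the word is one sorted-map lookup; the result is the set of referenced doc IDs.
    SortedSet<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    // The "Doc Frequency" column: how many documents contain the token.
    int docFrequency(String word) {
        return search(word).size();
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "A fair deal for all");
        idx.add(2, "Fair is fair");
        idx.add(4, "Jobs, jobs, and a fair shot");
        idx.add(8, "Jobs are fair game");
        System.out.println(idx.search("jobs"));       // [4, 8]
        System.out.println(idx.docFrequency("jobs")); // 2
    }
}
```

Searching never touches the speeches themselves: it jumps straight to the token and walks its (usually short) list of IDs.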
Lucene – Inverted Index
• A data structure, not an algorithm
• Implementations vary
Cluster Topology
• 4 Node Cluster
• Index Configuration:
• “A”: 2 Shards, 1 Replica
• “B”: 3 Shards, 1 Replica
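[Diagram: shard copies spread over the 4 nodes. Index "A" contributes A1 and A2, index "B" contributes B1, B2 and B3, each shown twice (primary plus replica), for 10 shard copies in total.]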
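The arithmetic behind this topology: "A" yields 2 × (1 + 1) = 4 shard copies and "B" yields 3 × (1 + 1) = 6, so the four nodes hold 10 copies. The Java sketch below places them with a deliberately simple rule (least-loaded node that does not already hold a copy of the same shard). It is not Elasticsearch's balanced allocator, and ShardAllocator and its method are hypothetical names; it only illustrates that a primary and its replica end up on different nodes, and that a copy with no eligible node stays UNASSIGNED.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy allocator (NOT Elasticsearch's real balanced allocator): each shard copy goes to
// the least-loaded node that does not already hold a copy of the same shard.
// Copy names follow the slide: "B2" = shard 2 of index B.
class ShardAllocator {
    // indices: index name -> {number_of_shards, number_of_replicas}
    static List<List<String>> allocate(int nodeCount, Map<String, int[]> indices) {
        List<List<String>> nodes = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) nodes.add(new ArrayList<>());
        for (Map.Entry<String, int[]> e : indices.entrySet()) {
            int shards = e.getValue()[0], replicas = e.getValue()[1];
            for (int s = 1; s <= shards; s++) {
                String shard = e.getKey() + s;
                for (int copy = 0; copy <= replicas; copy++) { // copy 0 is the primary
                    int best = -1;
                    for (int n = 0; n < nodeCount; n++) {
                        if (nodes.get(n).contains(shard)) continue; // never two copies on one node
                        if (best < 0 || nodes.get(n).size() < nodes.get(best).size()) best = n;
                    }
                    if (best < 0) continue; // no eligible node: this copy stays UNASSIGNED
                    nodes.get(best).add(shard);
                }
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, int[]> indices = new LinkedHashMap<>();
        indices.put("A", new int[] {2, 1}); // "A": 2 shards, 1 replica
        indices.put("B", new int[] {3, 1}); // "B": 3 shards, 1 replica
        List<List<String>> nodes = allocate(4, indices);
        for (int n = 0; n < nodes.size(); n++) System.out.println("node" + (n + 1) + ": " + nodes.get(n));
    }
}
```

With the slide's settings this prints node1: [A1, B1, B3], node2: [A1, B1, B3], node3: [A2, B2], node4: [A2, B2]; the layout in the original diagram may differ.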
The Basics - Shards
• Paths…
• Primary Shard:
• Receives each document first at index time
• Index has 1..N primary shards (default: 5)
• Number cannot be changed once the index is created
• Replica Shard:
• Copy of the primary shard
• Replica count can be changed at any time
• Each primary has 0..N replicas
• HA:
• Promoted to primary if primary fails
• Get/Search requests can be served by the primary or any replica
The Basics - Shards
• Shard Stages
• UNASSIGNED
• INITIALIZING
• STARTED
• RELOCATING
• Viewed in Cluster State
• Routing table: from the indices' perspective
• Routing nodes: from the nodes' perspective
The Basics - Searching
• How it works:
• Search request hits a node
• Node forwards the query to one copy (primary or replica) of every shard in the index
• Each shard performs query
• Each shard returns results
• Results merged, sorted, and returned to client.
• Problems:
• Without routing, ES does not know which shard holds your document
• Documents are spread across shards by a hash of their ID
• A query may be broadcast to 100 nodes
• Performance degrades
The Basics - Shards
• Shard Allocation Awareness
• cluster.routing.allocation.awareness.attributes: rack_id
• Example:
• 2 Nodes with node.rack_id=rack_one
• Create an index with 5 shards / 1 replica (10 shard copies)
• Add 2 Nodes with node.rack_id=rack_two
• Shards RELOCATE to even out the distribution
• Primary & Replica will NOT be on the same rack_id value.
• Shard Allocation Filtering
• node.tag=val1
• index.routing.allocation.include.tag:val1,val2
curl -XPUT localhost:9200/newIndex/_settings -d '{
"index.routing.allocation.include.tag" : "val1,val2"
}'
The Basics - Routing
curl -XPOST 'localhost:9200/crunchbase/person/1?routing=xerox' -d '{
...
}'
curl -XPOST 'localhost:9200/crunchbase/_search?routing=xerox' -d '{
  "query" : {
    "filtered" : {
      "query" : { ... }
    }
  }
}'
Routing can be used to constrain the search to the shard(s) the routing value maps to; otherwise, all shards are hit.
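Routing works because the target shard is computed from the routing value (the document _id unless ?routing= is given), roughly as shard = hash(routing) mod number_of_primary_shards. The sketch below assumes String.hashCode() as the hash purely for illustration; Elasticsearch uses its own hash function, and Routing.shardFor is a hypothetical name. It also shows why the number of primary shards is fixed at index creation: change the divisor and existing documents map to different shards.

```java
// Sketch: shard = hash(routing) mod number_of_primary_shards.
// ASSUMPTION: String.hashCode() stands in for Elasticsearch's real routing hash.
class Routing {
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        // Every document indexed and every search sent with ?routing=xerox targets this one shard.
        System.out.println("xerox -> shard " + shardFor("xerox", 5));
        // Changing the primary count moves documents: "doc0" maps to shard 3 of 5 but shard 4 of 6.
        System.out.println(shardFor("doc0", 5) + " vs " + shardFor("doc0", 6)); // 3 vs 4
    }
}
```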
Discovery
• Nodes discover each other using multicast.
• Unicast is an option
• Each cluster has an elected master node
• Beware of split-brain
• discovery.zen.minimum_master_nodes
• floor(N/2) + 1, where N > 2
• N: number of master-eligible nodes in the cluster
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
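The quorum rule for discovery.zen.minimum_master_nodes, written out in Java (SplitBrain and its methods are hypothetical names). A partition may elect a master only if it sees at least floor(N/2) + 1 master-eligible nodes, and two disjoint partitions can never both reach that number, which is what prevents split-brain.

```java
// Quorum rule behind discovery.zen.minimum_master_nodes (hypothetical helper names).
class SplitBrain {
    // floor(N / 2) + 1, where N is the number of master-eligible nodes
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) throw new IllegalArgumentException("need at least one master-eligible node");
        return masterEligibleNodes / 2 + 1; // integer division rounds down for positive N
    }

    // A partition may elect a master only if it can see a quorum of master-eligible nodes.
    static boolean canElectMaster(int visibleMasterEligible, int totalMasterEligible) {
        return visibleMasterEligible >= minimumMasterNodes(totalMasterEligible);
    }

    public static void main(String[] args) {
        // 5 master-eligible nodes split 3 / 2 by a network partition:
        System.out.println(canElectMaster(3, 5)); // true: the majority side keeps a master
        System.out.println(canElectMaster(2, 5)); // false: the minority side cannot elect a second one
    }
}
```

With N = 2 the quorum is 2, so losing either node blocks master election; this is why the slide asks for N > 2.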
Nodes
• Master node handles cluster-wide (Meta-API) events:
• Nodes joining and leaving
• Index creation / deletion
• Shard re-allocation
• Data Nodes
• Indexing / Searching operations
• Client Nodes
• REST calls
• Light-weight load balancers
• Beware of Heap Size
• ES_HEAP_SIZE: ~1/2 of machine memory (leave the rest to the OS file-system cache)
Cluster State
• Cluster State
• Node Membership
• Indices Settings and Mappings (Types)
• Shard Allocation Table
• Shard State
• curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'
Cluster State
• Cluster-state changes are published from the master to the other nodes
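[Diagram: PUT /newIndex on a 3-node cluster where node 1 is the master (M). Step 1: nodes 1, 2 and 3 all hold cluster state CS1. Step 2: the master applies the change and holds CS2 while nodes 2 and 3 still hold CS1. Step 3: the master publishes CS2 and all three nodes hold CS2.]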
REST API
• Building IMDB
• Two Indexes
REST API
• Create Index
• action.auto_create_index: false (disables automatic index creation)
• Index Document
• Dynamic type mapping
• Versioning
• ID specification
• Parent / Child (/1122?parent=1111)
• Explicit Refresh (?refresh=1)
• Timeout flag (?timeout=5m)
REST API – Versioning
• Every document is Versioned
• Version assigned on creation
• Version number can also be supplied by the client
• Re-indexing, updating, or deleting a document increments its version
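A minimal in-memory sketch of these versioning rules, using optimistic concurrency: every write bumps the version, and a write that names a stale version is rejected (Elasticsearch reports this as a version conflict). VersionedStore and its methods are hypothetical, and the sketch keeps a deleted document's version forever, whereas Elasticsearch keeps such tombstones only for a limited time (index.gc_deletes).

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of per-document versioning with optimistic concurrency control.
// ASSUMPTION: hypothetical class; delete tombstones are kept forever here.
class VersionedStore {
    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> sources = new HashMap<>();

    // Unconditional write: creation gets version 1, every re-index bumps it by one.
    long index(String id, String source) {
        long version = versions.getOrDefault(id, 0L) + 1;
        versions.put(id, version);
        sources.put(id, source);
        return version;
    }

    // Conditional write: succeeds only if the caller saw the current version.
    long index(String id, String source, long expectedVersion) {
        long current = versions.getOrDefault(id, 0L);
        if (current != expectedVersion) {
            throw new IllegalStateException("version conflict: current " + current + ", expected " + expectedVersion);
        }
        return index(id, source);
    }

    // Deletes also bump the version; the number is remembered as a tombstone.
    long delete(String id) {
        if (!sources.containsKey(id)) throw new IllegalStateException("document [" + id + "] not found");
        long version = versions.get(id) + 1;
        versions.put(id, version);
        sources.remove(id);
        return version;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        long v1 = store.index("1", "{\"title\":\"Scarface\"}");
        long v2 = store.index("1", "{\"title\":\"Scarface\",\"year\":1983}", v1);
        try {
            store.index("1", "{\"title\":\"stale write\"}", v1); // v1 is stale now
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // version conflict: current 2, expected 1
        }
        System.out.println(v2 + " -> " + store.delete("1")); // 2 -> 3
    }
}
```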
REST API - Update
• Update using partial data
• Partial doc merged with existing
• Fails if document doesn’t exist
• “Upsert” data is used to create the document if it doesn’t exist
{
  "doc" : {
    "title": "Blade Runner"
  },
  "upsert" : {
    "title": "Blade Runner"
  }
}
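The update semantics above, sketched for flat documents in Java (PartialUpdate.update is a hypothetical name). Top-level fields from the partial doc overwrite existing ones and untouched fields are kept; a missing document without upsert data is an error; with upsert data, the upsert content becomes the new document. Elasticsearch also merges nested objects recursively, which this sketch does not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Sketch of partial update + upsert on flat (non-nested) documents.
class PartialUpdate {
    // Applies partialDoc to store[id]; if the document is missing, falls back to upsert (may be null).
    static Map<String, Object> update(Map<String, Map<String, Object>> store, String id,
                                      Map<String, Object> partialDoc, Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            if (upsert == null) throw new NoSuchElementException("document [" + id + "] missing"); // plain update fails
            Map<String, Object> created = new LinkedHashMap<>(upsert); // upsert content becomes the new document
            store.put(id, created);
            return created;
        }
        Map<String, Object> merged = new LinkedHashMap<>(existing);
        merged.putAll(partialDoc); // partial fields overwrite; untouched fields are kept
        store.put(id, merged);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new LinkedHashMap<>();
        Map<String, Object> scarface = new LinkedHashMap<>();
        scarface.put("title", "Scarface");
        scarface.put("year", 1983);
        store.put("1", scarface);
        System.out.println(update(store, "1", Map.<String, Object>of("runtime", 170), null));
        // {title=Scarface, year=1983, runtime=170}
        System.out.println(update(store, "2", Map.<String, Object>of("title", "Blade Runner"),
                Map.<String, Object>of("title", "Blade Runner")));
        // {title=Blade Runner}
    }
}
```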
REST API
• Exists
• No overhead of loading the document
• Result is the HTTP status code (200 or 404)
• Delete
• Get
• Multi-Get
{
  "docs" : [
    {
      "_id" : "1",
      "_index" : "imdb",
      "_type" : "movie"
    },
    {
      "_id" : "5",
      "_index" : "oldmovies",
      "_type" : "movie",
      "fields" : ["title", "genre"]
    }
  ]
}
REST API - Search
• Free Text Search
• URL Request
• http://localhost:9200/imdb/movie/_search?q=scar*
• Complex Query
• http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
• Term, Boolean, range, fuzzy, etc…
REST API - Search
• Search Types:
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=count
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=query_then_fetch
• Query and Fetch:
• The query executes on all shards, and each shard returns full results
• Query then Fetch:
• The query executes on all shards, but each shard returns only the information needed to rank/sort; only the shards holding the top hits are then asked for the documents
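A sketch of query-then-fetch over in-memory "shards", assuming each shard already knows the scores of its matching documents (real shards compute scores by running the query). QueryThenFetch and its methods are hypothetical names. The query phase returns only (shard, id, score) hits for each shard's local top k; the coordinating node merges them into the global top k and then fetches documents only from the shards that own those hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Two-phase "query then fetch" over in-memory shards (toy model).
class QueryThenFetch {
    static final class Hit {
        final int shard;
        final String id;
        final double score;
        Hit(int shard, String id, double score) { this.shard = shard; this.id = id; this.score = score; }
        @Override public String toString() { return id + "@shard" + shard + "=" + score; }
    }

    static final Comparator<Hit> BY_SCORE_DESC = Comparator.comparingDouble((Hit h) -> h.score).reversed();

    // Query phase: shards.get(i) maps docId -> score on shard i; each shard sends back only
    // its local top k as lightweight hits, and the coordinator keeps the global top k.
    static List<Hit> queryPhase(List<Map<String, Double>> shards, int k) {
        List<Hit> merged = new ArrayList<>();
        for (int s = 0; s < shards.size(); s++) {
            List<Hit> local = new ArrayList<>();
            for (Map.Entry<String, Double> e : shards.get(s).entrySet()) local.add(new Hit(s, e.getKey(), e.getValue()));
            local.sort(BY_SCORE_DESC);
            merged.addAll(local.subList(0, Math.min(k, local.size())));
        }
        merged.sort(BY_SCORE_DESC);
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    // Fetch phase: only the shards that own a global top-k hit are asked for documents.
    static Set<Integer> shardsToFetch(List<Hit> topHits) {
        Set<Integer> shards = new TreeSet<>();
        for (Hit h : topHits) shards.add(h.shard);
        return shards;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> shards = List.of(
                Map.of("a", 1.0, "b", 0.2),
                Map.of("c", 0.9, "d", 0.8),
                Map.of("e", 0.1));
        List<Hit> top = queryPhase(shards, 2);
        System.out.println(top);                // [a@shard0=1.0, c@shard1=0.9]
        System.out.println(shardsToFetch(top)); // [0, 1]: shard 2 is never asked for documents
    }
}
```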
REST API – Query DSL
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
Becomes…
curl -XPOST 'localhost:9200/imdb/movie/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "scarface OR star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981, "lte" : 1984 }
          }
        }
      ]
    }
  }
}'
REST API – Query DSL
• Query-string requests use Lucene query syntax
• Limited
• Error-prone
• Use the “match” query instead
curl -XPOST 'localhost:9200/imdb/movie/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "title" : "scarface star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981 }
          }
        }
      ]
…
The match query automatically builds a boolean query.
REST API – Query DSL
• Match Query
• Boolean Query
• Must: document must match query
• Must_not: document must not match query
• Should: document doesn’t have to match
• If it matches… higher score
• Compound queries
{
  "bool": {
    "must": [
      { "match": { "color": "blue" } },
      { "match": { "title": "shirt" } }
    ],
    "must_not": [
      { "match": { "size": "xxl" } }
    ],
    "should": [
      { "match": { "textile": "cotton" } },
      {
        "match": {
          "title": {
            "type": "phrase",
            "query": "quick fox",
            "slop": 1
          }
        }
      }
    ]
  }
}
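The must / must_not / should rules can be modeled as predicates. In this sketch a clause is a plain Java predicate (the match helper is a hypothetical whitespace-term check, not the real match query), and the score is just the number of matching should clauses; Lucene's actual relevance scoring is very different. As in Elasticsearch, when there are no must clauses at least one should clause has to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of bool-query semantics over flat documents (hypothetical class).
class BoolQuery {
    final List<Predicate<Map<String, String>>> must = new ArrayList<>();
    final List<Predicate<Map<String, String>>> mustNot = new ArrayList<>();
    final List<Predicate<Map<String, String>>> should = new ArrayList<>();

    // Hypothetical stand-in for "match": does the field contain the term as a whitespace-separated word?
    static Predicate<Map<String, String>> match(String field, String term) {
        return doc -> {
            String value = doc.get(field);
            return value != null && Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"))
                    .contains(term.toLowerCase(Locale.ROOT));
        };
    }

    // Returns -1 if the document is excluded; otherwise a toy score = number of matching should clauses.
    int score(Map<String, String> doc) {
        for (Predicate<Map<String, String>> p : must) if (!p.test(doc)) return -1;   // all must match
        for (Predicate<Map<String, String>> p : mustNot) if (p.test(doc)) return -1; // none may match
        int score = 0;
        for (Predicate<Map<String, String>> p : should) if (p.test(doc)) score++;   // each match raises the score
        if (must.isEmpty() && !should.isEmpty() && score == 0) return -1;          // should-only: one must match
        return score;
    }

    public static void main(String[] args) {
        BoolQuery q = new BoolQuery(); // the slide's example: blue shirts, not xxl, cotton preferred
        q.must.add(match("color", "blue"));
        q.must.add(match("title", "shirt"));
        q.mustNot.add(match("size", "xxl"));
        q.should.add(match("textile", "cotton"));
        System.out.println(q.score(Map.of("color", "blue", "title", "oxford shirt", "size", "m", "textile", "cotton"))); // 1
        System.out.println(q.score(Map.of("color", "blue", "title", "shirt", "size", "xxl")));                           // -1
    }
}
```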
REST API – Query DSL
• Range Query
• Numeric / Date Types
• Prefix/Wildcard Query
• Match on partial terms
• Fuzzy Query
• Matches terms within a small edit distance of the query term
• RegExp Query
{
"range":{
"founded_year":{
"gte":1990,
"lt":2000
}
}
}
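Fuzzy matching is based on edit distance. The sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions); newer Lucene versions also count a transposition of two adjacent characters as a single edit, which this version does not. Fuzzy and its methods are hypothetical names.

```java
// Classic Levenshtein edit distance with two rolling rows (hypothetical class).
// Unlike newer Lucene fuzzy matching, an adjacent transposition costs two edits here.
class Fuzzy {
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to the first j chars of b
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;                                     // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,  // insertion
                                           prev[j] + 1),    // deletion
                                  prev[j - 1] + cost);      // substitution (or match)
            }
            int[] tmp = prev; prev = cur; cur = tmp;        // current row becomes the previous one
        }
        return prev[b.length()];
    }

    // A term matches if it is within maxEdits of the query term.
    static boolean fuzzyMatch(String term, String queryTerm, int maxEdits) {
        return editDistance(term, queryTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("scarface", "scarfase")); // 1
        System.out.println(editDistance("blade", "blaed"));       // 2: a swap costs two substitutions
    }
}
```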
REST API – Query DSL
• Geo_bbox
• Bounding box filter
• Geo_distance
• Geo_distance_range
{
"query":{
"filtered":{
"query":{
"match_all":{
}
},
"filter":{
"geo_bbox":{
"location":{
"top_left":{
"lat":40.73,
"lon":-74.1
},
"bottom_right":{
"lat":40.717,
"lon":-73.99
}
…
{
  "query":{
    "filtered":{
      "query":{
        "match_all":{}
      },
      "filter":{
        "geo_distance":{
          "distance":"400km",
          "location":{
            "lat":40.73,
            "lon":-74.1
          }
        }
      }
    }
  }
}
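The two geo filters reduce to simple geometry. This sketch assumes the haversine formula with a 6,371 km mean Earth radius, so its numbers can differ slightly from Elasticsearch's own distance calculations, and its bounding-box check ignores boxes that cross the antimeridian. GeoFilters and its methods are hypothetical names; the coordinates come from the slide.

```java
// Sketch of the geo_bbox and geo_distance checks (hypothetical class).
// ASSUMPTION: haversine distance on a sphere with a 6371 km mean radius.
class GeoFilters {
    static final double EARTH_RADIUS_KM = 6371.0;

    // geo_bbox: is the point inside the box given by top_left and bottom_right?
    static boolean inBoundingBox(double lat, double lon,
                                 double topLat, double leftLon, double bottomLat, double rightLon) {
        return lat <= topLat && lat >= bottomLat && lon >= leftLon && lon <= rightLon;
    }

    // Great-circle distance in kilometers.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // geo_distance: is the point within distanceKm of the center?
    static boolean withinDistance(double lat, double lon, double centerLat, double centerLon, double distanceKm) {
        return haversineKm(lat, lon, centerLat, centerLon) <= distanceKm;
    }

    public static void main(String[] args) {
        // Box from the slide: top_left (40.73, -74.1), bottom_right (40.717, -73.99)
        System.out.println(inBoundingBox(40.72, -74.0, 40.73, -74.1, 40.717, -73.99)); // true
        // Chicago is well over 400 km from (40.73, -74.1)
        System.out.println(withinDistance(41.88, -87.63, 40.73, -74.1, 400));           // false
    }
}
```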
REST API – Bulk Operations
• Bulk API
• Minimize round trips with index/delete ops
• Individual response for every request action
• In order
• Failure of one action will not stop subsequent actions.
• localhost:9200/_bulk
• No pretty-printing: every action and every document must be on one line ending in \n
{ "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }\n
{ "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }\n
{ "first_name" : "Tony", "last_name" : "Soprano" }\n
...
{ "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }\n
{ "doc" : { "title" : "Blade Runner" } }\n
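The bulk body format can be generated with a small builder. BulkBody is a hypothetical helper, not part of any Elasticsearch client: it writes one action line per operation plus a source line for index and update (delete has none), ends every line with \n, and rejects pretty-printed JSON. Values are not JSON-escaped, so it is only safe for simple strings.

```java
// Hypothetical builder for a _bulk request body (newline-delimited JSON).
class BulkBody {
    private final StringBuilder body = new StringBuilder();

    // Every line must be single-line JSON terminated by '\n', including the last one.
    private BulkBody line(String json) {
        if (json.indexOf('\n') >= 0) throw new IllegalArgumentException("bulk lines must not be pretty-printed: " + json);
        body.append(json).append('\n');
        return this;
    }

    // Action/metadata line. NOTE: values are not JSON-escaped.
    private static String action(String op, String index, String type, String id) {
        return "{ \"" + op + "\" : { \"_index\" : \"" + index + "\", \"_type\" : \"" + type
                + "\", \"_id\" : \"" + id + "\" } }";
    }

    BulkBody delete(String index, String type, String id) {
        return line(action("delete", index, type, id)); // delete has no source line
    }

    BulkBody index(String index, String type, String id, String sourceJson) {
        return line(action("index", index, type, id)).line(sourceJson);
    }

    BulkBody update(String index, String type, String id, String partialDocJson) {
        return line(action("update", index, type, id)).line("{ \"doc\" : " + partialDocJson + " }");
    }

    String build() {
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(new BulkBody()
                .delete("imdb", "movie", "2")
                .index("imdb", "actor", "1", "{ \"first_name\" : \"Tony\", \"last_name\" : \"Soprano\" }")
                .update("imdb", "movie", "3", "{ \"title\" : \"Blade Runner\" }")
                .build());
    }
}
```

The printed body matches the example above line for line, and could be sent to localhost:9200/_bulk with any HTTP client.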
Percolate API
• Reverse search
• Store queries and filter (percolate) documents through them.
curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{
"query" : {
"bool" : {
"must" : [
{ "term" : { "company" : "NOK" }},
{ "range" : { "value" : { "lt" : "2.5" }}}
]
}
}
}'
curl -X GET localhost:9200/stocks/stock/_percolate -d '{
"doc" : {
"company" : "NOK",
"value" : 2.4
}
}'
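Percolation in miniature: queries are registered under an ID ahead of time, and each incoming document is checked against all of them. This sketch uses plain Java predicates in place of the query DSL, and Percolator with its methods are hypothetical names. The registered predicate mirrors the slide's alert-on-nokia query (company is "NOK" and value below 2.5).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Reverse search sketch: stored queries, documents percolated through them (hypothetical class).
class Percolator {
    private final Map<String, Predicate<Map<String, Object>>> queries = new LinkedHashMap<>();

    // Store a query under an ID (like PUT /_percolator/<index>/<id>).
    void register(String id, Predicate<Map<String, Object>> query) {
        queries.put(id, query);
    }

    // Return the IDs of all stored queries that match the document.
    List<String> percolate(Map<String, Object> doc) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, Object>>> e : queries.entrySet()) {
            if (e.getValue().test(doc)) matches.add(e.getKey());
        }
        return matches;
    }

    public static void main(String[] args) {
        Percolator p = new Percolator();
        p.register("alert-on-nokia", doc -> "NOK".equals(doc.get("company"))
                && doc.get("value") instanceof Number && ((Number) doc.get("value")).doubleValue() < 2.5);
        System.out.println(p.percolate(Map.<String, Object>of("company", "NOK", "value", 2.4))); // [alert-on-nokia]
    }
}
```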
Java Integration
• Client list: http://www.elasticsearch.org/guide/clients/
• Java client, Jest
• Limited
• https://github.com/searchbox-io/Jest
• Spring Data:
• Uses TransportClient
• Implementation of ElasticsearchRepository aligns with the generic Spring Data Repository interfaces.
• ElasticsearchCrudRepository extends PagingAndSortingRepository
• https://github.com/spring-projects/spring-data-elasticsearch
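// Book.java
import org.springframework.data.elasticsearch.annotations.Document;

// Maps Book to the "book" index: in-memory store, 1 shard, no replicas, automatic refresh disabled
@Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1")
public class Book {
…
}

// ElasticSearchBookRepository.java
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Spring Data generates the CRUD, paging and sorting implementation at runtime
public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> {
}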
Stuff I didn’t cover…
• Analyzers
• Tokenizers
• Token Filters
• Rivers
• RabbitMQ, MySQL, JDBC
$ curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,stop&pretty=1' -d '
The quick Fox Jumped
'
{
"tokens" : [ {
"token" : "quick",
"start_offset" : 5,
"end_offset" : 10,
"type" : "word",
"position" : 2
}, {
"token" : "fox",
"start_offset" : 11,
"end_offset" : 14,
"type" : "word",
"position" : 3
}, {
"token" : "jumped",
"start_offset" : 15,
"end_offset" : 21,
"type" : "word",
"position" : 4
} ]
}
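The _analyze output above can be reproduced by hand: split on whitespace, lowercase each token, then drop stop words. This sketch assumes a small illustrative stop-word subset (not Lucene's full English list) and numbers positions from 1 to match the sample output; Analyzer and its members are hypothetical names. A removed stop word still consumes its position, which is why "quick" sits at position 2, and because the request body starts with a newline, "The" begins at offset 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Whitespace tokenizer + lowercase filter + stop filter (toy version, hypothetical class).
class Analyzer {
    // ASSUMPTION: tiny illustrative stop-word subset, not Lucene's full English list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    static final class Token {
        final String term;
        final int start, end, position;
        Token(String term, int start, int end, int position) {
            this.term = term; this.start = start; this.end = end; this.position = position;
        }
        @Override public String toString() { return term + "(" + start + "-" + end + ", pos " + position + ")"; }
    }

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) { i++; continue; } // skip separators
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            position++;                                                      // 1-based; stop words still use a slot
            String term = text.substring(start, i).toLowerCase(Locale.ROOT); // lowercase filter
            if (!STOP_WORDS.contains(term)) tokens.add(new Token(term, start, i, position)); // stop filter
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("\nThe quick Fox Jumped\n"));
        // [quick(5-10, pos 2), fox(11-14, pos 3), jumped(15-21, pos 4)]
    }
}
```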
B’what about Mongo?
• Mongo:
• General purpose DB
• ElasticSearch:
• Distributed text search engine
… that’s all I have to say about that.
Questions?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Elasticsearch Introduction

  • 2. Agenda
    • Me
    • ElasticSearch Basics
    • Concepts
    • Network / Discovery
    • Data Structure
    • Inverted Index
    • The REST API
    • Bulk API
    • Percolator
    • Java Integration
    • Stuff I didn’t cover
  • 3. Me
    • Roy Russo
    • Former JBoss Portal Co-Founder
    • LoopFuse Co-Founder
    • ElasticHQ Founder
    • http://www.elastichq.org
  • 4. The Basics
    • Document-oriented search engine
    • JSON, Lucene
    • No schema
    • Mapping types
    • Horizontal scale, distributed
    • REST API
    • Vibrant ecosystem
      • Apps, plugins, hosting
  • 5. The Basics - Distro
    • Download and run

    ├── bin                      (executables)
    │   ├── elasticsearch
    │   ├── elasticsearch.in.sh
    │   └── plugin
    ├── config                   (node configs)
    │   ├── elasticsearch.yml
    │   └── logging.yml
    ├── data                     (data storage)
    │   └── cluster1
    ├── lib
    │   ├── elasticsearch-x.y.z.jar
    │   └── ...
    └── logs                     (log files)
        ├── elasticsearch.log
        ├── elasticsearch_index_search_slowlog.log
        └── elasticsearch_index_indexing_slowlog.log
  • 6. The Basics - Glossary
    • Node = One ElasticSearch instance (1 Java process)
    • Cluster = 1..N nodes with the same cluster name
    • Index = Similar to a DB
      • Named collection of documents
      • Maps to 1..N primary shards and 0..N replica shards
    • Type = Similar to a DB table
      • Document definition
    • Shard = One Lucene instance
      • Distributed across all nodes in the cluster
  • 7. The Basics - Document Structure
    • Modeled as a JSON object

    {
      "genre": "Crime",
      "language": "English",
      "country": "USA",
      "runtime": 170,
      "title": "Scarface",
      "year": 1983
    }

    • Returned together with its metadata:

    {
      "_index": "imdb",
      "_type": "movie",
      "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
      "_version": 1,
      "exists": true,
      "_source": {
        "genre": "Crime",
        "language": "English",
        "country": "USA",
        "runtime": 170,
        "title": "Scarface",
        "year": 1983
      }
    }
  • 8. The Basics - Document Structure
    • Document metadata fields
      • _id
      • _type : mapping type
      • _source : enabled/disabled
      • _timestamp
      • _ttl
      • _size : size of the uncompressed _source
      • _version
  • 9. The Basics - Document Structure
    • Mapping:
      • ES will auto-map fields
      • You can specify the mapping, if needed
    • Data types:
      • String
        • Analyzers, tokenizers, filters
      • Number
        • int, long, float, double, short, byte
      • Boolean
      • Date/time
        • Formatted
      • geo_point, geo_shape, ip
      • Attachment (requires a plugin)
  • 10. Lucene – Inverted Index
    • Which presidential speeches contain the word “fair”?
      • Go over every speech, word by word, and mark the speeches that contain it
      • Cost grows linearly with the total number of words
      • Does not scale to large collections
  • 11. Lucene – Inverted Index
    • Inverting Obama
      • Take all the speeches
      • Break them down by word (tokenize)
      • For each word, store the IDs of the speeches that contain it
      • Sort all words (tokens)
    • Searching
      • Finding the word is fast
      • Iterate over the document IDs it references

      Token | Doc Frequency | Doc IDs
      ------|---------------|---------------------
      Jobs  | 2             | 4, 8
      Fair  | 5             | 1, 2, 4, 8, 42
      Bush  | 300           | 1, 2, 3, 4, 5, 6, …
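To make the speech example concrete, here is a minimal in-memory sketch of the same idea in Java. It tokenizes on non-letters, lowercases, and keeps a sorted map from token to a sorted set of document IDs; `linearScan` is the naive approach from the previous slide, kept for comparison. The class name, the tokenizer, and the sample sentences are invented for illustration; Lucene's real index (term dictionary, compressed postings, scoring data) is far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: token -> sorted set of document IDs (hypothetical class).
class InvertedIndexDemo {

    // TreeMap keeps tokens sorted, mirroring the "sort all words" step.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final Map<Integer, String> docs = new TreeMap<>();

    // Split on anything that is not a letter, lowercase, drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Indexing: record the doc ID under every token it contains.
    void add(int docId, String text) {
        docs.put(docId, text);
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Inverted lookup: one map access, then walk the referenced IDs.
    List<Integer> search(String word) {
        TreeSet<Integer> ids = postings.get(word.toLowerCase());
        if (ids == null) return new ArrayList<>();
        return new ArrayList<>(ids);
    }

    // Naive approach: re-read every word of every document.
    List<Integer> linearScan(String word) {
        List<Integer> hits = new ArrayList<>();
        String needle = word.toLowerCase();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (tokenize(e.getValue()).contains(needle)) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "A fair deal for every worker");
        index.add(4, "Jobs, jobs and a fair wage");
        index.add(8, "Fair trade creates jobs");
        System.out.println(index.search("fair")); // [1, 4, 8]
        System.out.println(index.search("jobs")); // [4, 8]
    }
}
```

Indexing costs one pass over each document; after that, a query is a single map lookup plus a walk over the matching IDs, instead of a scan over every word in the collection.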
  • 12. Lucene – Inverted Index
    • A data structure, not an algorithm
    • Implementations vary
  • 13. Cluster Topology
    • 4-node cluster
    • Index configuration:
      • “A”: 2 shards, 1 replica
      • “B”: 3 shards, 1 replica
    [Diagram: shard copies A1, A2, B1, B2, B3 plus one replica of each (10 copies in total) spread across the four nodes]
  • 14. The Basics - Shards
    • Paths…
    • Primary shard:
      • Documents are indexed here first
      • An index has 1..N primary shards (default: 5)
      • The number cannot be changed once the index is created
    • Replica shard:
      • Copy of a primary shard
      • The replica count can be changed later
      • Each primary has 0..N replicas
    • HA:
      • A replica is promoted to primary if the primary fails
      • Get/search requests are handled by either the primary or a replica
  • 15. The Basics - Shards
    • Shard stages
      • UNASSIGNED
      • INITIALIZING
      • STARTED
      • RELOCATING
    • Viewed in the cluster state
      • Routing table: from the indices' perspective
      • Routing nodes: from the nodes' perspective
  • 16. The Basics - Searching
    • How it works:
      • A search request hits a node
      • That node forwards the query to one copy (primary or replica) of every shard in the index
      • Each shard executes the query
      • Each shard returns its results
      • Results are merged, sorted, and returned to the client
    • Problems:
      • Without routing, ES has no idea which shards hold the matching documents
      • Documents are spread around the cluster by hashing their IDs, so they are effectively randomly placed
      • A query against an index spread over 100 nodes is broadcast to all 100
      • Performance degrades
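The "merged, sorted, and returned" step can be sketched as follows: each shard hands back its own top hits with scores, and the coordinating node combines them into one global top-N list. This is a simplified illustration, not Elasticsearch code. The `Hit` class, the tie-break on ID, and the sample scores are assumptions. Real ES (query_then_fetch) first collects only IDs and scores, then fetches the winning documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the coordinating node's merge step (hypothetical names).
class ScatterGatherDemo {

    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + ":" + score;
        }
    }

    // Each inner list is one shard's local top hits; return the global top `size`.
    static List<Hit> merge(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        // Highest score first; ties broken by id so the order is deterministic.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing((Hit h) -> h.id));
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> perShard = List.of(
            List.of(new Hit("doc-3", 2.1), new Hit("doc-9", 0.4)),  // shard 0
            List.of(new Hit("doc-7", 1.8)),                          // shard 1
            List.of(new Hit("doc-1", 3.5), new Hit("doc-4", 0.9)));  // shard 2
        System.out.println(merge(perShard, 3)); // [doc-1:3.5, doc-3:2.1, doc-7:1.8]
    }
}
```

Every shard has to answer before the merge can finish, which is why broadcasting to many nodes gets more expensive as the cluster grows.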
  • 17. The Basics - Shards
    • Shard allocation awareness
      • cluster.routing.allocation.awareness.attributes: rack_id
      • Example:
        • 2 nodes with node.rack_id=rack_one
        • Create an index with 5 shards / 1 replica (10 shards in total)
        • Add 2 nodes with node.rack_id=rack_two
        • Shards RELOCATE to an even distribution
        • A primary and its replica will NOT be on nodes with the same rack_id value
    • Shard allocation filtering
      • node.tag=val1
      • index.routing.allocation.include.tag: val1,val2

    curl -XPUT localhost:9200/newindex/_settings -d '{
      "index.routing.allocation.include.tag" : "val1,val2"
    }'
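The guarantee from the rack example can be stated as a simple check: once both racks exist, no two copies of the same shard should sit on nodes that share a `rack_id`. The sketch below only verifies an allocation; it does not reproduce how Elasticsearch's allocator chooses nodes. The class and field names, and the node-to-rack map, are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Checks an allocation against the awareness rule (hypothetical helper).
class AwarenessCheck {

    // One copy (primary or replica) of a shard and the node it is allocated to.
    static final class ShardCopy {
        final int shard;
        final String node;

        ShardCopy(int shard, String node) {
            this.shard = shard;
            this.node = node;
        }
    }

    // True if no two copies of the same shard are on nodes sharing a rack_id value.
    static boolean copiesOnDistinctRacks(List<ShardCopy> copies, Map<String, String> rackIdOfNode) {
        Map<Integer, Set<String>> racksUsed = new HashMap<>();
        for (ShardCopy copy : copies) {
            String rack = rackIdOfNode.get(copy.node);
            Set<String> racks = racksUsed.computeIfAbsent(copy.shard, k -> new HashSet<>());
            if (!racks.add(rack)) {
                return false; // another copy of this shard already lives on this rack
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of(
            "node1", "rack_one", "node2", "rack_one",
            "node3", "rack_two", "node4", "rack_two");
        // Shard 0: primary on node1, replica on node3 -> different racks.
        List<ShardCopy> good = List.of(new ShardCopy(0, "node1"), new ShardCopy(0, "node3"));
        // Shard 0: primary on node1, replica on node2 -> same rack.
        List<ShardCopy> bad = List.of(new ShardCopy(0, "node1"), new ShardCopy(0, "node2"));
        System.out.println(copiesOnDistinctRacks(good, racks)); // true
        System.out.println(copiesOnDistinctRacks(bad, racks));  // false
    }
}
```

Before the rack_two nodes join, every copy necessarily sits on rack_one, so this check only becomes satisfiable after the relocation described on the slide.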
  • 18. The Basics - Routing

    curl -XPOST 'localhost:9200/crunchbase/person/1?routing=xerox' -d '{ ... }'
    curl -XPOST 'localhost:9200/crunchbase/_search?routing=xerox' -d '{
      "query" : { "filtered" : { "query" : { ... } } }
    }'

    • Routing restricts a search to the shard(s) that the routing value maps to; without it, all shards are hit
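Elasticsearch picks a shard with `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document `_id`. The sketch below shows why `?routing=xerox` keeps related documents on one shard and why the primary count is fixed at index creation. `String.hashCode()` is a stand-in here because Elasticsearch uses its own hash function. The class and method names are hypothetical.

```java
// Sketch of routing-value -> shard selection (hypothetical helper).
class RoutingDemo {

    // shard = hash(routing) mod number_of_primary_shards.
    // String.hashCode() stands in for Elasticsearch's own hash function;
    // floorMod keeps the result non-negative for negative hash codes.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    // Without ?routing=..., the document _id is used as the routing value.
    static int shardFor(String id, String routing, int numberOfPrimaryShards) {
        return shardFor(routing != null ? routing : id, numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaries = 5;
        // Every document indexed with routing=xerox lands on the same shard,
        // so a search with routing=xerox only needs to query that shard.
        for (String id : new String[] {"1", "2", "3"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, "xerox", primaries));
        }
        // Changing the primary count changes the modulus, so an existing document
        // would suddenly "belong" to a different shard.
        System.out.println(shardFor("7", null, 5)); // 0
        System.out.println(shardFor("7", null, 6)); // 1
    }
}
```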
  • 19. Discovery
    • Nodes discover each other using multicast
      • Unicast is an option
    • Each cluster has an elected master node
      • Beware of split-brain
    • discovery.zen.minimum_master_nodes
      • N/2+1, where N > 2
      • N: number of master-eligible nodes in the cluster

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
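The N/2+1 rule is a majority quorum: if the network splits the master-eligible nodes into two groups, at most one group can see a majority, so at most one side can elect a master. A small sketch of that arithmetic (integer division, as in the formula). The class and method names are hypothetical.

```java
// Majority-quorum arithmetic behind discovery.zen.minimum_master_nodes (hypothetical helper).
class QuorumDemo {

    // (N / 2) + 1 with integer division, N = number of master-eligible nodes.
    static int minimumMasterNodes(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // A partition may elect a master only if it sees at least the quorum.
    static boolean canElectMaster(int visibleMasterEligible, int quorum) {
        return visibleMasterEligible >= quorum;
    }

    public static void main(String[] args) {
        int quorum = minimumMasterNodes(3); // 2
        // A network split divides 3 nodes into {2} and {1}: only one side can elect a master.
        System.out.println(canElectMaster(2, quorum)); // true
        System.out.println(canElectMaster(1, quorum)); // false
    }
}
```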
  • 20. Nodes
    • The master node handles cluster-wide (meta-API) events:
      • Node participation
      • Creating/deleting indices
      • Re-allocation of shards
    • Data nodes
      • Indexing / searching operations
    • Client nodes
      • REST calls
      • Light-weight load balancers
    • Beware of heap size
      • ES_HEAP_SIZE: ~1/2 of machine memory
  • 21. Cluster State
    • The cluster state holds:
      • Node membership
      • Index settings and mappings (types)
      • Shard allocation table
      • Shard state

    curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'
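  • 22. Cluster State
    • Changes in state are published from the master to the other nodes
    [Diagram: nodes 1 (master), 2 and 3 all hold cluster state CS1; after PUT /newindex the master moves to CS2 while nodes 2 and 3 still hold CS1; once the master publishes the change, all three hold CS2]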
  • 23. REST API
    • Building IMDB
    • Two indexes
  • 24. REST API
    • Create index
      • action.auto_create_index: 0 (disables automatic index creation)
    • Index document
      • Dynamic type mapping
      • Versioning
      • ID specification
      • Parent / child (/1122?parent=1111)
      • Explicit refresh (?refresh=1)
      • Timeout flag (?timeout=5m)
  • 25. REST API – Versioning
    • Every document is versioned
    • A version is assigned on creation
    • The version number can also be assigned explicitly
    • Re-indexing, updating, and deleting increment the version
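Versions make optimistic concurrency control possible: a writer that passes the version it last read only succeeds if nobody has written in between. Below is a minimal in-memory sketch of that behaviour for create and re-index. It does not model externally assigned versions or deletes. `VersionedStore` and its methods are hypothetical names, not an Elasticsearch API.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of internal versioning with an optional version check (hypothetical).
class VersionedStore {

    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) {
            super(message);
        }
    }

    private final Map<String, String> sources = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Creates or re-indexes a document and returns its new version.
    // If expectedVersion is non-null, the write only succeeds when it equals
    // the current version; otherwise it is rejected as a conflict.
    long index(String id, String source, Long expectedVersion) {
        Long current = versions.get(id);
        if (expectedVersion != null && !expectedVersion.equals(current)) {
            throw new VersionConflictException("document " + id + ": expected version "
                + expectedVersion + " but current is " + current);
        }
        long next = (current == null) ? 1L : current + 1; // 1 on creation, +1 per re-index
        sources.put(id, source);
        versions.put(id, next);
        return next;
    }

    String get(String id) {
        return sources.get(id);
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        long v1 = store.index("1", "{\"title\":\"Scarface\"}", null);
        long v2 = store.index("1", "{\"title\":\"Scarface\",\"year\":1983}", v1);
        try {
            store.index("1", "{\"title\":\"stale write\"}", v1); // still expects version 1
        } catch (VersionConflictException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(v1 + " " + v2 + " " + store.get("1"));
    }
}
```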
  • 26. REST API - Update
    • Update using partial data
      • The partial document is merged into the existing one
      • Fails if the document doesn’t exist
    • “Upsert” data is used to create the document if it doesn’t exist

    { "upsert" : { "title" : "Blade Runner" } }
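The update rules on this slide can be written down as a few lines of logic. If the document exists, the partial doc is merged in. If it is missing and upsert data is given, the upsert becomes the new document. Otherwise the update fails. This sketch merges top-level fields only (Elasticsearch also merges nested objects), and `PartialUpdateDemo` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of partial-update and upsert semantics over an in-memory store (hypothetical).
class PartialUpdateDemo {

    // Returns the stored document after applying the update request.
    static Map<String, Object> update(Map<String, Map<String, Object>> store, String id,
                                      Map<String, Object> partialDoc, Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            if (upsert == null) {
                // No document and no upsert data: the update fails.
                throw new IllegalStateException("document " + id + " does not exist");
            }
            Map<String, Object> created = new HashMap<>(upsert); // upsert becomes the new doc
            store.put(id, created);
            return created;
        }
        Map<String, Object> merged = new HashMap<>(existing);
        merged.putAll(partialDoc); // top-level fields from the partial doc win
        store.put(id, merged);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new HashMap<>();
        // First call: no document yet, so the upsert data creates it.
        System.out.println(update(store, "1", Map.of("year", 1982), Map.of("title", "Blade Runner")));
        // Second call: the document exists, so the partial doc is merged in.
        System.out.println(update(store, "1", Map.of("year", 1982), null));
    }
}
```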
  • 27. REST API
    • Exists
      • No overhead of loading the document
      • The result is the HTTP status code
    • Delete
    • Get
    • Multi-Get

    {
      "docs" : [
        { "_index" : "imdb", "_type" : "movie", "_id" : "1" },
        { "_index" : "oldmovies", "_type" : "movie", "_id" : "5", "fields" : ["title", "genre"] }
      ]
    }
  • 28. REST API - Search
    • Free-text search
      • URL request
        • http://localhost:9200/imdb/movie/_search?q=scar*
      • Complex query
        • http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
        • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
      • Term, boolean, range, fuzzy, etc.
  • 29. REST API - Search
    • Search types:
      • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=count
      • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=query_then_fetch
    • Query and fetch:
      • Executes on all shards; each shard returns full results
    • Query then fetch:
      • Executes on all shards, which return only the information needed to rank and sort; the documents themselves are then fetched only from the shards that hold the top hits
  • 30. REST API – Query DSL

    http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

    Becomes…

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
      "query" : {
        "bool" : {
          "must" : [
            { "query_string" : { "query" : "scarface OR star" } },
            { "range" : { "year" : { "gte" : 1981, "lte" : 1984 } } }
          ]
        }
      }
    }'
  • 31. REST API – Query DSL
    • query_string requests use Lucene query syntax
      • Limited
      • Error-prone
    • Instead, use the “match” query

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
      "query" : {
        "bool" : {
          "must" : [
            { "match" : { "message" : "scarface star" } },
            { "range" : { "year" : { "gte" : 1981 } } }
          ] …

    • The match query analyzes its text and automatically builds a boolean query from the resulting terms
  • 32. REST API – Query DSL
    • Match query
    • Boolean query
      • must: the document must match the query
      • must_not: the document must not match the query
      • should: the document doesn’t have to match
        • If it does match, it scores higher
      • Compound queries

    {
      "bool" : {
        "must" : [
          { "match" : { "color" : "blue" } },
          { "match" : { "title" : "shirt" } }
        ],
        "must_not" : [
          { "match" : { "size" : "xxl" } }
        ],
        "should" : [
          { "match" : { "textile" : "cotton" } }
        ]
      }
    }

    {
      "match" : {
        "title" : {
          "type" : "phrase",
          "query" : "quick fox",
          "slop" : 1
        }
      }
    }
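The must / must_not / should rules can be sketched as a filter plus a toy score over in-memory documents, using the blue-shirt query from the slide. Assumptions to note: a "match" here is just a whole-word, case-insensitive comparison; the score simply counts matching clauses, whereas Lucene computes relevance differently; and, as in Elasticsearch's query context, a bool query with no must clauses requires at least one should clause to match. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy evaluator for bool-query semantics (hypothetical names, toy scoring).
class BoolQueryDemo {

    // A "match"-like clause: true if the field contains the term as a whole word.
    static Predicate<Map<String, String>> match(String field, String term) {
        return doc -> {
            String value = doc.get(field);
            return value != null
                && Arrays.asList(value.toLowerCase().split("\\s+")).contains(term.toLowerCase());
        };
    }

    // Returns -1 if the document is excluded, otherwise the number of matching clauses.
    static int score(Map<String, String> doc,
                     List<Predicate<Map<String, String>>> must,
                     List<Predicate<Map<String, String>>> mustNot,
                     List<Predicate<Map<String, String>>> should) {
        for (Predicate<Map<String, String>> p : must) if (!p.test(doc)) return -1;
        for (Predicate<Map<String, String>> p : mustNot) if (p.test(doc)) return -1;
        int shouldHits = 0;
        for (Predicate<Map<String, String>> p : should) if (p.test(doc)) shouldHits++;
        // With no must clauses, at least one should clause has to match.
        if (must.isEmpty() && !should.isEmpty() && shouldHits == 0) return -1;
        return must.size() + shouldHits; // a should match only raises the score
    }

    // The blue-shirt query from the slide (the phrase example is left out).
    static int blueShirtScore(Map<String, String> doc) {
        return score(doc,
            List.of(match("color", "blue"), match("title", "shirt")),
            List.of(match("size", "xxl")),
            List.of(match("textile", "cotton")));
    }

    public static void main(String[] args) {
        System.out.println(blueShirtScore(Map.of("color", "blue", "title", "shirt", "size", "m", "textile", "cotton"))); // 3
        System.out.println(blueShirtScore(Map.of("color", "blue", "title", "shirt", "size", "m"))); // 2
        System.out.println(blueShirtScore(Map.of("color", "blue", "title", "shirt", "size", "xxl"))); // -1
    }
}
```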
  • 33. REST API – Query DSL
    • Range query
      • Numeric / date types
    • Prefix / wildcard query
      • Match on partial terms
    • Fuzzy query
      • Matches terms that are spelled similarly to the query term
    • RegExp query

    { "range" : { "founded_year" : { "gte" : 1990, "lt" : 2000 } } }
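"Similar looking" is usually measured as edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one term into another. Here is a sketch using plain Levenshtein distance with a `maxEdits` threshold. Lucene's fuzzy matching is implemented differently (with Levenshtein automata, and it can count a transposition as one edit), so treat this as an illustration of the idea only. The class and method names are hypothetical.

```java
// Edit-distance sketch of fuzzy matching (hypothetical helper, plain Levenshtein).
class FuzzyDemo {

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from empty prefix of a
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term matches "fuzzily" if it is within maxEdits of the query term.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm, int maxEdits) {
        return levenshtein(queryTerm, indexedTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("scarface", "scarfase"));     // 1
        System.out.println(fuzzyMatches("scarfase", "scarface", 2)); // true
        System.out.println(fuzzyMatches("scarfase", "starface", 1)); // false
    }
}
```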
  • 34. REST API – Query DSL
    • geo_bbox
      • Bounding-box filter
    • geo_distance
    • geo_distance_range

    {
      "query" : {
        "filtered" : {
          "query" : { "match_all" : { } },
          "filter" : {
            "geo_bbox" : {
              "location" : {
                "top_left" : { "lat" : 40.73, "lon" : -74.1 },
                "bottom_right" : { "lat" : 40.717, "lon" : -73.99 }
              …

    {
      "query" : {
        "filtered" : {
          "query" : { "match_all" : { } },
          "filter" : {
            "geo_distance" : {
              "distance" : "400km",
              "location" : { "lat" : 40.73, "lon" : -74.1 }
            }
          }
        }
      }
    }
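To see what the two filters decide for a single point, here is a sketch that checks a point against the slide's 400 km radius and bounding box. Distance uses the haversine great-circle formula with a mean Earth radius of 6371 km. That is an assumption: Elasticsearch offers its own distance calculations, which may give slightly different numbers. The bounding-box check ignores boxes that cross the date line. The Boston and Chicago coordinates are approximate, and all names are hypothetical.

```java
// Point-level sketch of geo_distance and geo_bbox checks (hypothetical helper).
class GeoDemo {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, an approximation

    // Great-circle distance between two lat/lon points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // geo_distance-style check: is the point within distanceKm of the origin?
    static boolean withinDistance(double originLat, double originLon,
                                  double lat, double lon, double distanceKm) {
        return haversineKm(originLat, originLon, lat, lon) <= distanceKm;
    }

    // geo_bbox-style check for a box that does not cross the date line.
    static boolean inBoundingBox(double topLeftLat, double topLeftLon,
                                 double bottomRightLat, double bottomRightLon,
                                 double lat, double lon) {
        return lat <= topLeftLat && lat >= bottomRightLat
            && lon >= topLeftLon && lon <= bottomRightLon;
    }

    public static void main(String[] args) {
        // Origin and box corners are taken from the slide's filters.
        System.out.println(withinDistance(40.73, -74.1, 42.36, -71.06, 400)); // Boston (~311 km): true
        System.out.println(withinDistance(40.73, -74.1, 41.88, -87.63, 400)); // Chicago (~1100 km): false
        System.out.println(inBoundingBox(40.73, -74.1, 40.717, -73.99, 40.72, -74.0)); // true
    }
}
```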
  • 35. REST API – Bulk Operations
    • Bulk API
      • Minimizes round trips for index/delete operations
      • Returns an individual response for every action, in request order
      • Failure of one action does not stop the subsequent actions
    • Endpoint: localhost:9200/_bulk
    • No pretty-printing: each JSON object sits on a single line that ends with a newline (\n)

    { "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }
    { "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }
    { "first_name" : "Tony", "last_name" : "Soprano" }
    ...
    { "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }
    { "doc" : { "title" : "Blade Runner" } }
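The line-oriented body format is easy to get wrong when generating it from code, because every line, including the last one, needs its trailing newline. A small sketch of building such a body in Java. `BulkBodyDemo` and `toBulkBody` are hypothetical names. The JSON lines are copied from the slide, and sending the request to `_bulk` is left out.

```java
import java.util.List;

// Builds a newline-delimited _bulk request body (hypothetical helper).
class BulkBodyDemo {

    // One JSON object per line, every line (including the last) ending in '\n'.
    static String toBulkBody(List<String> lines) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            if (line.contains("\n")) {
                throw new IllegalArgumentException("bulk lines must not be pretty-printed: " + line);
            }
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = toBulkBody(List.of(
            "{ \"delete\" : { \"_index\" : \"imdb\", \"_type\" : \"movie\", \"_id\" : \"2\" } }",
            "{ \"index\" : { \"_index\" : \"imdb\", \"_type\" : \"actor\", \"_id\" : \"1\" } }",
            "{ \"first_name\" : \"Tony\", \"last_name\" : \"Soprano\" }"));
        System.out.print(body); // this string would be POSTed to localhost:9200/_bulk
    }
}
```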
  • 36. Percolate API
    • Reversing search
      • Store queries, then filter (percolate) documents through them

    curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "company" : "NOK" } },
            { "range" : { "value" : { "lt" : "2.5" } } }
          ]
        }
      }
    }'

    curl -XGET localhost:9200/stocks/stock/_percolate -d '{
      "doc" : { "company" : "NOK", "value" : 2.4 }
    }'
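The "reversed" part is the key idea. Normal search runs one query over many stored documents. Percolation runs one incoming document through many stored queries and reports which ones match. The sketch below models stored queries as Java predicates, with the slide's alert-on-nokia rule hand-translated into one. Elasticsearch does not work this way internally. `PercolatorDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Reverse search: stored queries, incoming documents (hypothetical sketch).
class PercolatorDemo {

    // Registered queries keyed by name, kept in registration order.
    private final Map<String, Predicate<Map<String, Object>>> queries = new LinkedHashMap<>();

    void register(String name, Predicate<Map<String, Object>> query) {
        queries.put(name, query);
    }

    // Run one document through every stored query; return the names that match.
    List<String> percolate(Map<String, Object> doc) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, Object>>> e : queries.entrySet()) {
            if (e.getValue().test(doc)) matches.add(e.getKey());
        }
        return matches;
    }

    // Hand translation of the slide's query: company is "NOK" AND value < 2.5.
    static Predicate<Map<String, Object>> nokiaAlert() {
        return doc -> "NOK".equals(doc.get("company"))
            && doc.get("value") instanceof Number
            && ((Number) doc.get("value")).doubleValue() < 2.5;
    }

    public static void main(String[] args) {
        PercolatorDemo percolator = new PercolatorDemo();
        percolator.register("alert-on-nokia", nokiaAlert());
        System.out.println(percolator.percolate(Map.of("company", "NOK", "value", 2.4))); // [alert-on-nokia]
        System.out.println(percolator.percolate(Map.of("company", "NOK", "value", 2.6))); // []
    }
}
```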
  • 37. Java Integration
    • Client list: http://www.elasticsearch.org/guide/clients/
    • Java client, Jest
      • Limited
      • https://github.com/searchbox-io/Jest
    • Spring Data:
      • Uses TransportClient
      • Its ElasticsearchRepository implementation aligns with the generic Repository interfaces
      • ElasticsearchCrudRepository extends PagingAndSortingRepository
      • https://github.com/spring-projects/spring-data-elasticsearch

    @Document(indexName = "book", type = "book", indexStoreType = "memory",
              shards = 1, replicas = 0, refreshInterval = "-1")
    public class Book { … }

    public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> { }
  • 38. Stuff I didn’t cover…
    • Analyzers
    • Tokenizers
    • Token filters
    • Rivers
      • RabbitMQ, MySQL, JDBC

    $ curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,stop&pretty=1' -d ' The quick Fox Jumped '
    {
      "tokens" : [
        { "token" : "quick", "start_offset" : 5, "end_offset" : 10, "type" : "word", "position" : 2 },
        { "token" : "fox", "start_offset" : 11, "end_offset" : 14, "type" : "word", "position" : 3 },
        { "token" : "jumped", "start_offset" : 15, "end_offset" : 21, "type" : "word", "position" : 4 }
      ]
    }
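The `_analyze` output shows a three-stage chain: a whitespace tokenizer, then a lowercase filter, then a stop filter. The sketch below reproduces the offsets and positions from the slide, including the gap at position 1 left by the removed "the". The stop-word set is a deliberately tiny, hypothetical list, not Lucene's default English stop list. Positions start at 1 to match the output shown above. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of whitespace tokenizer -> lowercase filter -> stop filter (hypothetical names).
class AnalyzeDemo {

    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;
        final int position;

        Token(String term, int startOffset, int endOffset, int position) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + " [" + startOffset + "-" + endOffset + ", pos " + position + "]";
        }
    }

    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip the whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            if (i == text.length()) break;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            position++; // stop words still consume a position, leaving a gap
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                tokens.add(new Token(term, start, i, position));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same input as the slide, including the leading and trailing spaces.
        for (Token t : analyze(" The quick Fox Jumped ")) {
            System.out.println(t);
        }
        // quick [5-10, pos 2]
        // fox [11-14, pos 3]
        // jumped [15-21, pos 4]
    }
}
```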
  • 39. But what about Mongo?
    • Mongo:
      • General-purpose DB
    • ElasticSearch:
      • Distributed text search engine
    … that’s all I have to say about that.